
Domain-Independent Dynamic Programming

Ryo Kuroiwa [email protected] J. Christopher Beck [email protected]
Abstract

For combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) aim to decouple modeling and solving a problem: the ‘holy grail’ of declarative problem solving. We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm based on dynamic programming (DP). While DP is not new, it has typically been implemented as a problem-specific method. We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by artificial intelligence (AI) planning. We show that heuristic search algorithms can be used to solve DyPDL models and propose seven DIDP solvers. We experimentally compare our DIDP solvers with commercial MIP and CP solvers (solving MIP and CP models, respectively) on common benchmark instances of eleven combinatorial optimization problem classes. We show that DIDP outperforms MIP in nine problem classes, CP also in nine problem classes, and both MIP and CP in seven. DIDP also achieves superior performance to existing state-based solvers including domain-independent AI planners.

keywords:
Dynamic Programming, Combinatorial Optimization, Heuristic Search, State Space Search
journal: Artificial Intelligence
\affiliation

[nii]organization=National Institute of Informatics, addressline=2-1-2 Hitotsubashi, Chiyoda-ku, city=Tokyo, postcode=101-8430, country=Japan

\affiliation

[uoft]organization=Department of Mechanical and Industrial Engineering, University of Toronto, addressline=5 King’s College Road, city=Toronto, postcode=M5S 3G8, state=Ontario, country=Canada

{highlights}

We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization.

The modeling language for DIDP is designed so that a user can investigate efficient models by incorporating redundant information.

We implement DIDP solvers using heuristic search in an open-source software framework.

The DIDP solvers outperform commercial mixed-integer programming and constraint programming solvers in a number of problem classes.

1 Introduction

Combinatorial optimization is a class of problems requiring a set of discrete decisions to be made to optimize an objective function. It has wide real-world application fields including transportation [1], scheduling [2, 3], and manufacturing [4] and thus has been an active research topic in artificial intelligence (AI) [5, 6, 7] and operations research (OR) [8]. Among methodologies to solve combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) are particularly important as they represent steps toward the ‘holy grail’ of declarative problem-solving [9]: in model-based paradigms, a user formulates a problem as a mathematical model and then solves it using a general-purpose solver, i.e., a user just needs to define a problem to solve it. Benefitting from this declarative nature, MIP and CP have been applied to a wide range of combinatorial optimization problems [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].

Dynamic programming (DP) [21] is a commonly used computational method to solve diverse decision-making problems, but little work has considered DP as a model-based paradigm for combinatorial optimization. In DP, a problem is described by a Bellman equation, a recursive equation where the optimal objective value of the original problem is defined by the optimal objective values of subproblems (states). Although a Bellman equation can be viewed as a declarative mathematical model, it has typically been solved by a problem-specific algorithm in previous work on DP for combinatorial optimization problems [22, 23, 24, 25, 26, 27, 28, 29].

We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization based on DP. Our modeling language is inspired by domain-independent AI planning [30], where a problem is described by a common modeling language based on a state transition system. At the same time, DIDP follows the approach of OR, which investigates efficient optimization models. For example, in MIP, different constraints can result in different strengths of linear relaxations while sharing the same integer feasible region. In CP, global constraints are specifically designed for common substructures of combinatorial optimization problems so that a solver can exploit them to achieve high performance. Moreover, problem-specific DP methods for combinatorial optimization sometimes exploit problem-specific information to reduce the effort to solve the Bellman equations [23, 25, 26, 28, 29]. Following these approaches, DIDP allows (but does not require) a user to explicitly include information that is implied by the problem definition and can be potentially exploited by a solver. This design opens up the possibility of investigating the development of better models as in MIP and CP. To solve the models, we develop general-purpose DIDP solvers using state space search, particularly heuristic search algorithms [31, 32, 33], which are also used in AI planners [34, 35, 36].

1.1 Related Work

Some prior studies used DP as a problem-independent framework. There are four main directions explored in the literature: theoretical formalisms for DP, automated DP in logic programming languages, DP languages or software for applications other than combinatorial optimization, and decision diagram-based solvers. However, compared to DIDP, they fall short as model-based paradigms for combinatorial optimization: they either are purely theoretical, are not explicitly designed for combinatorial optimization, or require additional information specific to the solving algorithm.

Some problem-independent formalisms for DP have been developed, but they were not actual modeling languages. In their seminal work, Karp and Held [37] introduced a sequential decision process (sdp), a problem-independent formalism for DP based on a finite state automaton. While they used sdp to develop DP algorithms problem-independently, they noted that “In many applications, however, the state-transition representation is not the most natural form of problem statement” and thus introduced a discrete decision process, a formalism to describe a combinatorial optimization problem, from which sdp is derived. This line of research was further investigated by subsequent work [38, 39, 40, 41, 42, 43, 44]. In particular, Kumar and Kanal [44] proposed a theoretical formalism based on context-free grammar.

In logic programming languages, a recursive function can be defined with memoization [45] or tabling [46, 47] techniques that store the result of a function evaluation in memory and reuse it given the same arguments. To efficiently perform memoization, Puchinger and Stuckey [48] proposed allowing a user to define a bound on the objective value to prune states, motivated by applications to combinatorial optimization. Picat [49] hybridizes logic programming with other paradigms including MIP, CP, and AI planning, and the AI planning module uses tabling with state pruning based on a user-defined objective bound. Unlike DIDP, the above approaches cannot model dominance between different states since memoization or tabling can avoid the evaluation of a state only when the state is exactly the same as a previously evaluated one.
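
The limitation noted above can be seen in a few lines of Python; this sketch is ours, not taken from any of the cited systems. `functools.lru_cache` reuses a cached result only when the arguments are exactly equal, so a state that merely dominates another (e.g., the same unvisited set and location but an earlier time) is still evaluated from scratch.

```python
from functools import lru_cache

calls = 0  # counts how many states are actually evaluated

@lru_cache(maxsize=None)
def value(unvisited: frozenset, location: int, time: int) -> int:
    """Toy state-value function; only the caching behavior matters here."""
    global calls
    calls += 1
    return time

value(frozenset({1, 2}), 0, 5)
value(frozenset({1, 2}), 0, 5)  # identical state: answered from the cache
value(frozenset({1, 2}), 0, 3)  # dominating state (earlier time): recomputed
print(calls)  # → 2
```

Modeling the dominance "same unvisited set and location, earlier time is at least as good" would require comparing distinct cache keys, which plain memoization and tabling do not do.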

DP languages and software have been developed for multiple application fields. Algebraic dynamic programming (ADP) [50] is a software framework to formulate a DP model using context-free grammar. It was designed for bioinformatics and was originally limited to DP models where a state is described by a string. While ADP has been extended to describe a state using data structures such as a set, it is still focused on bioinformatics applications [51]. Dyna is a declarative programming language for DP built on logic programming, designed for natural language processing and machine learning applications [52, 53]. For optimal control, DP solvers have been developed in MATLAB [54, 55].

Hooker [56] pointed out that a decision diagram (DD), a data structure based on directed graphs, can be used to represent a DP model, and a solution can be extracted as a path in the DD. While constructing such a DD may require a large amount of computational space and time, Bergman et al. [57] proposed DD-based branch-and-bound, an algorithm to solve a DP model by repeatedly constructing relaxed and restricted DDs, which are approximations of the exact DD; they are computationally cheaper to construct and give bounds on the optimal objective value. To construct a relaxed DD, DD-based branch-and-bound requires a merging operator, a function to map two states to a single state. The currently developed general-purpose solvers using DD-based branch-and-bound, ddo [57, 58] and CODD [59], require a user to provide a merging operator in addition to a DP model. Therefore, compared to DIDP, they are DD solvers rather than model-based paradigms based on DP.

Several approaches do not fit the above directions. DP2PNSolver [60] takes a DP model coded in a Java-style language, gDPS, as input and compiles it into program code, e.g., Java code. It is not designed to incorporate information beyond a Bellman equation, and the output program code solves the Bellman equation by simple state enumeration. DP was also used as a unified modeling format to combine CP with reinforcement learning [61, 62], but this approach is still within the scope of CP: in their framework, a CP model based on a DP model is given to a CP solver, and a reinforcement learning agent, trained on a Markov decision process based on the same DP model, is used as a value selection heuristic for the CP solver.

Our modeling language is inspired by Planning Domain Definition Language (PDDL) [63, 64], a standard modeling language for AI planning, which is based on a state transition system. While PDDL and related languages such as PDDL+ [65] and Relational Dynamic Influence Diagram Language (RDDL) [66] are dominant in AI planning, other modeling languages for state transition systems have been proposed in AI. Hernádvölgyi et al. [67] proposed PSVN, where a state is described by a fixed length vector. PSVN is designed to allow the automatic generation of a heuristic function for heuristic search algorithms. In the CP community, HADDOCK [68], a modeling language for DDs based on a state transition system, was proposed and used with a CP solver for constraint propagation.

Martelli and Montanari [69] and subsequent work [70, 71] showed that a generalized version of heuristic search algorithms can be used for DP. Unified frameworks for DP, heuristic search, and branch-and-bound were proposed by Ibaraki [72] and Kumar and Kanal [44]. Holte and Fan [73] discussed the similarity between abstraction heuristics for heuristic search and state space relaxation for DP.

1.2 Contributions

We summarize the contributions of this paper as follows:

  • 1.

    We propose DIDP, a novel model-based paradigm for combinatorial optimization based on DP. While existing model-based paradigms such as MIP and CP use constraint-based representations of problems, DIDP uses a state-based representation.

  • 2.

    We develop a state-based modeling language for DIDP. Although our language is inspired by PDDL, an existing language for AI planning, it is specifically designed for combinatorial optimization and has features that PDDL does not have: a user can investigate efficient optimization models by incorporating redundant information such as dominance relation and bounds.

  • 3.

    We formally show that a class of state space search algorithms can be used as DIDP solvers under reasonable theoretical conditions. We implement such solvers in a software framework and demonstrate that they empirically outperform MIP, CP, and existing state-based solvers including AI planners in a number of combinatorial optimization problems. Since our framework is published as open-source software, AI researchers can use it as a platform to develop and apply state space search algorithms to combinatorial optimization problems.

1.3 Overview

In Section 2, we introduce DP with an example. In Section 3, we define Dynamic Programming Description Language (DyPDL), a modeling formalism for DIDP. We also present the theoretical properties of DyPDL including its computational complexity. In Section 4, we present YAML-DyPDL, a practical modeling language for DyPDL. In Section 5, we propose DIDP solvers. First, we show that state space search can be used to solve DyPDL models under reasonable theoretical conditions. In particular, we prove the completeness and optimality of state space search for DyPDL. Then, we develop seven DIDP solvers using existing heuristic search algorithms. In Section 6, we formulate DyPDL models for existing combinatorial optimization problem classes. In Section 7, we experimentally compare the DIDP solvers with commercial MIP and CP solvers using the common benchmark instances of eleven problem classes. We show that the best-performing DIDP solver outperforms both MIP and CP on seven problem classes while the worst performer does so on six of eleven. We also demonstrate that DIDP achieves superior or competitive performance to existing state-based frameworks, domain-independent AI planners, Picat, and ddo. Section 8 concludes the paper.

This paper is an extended version of two conference papers [74, 75], which lacked formal and theoretical aspects due to space limitations. This paper has the following new contributions:

  • 1.

    We formally define DyPDL and theoretically analyze it.

  • 2.

    We formally define heuristic search for DyPDL and show its completeness and optimality.

  • 3.

    We introduce DyPDL models for two additional problems, the orienteering problem with time windows and the multi-dimensional knapsack problem, and use them in the experimental evaluation. These two problems are maximization problems in contrast to the nine minimization problems used in the conference papers.

  • 4.

    We show the optimality gap achieved by DIDP solvers in the experimental evaluation. Our dual bound computation method for beam search is improved from our previous work [76].

  • 5.

    We empirically investigate the importance of heuristic functions in a DIDP solver.

  • 6.

    We empirically compare the DIDP solvers with domain-independent AI planners, Picat, and ddo.

2 Dynamic Programming

Dynamic programming (DP) is a computational problem-solving method in which a problem is recursively decomposed into subproblems, each represented by a state [21]. In this paper, we focus on DP for combinatorial optimization problems, where a finite number of discrete decisions are made to optimize an objective function. In particular, we assume that a state is transformed into another state by making a decision, and a solution is a finite sequence of decisions. We acknowledge that DP has more general applications beyond our assumptions; for example, when DP is applied to Markov decision processes, the outcome of a decision is represented by a probability distribution over multiple states, and a solution is a policy that maps a state to a probability distribution over decisions [21, 77, 78]. However, this is not the topic of this work, as we restrict ourselves to solutions that can be represented by a path of states.

As a running example, we use a combinatorial optimization problem, the traveling salesperson problem with time windows (TSPTW) [79]. In this problem, a set of customers $N=\{0,...,n-1\}$ is given. A solution is a tour starting from the depot (index 0), visiting each customer exactly once, and returning to the depot. Visiting customer $j$ from $i$ requires the travel time $c_{ij}\geq 0$. In the beginning, $t=0$. The visit to customer $i$ must be within a time window $[a_{i},b_{i}]$. Upon earlier arrival, waiting until $a_{i}$ is required. The objective is to minimize the total travel time, excluding the waiting time. Finding a valid tour for TSPTW is NP-complete [79].

Dumas et al. [23] applied DP to solve TSPTW. In their approach, a state is a tuple of variables $(U,i,t)$, where $U$ is the set of unvisited customers, $i$ is the current location, and $t$ is the current time. At each step, we consider visiting one of the unvisited customers as a decision. The value function $V$ maps a state to the optimal cost of visiting all customers in $U$ and returning to the depot starting from location $i$ at time $t$. The value function is defined by the following recursive equation called a Bellman equation [21]:

$$V(U,i,t)=\begin{cases}c_{i0}&\text{if }U=\emptyset\\ \min\limits_{j\in U:t+c_{ij}\leq b_{j}}c_{ij}+V(U\setminus\{j\},j,\max\{t+c_{ij},a_{j}\})&\text{if }U\neq\emptyset.\end{cases} \quad (1)$$

When all customers are visited ($U=\emptyset$), the objective value is defined to be the travel time to return to the depot. Otherwise, we consider visiting each customer $j$ from $i$. The objective value is defined as the minimum of the sum of the travel time to $j$ and the optimal objective value of the resulting state. We assume the second line of Equation (1) to be $\infty$ if there is no $j\in U$ with $t+c_{ij}\leq b_{j}$. The optimal objective value for the original problem is $V(N\setminus\{0\},0,0)$. Dumas et al. [23] solved the Bellman equation by enumerating states.
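
Equation (1) maps directly to a memoized recursion. The sketch below solves a hypothetical 4-customer instance; the travel times and time windows are made-up illustration data, not taken from the paper.

```python
from functools import lru_cache
import math

# Hypothetical instance: customer 0 is the depot, customers 1-3 must be
# visited. Travel times c, release times a, and deadlines b are made up.
n = 4
c = [[0, 3, 4, 5],
     [3, 0, 2, 4],
     [4, 2, 0, 3],
     [5, 4, 3, 0]]
a = [0, 0, 4, 3]
b = [100, 10, 10, 10]

@lru_cache(maxsize=None)
def V(U: frozenset, i: int, t: int):
    """Equation (1): optimal cost of visiting all of U from (i, t) and
    returning to the depot; infinity if no feasible completion exists."""
    if not U:
        return c[i][0]
    best = math.inf
    for j in U:
        if t + c[i][j] <= b[j]:               # deadline respected
            arrival = max(t + c[i][j], a[j])  # wait until a[j] if early
            best = min(best, c[i][j] + V(U - {j}, j, arrival))
    return best

print(V(frozenset(range(1, n)), 0, 0))  # → 13
```

This is essentially the state enumeration of Dumas et al. [23] with memoization standing in for their explicit state storage.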

The state-based problem representation in DP is fundamentally different from existing model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP), which are constraint-based: the problem is defined by a set of decision variables, constraints on the variables, and an objective function. For example, a MIP model for TSPTW [10] uses a binary variable $x_{ij}$ to represent whether customer $j$ is visited from $i$ and a continuous variable $t_{i}$ to represent the time to visit $i$. The model includes constraints ensuring that each customer is visited exactly once ($\sum_{j\in N\setminus\{i\}}x_{ij}=\sum_{j\in N\setminus\{i\}}x_{ji}=1$) and that the time window is satisfied ($a_{i}\leq t_{i}\leq b_{i}$). The objective function is represented as $\sum_{i\in N}\sum_{j\in N\setminus\{i\}}c_{ij}x_{ij}$. Similarly, a CP model uses a variable to represent the time to visit each customer and constraints to ensure that the time window is satisfied [11].

3 Dynamic Programming Description Language (DyPDL)

To shift from problem-specific DP methods to a declarative model-based paradigm based on DP, we formally define a class of models that we focus on in this paper. We introduce a solver-independent theoretical formalism, Dynamic Programming Description Language (DyPDL). Since DP is based on a state-based representation, we design DyPDL inspired by AI planning formalisms such as STRIPS [80] and SAS+ [81], where a problem is described as a state transition system. In DyPDL, a problem is described by states and transitions between states, and a solution corresponds to a sequence of transitions satisfying particular conditions. A state is a complete assignment to state variables.

Definition 1.

A state variable is either an element, set, or numeric variable. An element variable $v$ has domain $D_{v}=\mathbb{Z}^{+}_{0}$ (nonnegative integers). A set variable $v$ has domain $D_{v}=2^{\mathbb{Z}^{+}_{0}}$ (sets of nonnegative integers). A numeric variable $v$ has domain $D_{v}=\mathbb{Q}$ (rational numbers).

Definition 2.

Given a set of state variables $\mathcal{V}=\{v_{1},...,v_{n}\}$, a state is a tuple of values $S=(d_{1},...,d_{n})$ where $d_{i}\in D_{v_{i}}$ for $i=1,...,n$. We denote the value $d_{i}$ of variable $v_{i}$ in state $S$ by $S[v_{i}]$ and the set of all states by $\mathcal{S}=D_{v_{1}}\times...\times D_{v_{n}}$.

Example 1.

In our TSPTW example in Equation (1), a state is represented by three variables: a set variable $U$, an element variable $i$, and a numeric variable $t$.

A state can be transformed into another state by changing the values of the state variables. To describe such changes, we define expressions: functions returning a value given a state.

Definition 3.

An element expression $e:\mathcal{S}\to\mathbb{Z}^{+}_{0}$ is a function that maps a state to a nonnegative integer. A set expression $e:\mathcal{S}\to 2^{\mathbb{Z}^{+}_{0}}$ is a function that maps a state to a set of nonnegative integers. A numeric expression $e:\mathcal{S}\to\mathbb{Q}$ is a function that maps a state to a numeric value. A condition $c:\mathcal{S}\to\{\bot,\top\}$ is a function that maps a state to a Boolean value. Given a state $S$, we denote $c(S)=\top$ by $S\models c$ and $c(S)=\bot$ by $S\not\models c$. For a set of conditions $C$, we denote $\forall c\in C, S\models c$ by $S\models C$ and $\exists c\in C, S\not\models c$ by $S\not\models C$.

With the above expressions, we formally define transitions, which transform one state into another. A transition has an effect to update state variables, preconditions defining when it is applicable, and a cost expression to update the objective value.

Definition 4.

A transition $\tau$ is a 3-tuple $\langle\mathsf{eff}_{\tau},\mathsf{cost}_{\tau},\mathsf{pre}_{\tau}\rangle$ where $\mathsf{eff}_{\tau}$ is the effect, $\mathsf{cost}_{\tau}$ is the cost expression, and $\mathsf{pre}_{\tau}$ is the set of preconditions.

  • 1.

The effect $\mathsf{eff}_{\tau}$ is a tuple of expressions $(e_{1},...,e_{n})$ where $e_{i}:\mathcal{S}\to D_{v_{i}}$. We denote the expression $e_{i}$ corresponding to a state variable $v_{i}$ by $\mathsf{eff}_{\tau}[v_{i}]$.

  • 2.

The cost expression $\mathsf{cost}_{\tau}:\mathbb{Q}\times\mathcal{S}\to\mathbb{Q}$ is a function that maps a numeric value $x$ and a state $S$ to a numeric value $\mathsf{cost}_{\tau}(x,S)$.

  • 3.

$\mathsf{pre}_{\tau}$ is a set of conditions, i.e., $c:\mathcal{S}\to\{\bot,\top\}$ for each $c\in\mathsf{pre}_{\tau}$, and each such condition is called a precondition of $\tau$.

Given a set of transitions $\mathcal{T}$, the set of applicable transitions $\mathcal{T}(S)$ in a state $S$ is defined as $\mathcal{T}(S)=\{\tau\in\mathcal{T}\mid S\models\mathsf{pre}_{\tau}\}$. Given a state $S$ and a transition $\tau\in\mathcal{T}(S)$, the successor state $S[\![\tau]\!]$, which results from applying $\tau$ in $S$, is defined as $S[\![\tau]\!][v]=\mathsf{eff}_{\tau}[v](S)$ for each variable $v$. For a sequence of transitions $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$, the state $S[\![\sigma]\!]$, which results from applying $\sigma$ in $S$, is defined as $S[\![\sigma]\!]=S[\![\sigma_{1}]\!][\![\sigma_{2}]\!]...[\![\sigma_{m}]\!]$. If $\sigma$ is an empty sequence, i.e., $\sigma=\langle\rangle$, then $S[\![\sigma]\!]=S$.

Example 2.

In our TSPTW example in Equation (1), we consider a transition $\tau_{j}$ to visit customer $j$. The effect of $\tau_{j}$ is defined as $\mathsf{eff}_{\tau_{j}}[U](S)=S[U]\setminus\{j\}$, $\mathsf{eff}_{\tau_{j}}[i](S)=j$, and $\mathsf{eff}_{\tau_{j}}[t](S)=\max\{S[t]+c_{S[i],j},a_{j}\}$. The preconditions are $S\mapsto j\in S[U]$ and $S\mapsto S[t]+c_{S[i],j}\leq b_{j}$. The cost expression is defined as $\mathsf{cost}_{\tau_{j}}(x,S)=c_{S[i],j}+x$.
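
Definition 4 and Example 2 can be mirrored in code. The sketch below is illustrative only: the names (`State`, `Transition`, `visit`) are ours and not DyPDL syntax, and the two-customer instance data are made up.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # state variables: U (frozenset), i (int), t (int)

@dataclass
class Transition:
    """A transition per Definition 4: effect, cost expression, preconditions."""
    effect: Callable[[State], State]
    cost: Callable[[float, State], float]  # cost_tau(x, S)
    preconditions: list                    # list of Callable[[State], bool]

    def applicable(self, S: State) -> bool:
        return all(pre(S) for pre in self.preconditions)

c = [[0, 3], [3, 0]]       # made-up travel times
a, b = [0, 0], [100, 10]   # made-up time windows

def visit(j: int) -> Transition:
    """The TSPTW transition tau_j of Example 2."""
    return Transition(
        effect=lambda S: {"U": S["U"] - {j}, "i": j,
                          "t": max(S["t"] + c[S["i"]][j], a[j])},
        cost=lambda x, S: c[S["i"]][j] + x,
        preconditions=[lambda S: j in S["U"],
                       lambda S: S["t"] + c[S["i"]][j] <= b[j]],
    )

S0 = {"U": frozenset({1}), "i": 0, "t": 0}
tau = visit(1)
assert tau.applicable(S0)
print(tau.effect(S0))  # the successor state S[[tau]]
```

The successor state here is `{"U": frozenset(), "i": 1, "t": 3}`, matching the effects in Example 2.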

We define a base case, where further transitions are not considered, and the cost is defined by a function of the state. We use the term ‘base case’ to represent the conditions where the recursion of a Bellman equation stops.

Definition 5 (Base case).

A base case $B$ is a tuple $\langle C_{B},\mathsf{base\_cost}_{B}\rangle$ where $C_{B}$ is a set of conditions and $\mathsf{base\_cost}_{B}:\mathcal{S}\to\mathbb{Q}$ is a numeric expression. A state $S$ with $S\models C_{B}$ is called a base state, and its base cost is $\mathsf{base\_cost}_{B}(S)$.

Example 3.

In our TSPTW example in Equation (1), a base case $B$ is defined by $C_{B}=\{S\mapsto S[U]=\emptyset\}$ and $\mathsf{base\_cost}_{B}(S)=c_{S[i],0}$.

Now, we define the state transition system based on the above definitions. We additionally introduce the target state, which is the start of the state transition system, and the state constraints, which must be satisfied by all states. The name ‘target state’ is inspired by a Bellman equation; the target state is the target of the Bellman equation, corresponding to the original problem, to which we want to compute the optimal objective value. The cost of a solution is computed by repeatedly applying the cost expression of a transition backward from the base cost; this is also inspired by recursion in a Bellman equation, where the objective value of the current state is computed from the objective values of the successor states.

Definition 6.

A DyPDL model is a tuple $\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle$ with a specification of minimization or maximization, where $\mathcal{V}$ is the set of state variables, $S^{0}$ is a state called the target state, $\mathcal{T}$ is the set of transitions, $\mathcal{B}$ is the set of base cases, and $\mathcal{C}$ is a set of conditions called state constraints.

A solution for a DyPDL model is a sequence of transitions $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$ such that

  • 1.

All transitions are applicable, i.e., $\sigma_{i}\in\mathcal{T}(S^{i-1})$ where $S^{i}=S^{i-1}[\![\sigma_{i}]\!]$ for $i=1,...,m$.

  • 2.

The target state and all intermediate states do not satisfy a base case, i.e., $\forall B\in\mathcal{B}, S^{i}\not\models C_{B}$ for $i=0,...,m-1$.

  • 3.

The target state, all intermediate states, and the final state satisfy the state constraints, i.e., $S^{i}\models\mathcal{C}$ for $i=0,...,m$.

  • 4.

The final state is a base state, i.e., $\exists B\in\mathcal{B}, S^{m}\models C_{B}$.

If the target state is a base state and satisfies the state constraints, an empty sequence $\langle\rangle$ is a solution. Given a state $S$, an $S$-solution is a sequence of transitions satisfying the above conditions except that it starts from the state $S$ instead of the target state.

For minimization, the cost of an $S$-solution $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$ is recursively defined as follows:

$$\mathsf{solution\_cost}(\sigma,S)=\begin{cases}\mathsf{cost}_{\sigma_{1}}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S[\![\sigma_{1}]\!]),S)&\text{ if }|\sigma|\geq 1\\ \min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)&\text{ if }|\sigma|=0\end{cases}$$

where $|\sigma|$ is the number of transitions in $\sigma$, i.e., $|\sigma|=m$, and $\langle\sigma_{2},...,\sigma_{m}\rangle=\langle\rangle$ if $m=1$. For maximization, we replace $\min$ with $\max$. For a solution for the DyPDL model ($S^{0}$-solution) $\sigma$, we denote its cost by $\mathsf{solution\_cost}(\sigma)$, omitting $S^{0}$. An optimal $S$-solution for minimization (maximization) is an $S$-solution whose cost is less (greater) than or equal to the cost of any $S$-solution. An optimal solution for a DyPDL model is an optimal $S^{0}$-solution.
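
The recursive cost computation can be written down directly. In this illustrative Python sketch (our names, not DyPDL syntax), a transition is a pair of an effect function and a cost function, and a base case is a pair of a condition and a base-cost function:

```python
def solution_cost(sigma, S, base_cases):
    """Cost of an S-solution for minimization: apply the cost expression of
    the first transition to the cost of the remaining suffix; for the empty
    sequence, take the minimum base cost over satisfied base cases."""
    if not sigma:
        return min(bc(S) for cond, bc in base_cases if cond(S))
    effect, cost = sigma[0]
    return cost(solution_cost(sigma[1:], effect(S), base_cases), S)

# Toy model: the state is a number, each transition increments it at cost
# x + 1, and the single base case gives base cost 0 once the state reaches 3.
inc = (lambda S: S + 1, lambda x, S: x + 1)
base = [(lambda S: S >= 3, lambda S: 0)]
print(solution_cost([inc, inc, inc], 0, base))  # → 3
```

Note that the costs are accumulated backward from the base cost, mirroring how a Bellman equation computes the value of a state from the values of its successors.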

Example 4.

In our TSPTW example in Equation (1), the state variables are $U$, $i$, and $t$ as in Example 1, the target state is $S^{0}=(N\setminus\{0\},0,0)$, a transition to visit customer $j$ is defined for each $j\in N\setminus\{0\}$ as in Example 2, and one base case is defined as in Example 3. For a solution corresponding to a tour $\langle\pi_{1},...,\pi_{n-1}\rangle$, where $\pi_{i}\in N\setminus\{0\}$ and $\pi_{i}\neq\pi_{j}$ for $i\neq j$, the cost is $c_{0,\pi_{1}}+...+c_{\pi_{n-2},\pi_{n-1}}+c_{\pi_{n-1},0}$. The optimization direction is minimization. We do not use state constraints in this example at this point, but we show a use case later in Section 3.3.

3.1 Complexity

We have defined expressions as functions of states and have not specified further details. Therefore, the complexity of an optimization problem with a DyPDL model depends on the complexity of evaluating expressions. In addition, for example, if we have an infinite number of preconditions, evaluating the applicability of a single transition may not terminate in finite time. Given these facts, we consider the complexity of a model whose definition is finite.

Definition 7.

A DyPDL model is finitely defined if the following conditions are satisfied:

  • 1.

    The numbers of the state variables, transitions, base cases, and state constraints are finite.

  • 2.

    Each transition has a finite number of preconditions.

  • 3.

    Each base case has a finite number of conditions.

  • 4.

    All the effects, the cost expression, the preconditions of the transitions, the conditions and the costs of the base cases, and the state constraints can be evaluated in finite time.

Even with this restriction, finding a solution for a DyPDL model is an undecidable problem. We show this by reducing an undecidable AI planning formalism to DyPDL. We define a numeric planning task and its solution in Definitions 8 and 9, following Helmert [82].

Definition 8.

A numeric planning task is a tuple $\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle$ where $V_{P}$ is a finite set of propositional variables, $V_{N}$ is a finite set of numeric variables, Init is a state called the initial state, Goal is a finite set of propositional and numeric conditions, and Ops is a finite set of operators.

  • 1.

A state is defined by a pair of functions $(\alpha,\beta)$, where $\alpha:V_{P}\to\{\bot,\top\}$ and $\beta:V_{N}\to\mathbb{Q}$.

  • 2.

A propositional condition is written as $v=\top$ where $v\in V_{P}$. A state $(\alpha,\beta)$ satisfies it if $\alpha(v)=\top$.

  • 3.

A numeric condition is written as $f(v_{1},...,v_{n})\textbf{ relop }0$ where $v_{1},...,v_{n}\in V_{N}$, $f:\mathbb{Q}^{n}\to\mathbb{Q}$ maps $n$ numeric variables to a rational number, and $\textbf{relop}\in\{=,\neq,<,\leq,\geq,>\}$. A state $(\alpha,\beta)$ satisfies it if $f(\beta(v_{1}),...,\beta(v_{n}))\textbf{ relop }0$.

An operator in Ops is a pair $\langle\textit{Pre},\textit{Eff}\rangle$, where Pre is a finite set of conditions (preconditions), and Eff is a finite set of propositional and numeric effects.

  • 1.

A propositional effect is written as $v\leftarrow t$ where $v\in V_{P}$ and $t\in\{\bot,\top\}$.

  • 2.

A numeric effect is written as $v\leftarrow f(v_{1},...,v_{n})$ where $v,v_{1},...,v_{n}\in V_{N}$ and $f:\mathbb{Q}^{n}\to\mathbb{Q}$ maps $n$ numeric variables to a rational number.

All functions that appear in numeric conditions and numeric effects are restricted to functions represented by the arithmetic operators $\{+,-,\cdot,/\}$, where the divisor must be a non-zero constant.

Definition 9.

Given a numeric planning task $\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle$, the state transition graph is a directed graph where nodes are states and there is an edge $((\alpha,\beta),(\alpha^{\prime},\beta^{\prime}))$ if there exists an operator $\langle\textit{Pre},\textit{Eff}\rangle\in\textit{Ops}$ satisfying the following conditions.

  • 1.

    (α,β)(\alpha,\beta) satisfies all conditions in Pre.

  • 2.

    α(v)=t\alpha^{\prime}(v)=t if vtEffv\leftarrow t\in\textit{Eff} and α(v)=α(v)\alpha^{\prime}(v)=\alpha(v) otherwise.

  • 3.

    β(v)=f(β(v1),,β(vn))\beta^{\prime}(v)=f(\beta(v_{1}),...,\beta(v_{n})) if vf(v1,,vn)Effv\leftarrow f(v_{1},...,v_{n})\in\textit{Eff} and β(v)=β(v)\beta^{\prime}(v)=\beta(v) otherwise.

A solution for the numeric planning task is a path from the initial state to a state that satisfies all goal conditions in Goal in the state transition graph.

Example 5.

We represent TSPTW as a numeric planning task similar to the DyPDL model in Example 4. Unlike the DyPDL model, we ignore the objective value since the definition of a numeric planning task does not consider it. In our numeric planning task, a proposition ui=u_{i}=\top represents that customer ii is not visited, vi=v_{i}=\top represents that customer ii is visited, and li=l_{i}=\top represents that the current location is customer ii. We have only one numeric variable tt representing the current time. Therefore,

  • 1.

    VP={u1,,un1,v1,,vn1,l0,,ln1}V_{P}=\{u_{1},...,u_{n-1},v_{1},...,v_{n-1},l_{0},...,l_{n-1}\}.

  • 2.

VN={t}V_{N}=\{t\}.

  • 3.

    Init=(α0,β0)\textit{Init}=(\alpha^{0},\beta^{0}) such that iN{0},α0(ui)=,α0(vi)=,α0(li)=\forall i\in N\setminus\{0\},\alpha^{0}(u_{i})=\top,\alpha^{0}(v_{i})=\bot,\alpha^{0}(l_{i})=\bot, α0(l0)=\alpha^{0}(l_{0})=\top, and β0(t)=0\beta^{0}(t)=0.

  • 4.

Goal={v1=,,vn1=,l0=}\textit{Goal}=\{v_{1}=\top,...,v_{n-1}=\top,l_{0}=\top\}.

To visit customer jN{0}j\in N\setminus\{0\} from iNi\in N, we define two operators: one for arriving before the time window opens (and waiting until aja_{j}) and one for arriving within the window. In addition, we define an operator to return to the depot 0 from customer ii after visiting all customers. We have the following three types of operators.

  • 1.

    Pre={li=,uj=,t+cijaj<0}\textit{Pre}=\{l_{i}=\top,u_{j}=\top,t+c_{ij}-a_{j}<0\} and Eff={li,lj,uj,vj,taj}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{j}\leftarrow\top,u_{j}\leftarrow\bot,v_{j}\leftarrow\top,t\leftarrow a_{j}\}.

  • 2.

    Pre={li=,uj=,t+cijaj0,t+cijbj0}\textit{Pre}=\{l_{i}=\top,u_{j}=\top,t+c_{ij}-a_{j}\geq 0,t+c_{ij}-b_{j}\leq 0\} and Eff={li,lj,uj,vj,tt+cij}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{j}\leftarrow\top,u_{j}\leftarrow\bot,v_{j}\leftarrow\top,t\leftarrow t+c_{ij}\}.

  • 3.

Pre={li=,v1=,,vn1=}\textit{Pre}=\{l_{i}=\top,v_{1}=\top,...,v_{n-1}=\top\} and Eff={li,l0,tt+ci0}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{0}\leftarrow\top,t\leftarrow t+c_{i0}\}.
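To make the three operator schemes concrete, the following Python sketch generates successor states in the state transition graph of this numeric planning task. The three-customer instance data (travel times and time windows) are hypothetical, chosen only for illustration.

```python
from itertools import product

# Toy TSPTW instance (hypothetical data, for illustration only):
# customers 1..2 plus the depot 0, travel times c, time windows [a, b].
n = 3
c = {(i, j): 1 + abs(i - j) for i, j in product(range(n), repeat=2) if i != j}
a = {1: 0, 2: 3}   # earliest service start per customer
b = {1: 5, 2: 6}   # latest service start per customer

# A state is (alpha, t): alpha maps propositions to True/False, t is the time.
def initial_state():
    alpha = {('u', i): True for i in range(1, n)}
    alpha.update({('v', i): False for i in range(1, n)})
    alpha.update({('l', i): i == 0 for i in range(n)})
    return alpha, 0

def successors(state):
    """Apply the three operator schemes of Example 5 to generate successors."""
    alpha, t = state
    i = next(k for k in range(n) if alpha[('l', k)])  # current location
    for j in range(1, n):
        if j != i and alpha[('u', j)]:
            arrival = t + c[i, j]
            new_alpha = dict(alpha)
            new_alpha[('l', i)], new_alpha[('l', j)] = False, True
            new_alpha[('u', j)], new_alpha[('v', j)] = False, True
            if arrival < a[j]:            # operator 1: wait until the window opens
                yield (new_alpha, a[j])
            elif arrival <= b[j]:         # operator 2: arrive within the window
                yield (new_alpha, arrival)
    if all(alpha[('v', j)] for j in range(1, n)) and not alpha[('l', 0)]:
        new_alpha = dict(alpha)           # operator 3: return to the depot
        new_alpha[('l', i)], new_alpha[('l', 0)] = False, True
        yield (new_alpha, t + c[i, 0])
```

From the initial state at the depot, both customers are reachable within their windows, so two successors are generated.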

Helmert [82] showed that finding a solution for the above-defined numeric planning task is undecidable. To show the undecidability of DyPDL, we reduce a numeric planning task to a DyPDL model by replacing all propositional variables with a single set variable.

Theorem 1.

Finding a solution for a finitely defined DyPDL model is undecidable.

Proof.

Let VP,VN,Init,Goal,Ops\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle be a numeric planning task. We compose a DyPDL model as follows:

  • 1.

    If VPV_{P}\neq\emptyset, we introduce a set variable PP^{\prime} in the DyPDL model. For each numeric variable vVNv\in V_{N} in the numeric planning task, we introduce a numeric variable vv^{\prime} in the DyPDL model.

  • 2.

    Let (α0,β0)=Init(\alpha^{0},\beta^{0})=\textit{Init}. We index propositional variables in VPV_{P} using i=0,,|VP|1i=0,...,|V_{P}|-1 and denote the ii-th variable by uiu_{i}. In the target state S0S^{0} of the DyPDL model, S0[P]={i{0,,|VP|1}α0(ui)=}S^{0}[P^{\prime}]=\{i\in\{0,...,|V_{P}|-1\}\mid\alpha^{0}(u_{i})=\top\} and S0[v]=β0(v)S^{0}[v^{\prime}]=\beta^{0}(v) for each numeric variable vVNv\in V_{N}.

  • 3.

We introduce a base case B=CB,0B=\langle C_{B},0\rangle in the DyPDL model. For each propositional condition ui=u_{i}=\top in Goal, we introduce a condition SiS[P]S\mapsto i\in S[P^{\prime}] in CBC_{B}. For each numeric condition f(v1,,vn) relop 0f(v_{1},...,v_{n})\textbf{ relop }0 in Goal, we introduce a condition Sf(S[v1],,S[vn]) relop 0S\mapsto f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}])\textbf{ relop }0 in CBC_{B}.

  • 4.

    For each operator o=Pre,Effo=\langle\textit{Pre},\textit{Eff}\rangle, we introduce a transition o=𝖾𝖿𝖿o,𝖼𝗈𝗌𝗍o,𝗉𝗋𝖾oo^{\prime}=\langle\mathsf{eff}_{o^{\prime}},\mathsf{cost}_{o^{\prime}},\mathsf{pre}_{o^{\prime}}\rangle with 𝖼𝗈𝗌𝗍o(x,S)=x+1\mathsf{cost}_{o^{\prime}}(x,S)=x+1.

  • 5.

    For each propositional condition ui=u_{i}=\top in Pre, we introduce SiS[P]S\mapsto i\in S[P^{\prime}] in 𝗉𝗋𝖾o\mathsf{pre}_{o^{\prime}}. For each numeric condition f(v1,,vn) relop f(v_{1},...,v_{n})\textbf{ relop } 0 in Pre, we introduce Sf(S[v1],,S[vn]) relop 0S\mapsto f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}])\textbf{ relop }0 in 𝗉𝗋𝖾o\mathsf{pre}_{o^{\prime}}.

  • 6.

    Let Add={i{0,,|VP|1}uiEff}\textit{Add}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\top\in\textit{Eff}\} and Del={i{0,,|VP|1}uiEff}\textit{Del}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\bot\in\textit{Eff}\}. We have 𝖾𝖿𝖿o[P](S)=(S[P]Del)Add\mathsf{eff}_{o^{\prime}}[P^{\prime}](S)=(S[P^{\prime}]\setminus\textit{Del})\cup\textit{Add}. We have 𝖾𝖿𝖿o[v](S)=f(S[v1],,S[vn])\mathsf{eff}_{o^{\prime}}[v^{\prime}](S)=f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}]) if vf(v1,,vn)Effv\leftarrow f(v_{1},...,v_{n})\in\textit{Eff} and 𝖾𝖿𝖿o[v](S)=S[v]\mathsf{eff}_{o^{\prime}}[v^{\prime}](S)=S[v^{\prime}] otherwise.

  • 7.

    The set of state constraints is empty.

The construction of the DyPDL model is done in finite time. The numbers of propositional variables, numeric variables, goal conditions, transitions, preconditions, and effects are finite. Therefore, the DyPDL model is finitely defined.
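The core of the reduction can be sketched as follows: the propositional assignment becomes a single set variable holding the indices of the true propositions (steps 1 and 2), and a transition's propositional effects become set operations (step 6). The function names below are illustrative, not DyPDL syntax.

```python
def to_dypdl_state(alpha, beta, index):
    """Steps 1-2 of the reduction: the propositional assignment alpha becomes
    one set variable P' (the indices of true propositions, per the mapping
    index: propositional variable -> {0, ..., |V_P|-1}); numeric variables
    are copied."""
    P = frozenset(index[u] for u in index if alpha[u])
    return P, dict(beta)

def apply_effect(state, add, delete, numeric_effects):
    """Step 6: S[P'] := (S[P'] \\ Del) | Add, and each numeric effect is a
    function of the current numeric values; untouched variables are kept."""
    P, numeric = state
    new_numeric = {v: f(numeric) for v, f in numeric_effects.items()}
    for v, value in numeric.items():
        new_numeric.setdefault(v, value)
    return (P - delete) | add, new_numeric
```

For example, a transition that deletes proposition 0, adds proposition 1, and advances a clock by two units maps one DyPDL state to the next.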

Let σ=o1,,om\sigma=\langle o^{\prime}_{1},...,o^{\prime}_{m}\rangle be a solution for the DyPDL model. Let Sj=Sj1[[oj]]S^{j}=S^{j-1}[\![o^{\prime}_{j}]\!] for j=1,,mj=1,...,m. Let (αj,βj)(\alpha^{j},\beta^{j}) be a numeric planning state such that αj(ui)=\alpha^{j}(u_{i})=\top if iSj[P]i\in S^{j}[P^{\prime}], αj(ui)=\alpha^{j}(u_{i})=\bot if iSj[P]i\notin S^{j}[P^{\prime}], and βj(v)=Sj[v]\beta^{j}(v)=S^{j}[v^{\prime}]. Note that (α0,β0)=Init(\alpha^{0},\beta^{0})=\textit{Init} satisfies the above condition by construction. We prove that the state transition graph for the numeric planning task has edge ((αj,βj),(αj+1,βj+1))((\alpha^{j},\beta^{j}),(\alpha^{j+1},\beta^{j+1})) for j=0,,m1j=0,...,m-1, and (αm,βm)(\alpha^{m},\beta^{m}) satisfies all conditions in Goal.

Let oj+1=(Prej+1,Effj+1)o_{j+1}=(\textit{Pre}_{j+1},\textit{Eff}_{j+1}) be the operator corresponding to oj+1o^{\prime}_{j+1}. Since oj+1o^{\prime}_{j+1} is applicable in SjS^{j}, for each propositional condition ui=u_{i}=\top in Prej+1\textit{Pre}_{j+1}, the set variable satisfies iSj[P]i\in S^{j}[P^{\prime}]. For each numeric condition f(v1,,vn) relop 0f(v_{1},...,v_{n})\textbf{ relop }0 in Prej+1\textit{Pre}_{j+1}, the numeric variables satisfy f(Sj[v1],,Sj[vn]) relop 0f(S^{j}[v^{\prime}_{1}],...,S^{j}[v^{\prime}_{n}])\textbf{ relop }0. By construction, αj(ui)=\alpha^{j}(u_{i})=\top for iSj[P]i\in S^{j}[P^{\prime}] and f(βj(v1),,βj(vn)) relop 0f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n}))\textbf{ relop }0. Therefore, (αj,βj)(\alpha^{j},\beta^{j}) satisfies all conditions in Prej+1\textit{Pre}_{j+1}. Similarly, (αm,βm)(\alpha^{m},\beta^{m}) satisfies all conditions in Goal since SmS^{m} satisfies all conditions in the base case.

Let Addj+1={i{0,,|VP|1}uiEffj+1}\textit{Add}_{j+1}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\top\in\textit{Eff}_{j+1}\} and Delj+1={i{0,,|VP|1}uiEffj+1}\textit{Del}_{j+1}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\bot\in\textit{Eff}_{j+1}\}. By construction, Sj+1[P]=(Sj[P]Delj+1)Addj+1S^{j+1}[P^{\prime}]=(S^{j}[P^{\prime}]\setminus\textit{Del}_{j+1})\cup\textit{Add}_{j+1}. Therefore, for ii with uiEffj+1u_{i}\leftarrow\bot\in\textit{Eff}_{j+1}, we have iSj+1[P]i\notin S^{j+1}[P^{\prime}], which implies αj+1(ui)=\alpha^{j+1}(u_{i})=\bot. For ii with uiEffj+1u_{i}\leftarrow\top\in\textit{Eff}_{j+1}, we have iSj+1[P]i\in S^{j+1}[P^{\prime}], which implies αj+1(ui)=\alpha^{j+1}(u_{i})=\top. For other ii, iSj+1[P]i\in S^{j+1}[P^{\prime}] if iSj[P]i\in S^{j}[P^{\prime}] and iSj+1[P]i\notin S^{j+1}[P^{\prime}] if iSj[P]i\notin S^{j}[P^{\prime}], which imply αj+1(ui)=αj(ui)\alpha^{j+1}(u_{i})=\alpha^{j}(u_{i}). For vv with vf(v1,,vn)Effj+1v\leftarrow f(v_{1},...,v_{n})\in\textit{Eff}_{j+1}, we have Sj+1[v]=f(Sj[v1],,Sj[vn])=f(βj(v1),,βj(vn))S^{j+1}[v^{\prime}]=f(S^{j}[v^{\prime}_{1}],...,S^{j}[v^{\prime}_{n}])=f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n})), which implies βj+1(v)=f(βj(v1),,βj(vn))\beta^{j+1}(v)=f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n})). For other vv, we have Sj+1[v]=Sj[v]=βj(v)S^{j+1}[v^{\prime}]=S^{j}[v^{\prime}]=\beta^{j}(v), which implies βj+1(v)=βj(v)\beta^{j+1}(v)=\beta^{j}(v). Therefore, edge ((αj,βj),(αj+1,βj+1))((\alpha^{j},\beta^{j}),(\alpha^{j+1},\beta^{j+1})) exists in the state transition graph.

Thus, by solving the DyPDL model, we can find a solution for the numeric planning task. Since the numeric planning task is undecidable, finding a solution for a DyPDL model is also undecidable. ∎

Note that applying this reduction to the numeric planning task in Example 5 does not result in the DyPDL model in Example 4. For example, Example 5 defines two operators for each pair of customers and one operator for each customer to return to the depot, and thus the DyPDL model constructed from this numeric planning task has 2n(n1)+n12n(n-1)+n-1 transitions in total. In contrast, Example 4 defines one transition for each customer, resulting in n1n-1 transitions in total. This difference is due to the expressiveness of DyPDL: referring to the values of state variables in preconditions and effects enables us to model the same problem with fewer transitions.

Given the above reduction, DyPDL can be considered a superset of the numeric planning formalism in Definition 8, which corresponds to a subset of Planning Domain Definition Language (PDDL) 2.1 [64] called level 2 of PDDL 2.1 [82]. As we will present later, our practical modeling language for DyPDL enables a user to include dominance between states and bounds on the solution cost, which cannot be modeled in PDDL. In contrast, full PDDL 2.1 has additional features such as durative actions, which cannot be modeled in DyPDL.

Helmert [82] and subsequent work [83] proved undecidability for more restricted numeric planning formalisms than Definition 8 such as subclasses of a restricted numeric planning task, where functions in numeric conditions are linear and numeric effects increase or decrease a numeric variable only by a constant. We expect that these results can be easily applied to DyPDL since our reduction is straightforward. Previous work also investigated conditions with which a numeric planning task becomes more tractable, e.g., decidable or PSPACE-complete [82, 84, 85]. We also expect that we can generalize such conditions to DyPDL. However, in this paper, we consider typical properties in DP models for combinatorial optimization problems.

In our TSPTW example, we have a finite number of states; each state is the result of a sequence of visits to a finite number of customers. In addition, such a sequence does not encounter the same state more than once since the cardinality of the set of unvisited customers, UU, is strictly decreasing. We formalize these properties as finiteness and acyclicity. In practice, all DP models presented and evaluated in this paper are finite and acyclic.

First, we introduce reachability. A state SS^{\prime} is reachable from another state SS if there exists a sequence of transitions that transforms SS into SS^{\prime}. The definition is similar to that of an SS-solution.

Definition 10 (Reachability).

Let SS be a state satisfying the state constraints. A state SS^{\prime} is reachable from a state SS with a non-empty sequence of transitions σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle if the following conditions are satisfied:

  • 1.

    All transitions are applicable, i.e., σ1𝒯(S)\sigma_{1}\in\mathcal{T}(S) and σi+1𝒯(Si)\sigma_{i+1}\in\mathcal{T}(S^{i}) where S1=S[[σ1]]S^{1}=S[\![\sigma_{1}]\!] and Si+1=Si[[σi+1]]S^{i+1}=S^{i}[\![\sigma_{i+1}]\!] for i=1,,m1i=1,...,m-1.

  • 2.

No intermediate state satisfies a base case, i.e., B,Si⊧̸CB\forall B\in\mathcal{B},S^{i}\not\models C_{B} for i=1,,m1i=1,...,m-1.

  • 3.

    All intermediate states satisfy the state constraints, i.e., Si𝒞S^{i}\models\mathcal{C} for i=1,,m1i=1,...,m-1.

  • 4.

    The final state is SS^{\prime}, i.e., S=SmS^{\prime}=S^{m}.

We say that SS^{\prime} is reachable from SS if there exists a non-empty sequence of transitions with the above conditions. We say that SS is a reachable state if it is the target state or reachable from the target state.

Definition 11.

A DyPDL model is finite if it is finitely defined, and the set of reachable states is finite.

Definition 12.

A DyPDL model is acyclic if any reachable state is not reachable from itself.

If a model is finite, we can enumerate all reachable states and check whether a base state is reachable in finite time. If a base state is reachable and the model is acyclic, then there are a finite number of solutions, each with a finite number of transitions. Therefore, by enumerating all sequences of transitions with which a state is reachable, identifying solutions, and computing their costs, we can find an optimal solution in finite time.

Theorem 2.

A finite and acyclic DyPDL model has an optimal solution, or the model is infeasible. The problem of deciding whether there exists a solution whose cost is less (greater) than a given rational number for minimization (maximization) is decidable.

3.2 The Bellman Equation for DyPDL

A Bellman equation succinctly represents a DP model, as we showed with the example DP model for TSPTW (Equation (1)). Here, we make an explicit connection between DyPDL and a Bellman equation. Focusing on finite and acyclic DyPDL models, which are typical in combinatorial optimization problems, we present the Bellman equation representing the optimal solution cost. (While a Bellman equation can be used for a cyclic model by considering a fixed point, we do not consider this case: such models are not used in this paper, and doing so would complicate our discussion.) A Bellman equation requires a special cost structure, the Principle of Optimality [21]. We formally define it in the context of DyPDL. Intuitively, an optimal SS-solution must be constructed from an optimal S[[τ]]S[\![\tau]\!]-solution, where τ\tau is a transition applicable in SS.

Definition 13.

Given any reachable state SS and an applicable transition τ𝒯(S)\tau\in\mathcal{T}(S), a DyPDL model satisfies the Principle of Optimality if for any x,yx,y\in\mathbb{Q}, xy𝖼𝗈𝗌𝗍τ(x,S)𝖼𝗈𝗌𝗍τ(y,S)x\leq y\rightarrow\mathsf{cost}_{\tau}(x,S)\leq\mathsf{cost}_{\tau}(y,S).

With this property, we give the Bellman equation for a DyPDL model, defining the value function VV that maps a state SS to the optimal SS-solution cost, or to \infty (-\infty) for minimization (maximization) if an SS-solution does not exist.

Theorem 3.

Consider a finite, acyclic minimization DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle satisfying the Principle of Optimality. For each reachable state SS, there exists an optimal SS-solution with a finite number of transitions, or there does not exist an SS-solution. Let 𝒮\mathcal{S}^{\prime} be the set of reachable states, and let V:𝒮{}V:\mathcal{S}^{\prime}\to\mathbb{Q}\cup\{\infty\} be a function that returns \infty if there does not exist an SS-solution and the cost of an optimal SS-solution otherwise. Then, VV satisfies the following equation:

V(S)={if S⊧̸𝒞minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)else if B,SCBminτ𝒯(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)else if τ𝒯(S),V(S[[τ]])<else.V(S)=\begin{cases}\infty&\text{if }S\not\models\mathcal{C}\\ \min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)&\text{else if }\exists B\in\mathcal{B},S\models C_{B}\\ \min\limits_{\tau\in\mathcal{T}(S)}\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S)&\text{else if }\exists\tau\in\mathcal{T}(S),V(S[\![\tau]\!])<\infty\\ \infty&\text{else.}\end{cases} (2)

For maximization, we replace \infty with -\infty, min\min with max\max, and << with >>.
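Equation (2) can be transcribed directly as a memoized recursion for a finite, acyclic minimization model. The model interface below (a constraint predicate and two callbacks) is a hypothetical sketch for illustration, not the DyPDL modeling language.

```python
import math

def make_solver(satisfies_constraints, base_cases, transitions):
    """satisfies_constraints(S) -> bool (state constraints);
    base_cases(S) -> list of base costs of the base cases S satisfies;
    transitions(S) -> list of (cost_fn, successor) pairs, where cost_fn(x, S)
    combines the successor's value x with the current state S."""
    cache = {}

    def V(S):
        if S in cache:
            return cache[S]
        if not satisfies_constraints(S):
            value = math.inf                            # first line of Eq. (2)
        elif base_cases(S):
            value = min(base_cases(S))                  # second line
        else:
            value = min((cost_fn(V(T), S)               # third line, with the
                         for cost_fn, T in transitions(S)
                         if V(T) < math.inf),
                        default=math.inf)               # fourth line as default
        cache[S] = value
        return value

    return V
```

For instance, a chain model in which state n takes n unit-cost steps to reach the base state 0 yields V(n) = n.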

Proof.

Since the model is acyclic, we can define a partial order over reachable states where SS precedes its successor state S[[τ]]S[\![\tau]\!] if τ𝒯(S)\tau\in\mathcal{T}(S). We can sort reachable states topologically according to this order. Since the set of reachable states is finite, there exists a state that does not precede any reachable state. Let SS be such a state. Then, one of the following holds: S⊧̸𝒞S\not\models\mathcal{C}, SS is a base state, or 𝒯(S)=\mathcal{T}(S)=\emptyset by Definition 10. If S⊧̸𝒞S\not\models\mathcal{C}, there does not exist an SS-solution and V(S)=V(S)=\infty, which is consistent with the first line of Equation (2). If SS satisfies a base case, since the only SS-solution is an empty sequence by Definition 6, V(S)=minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)V(S)=\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S), which is consistent with the second line of Equation (2). If SS is not a base state and 𝒯(S)=\mathcal{T}(S)=\emptyset, then there does not exist an SS-solution and V(S)=V(S)=\infty, which is consistent with the fourth line of Equation (2).

Assume that for each reachable state S[[τ]]S[\![\tau]\!] preceded by a reachable state SS in the topological order, one of the following conditions holds:

  1. 1.

    There does not exist an S[[τ]]S[\![\tau]\!]-solution, i.e., V(S[[τ]])=V(S[\![\tau]\!])=\infty.

  2. 2.

    There exists an optimal S[[τ]]S[\![\tau]\!]-solution with a finite number of transitions with cost V(S[[τ]])<V(S[\![\tau]\!])<\infty.

If the first case holds for each τ𝒯(S)\tau\in\mathcal{T}(S), there does not exist an SS-solution, and V(S)=V(S)=\infty. Since V(S[[τ]])=V(S[\![\tau]\!])=\infty for each τ\tau, V(S)=V(S)=\infty is consistent with the fourth line of Equation (2). The first case of the assumption also holds for SS. If the second case holds for some τ\tau, there exists an optimal S[[τ]]S[\![\tau]\!]-solution with a finite number of transitions and cost V(S[[τ]])<V(S[\![\tau]\!])<\infty. By concatenating τ\tau and the optimal S[[τ]]S[\![\tau]\!]-solution, we can construct an SS-solution with a finite number of transitions and cost 𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). This SS-solution is at least as good as any other SS-solution σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle starting with σ1=τ\sigma_{1}=\tau since

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S[[τ]]),S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)\mathsf{solution\_cost}(\sigma,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S[\![\tau]\!]),S)\geq\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S)

by the Principle of Optimality (Definition 13). By considering all possible τ\tau,

V(S)=minτ𝒯(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S),V(S)=\min\limits_{\tau\in\mathcal{T}(S)}\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S),

which is consistent with the third line of Equation (2). The second case of the assumption also holds for SS. We can prove the theorem by mathematical induction. The proof for maximization is similar. ∎

3.3 Redundant Information

In AI planning, a model typically includes only information necessary to define a problem [63, 86]. In contrast, in operations research (OR), an optimization model often includes redundant information implied by the other parts of the model, e.g., valid inequalities in MIP. While such information is unnecessary to define a model, it sometimes improves the performance of solvers. Redundant information has also been exploited by some problem-specific DP methods in previous work [23, 48, 29]. For example, Dumas et al. [23] used redundant information implied by the Bellman equation for TSPTW (Equation (1)). Given a state (U,i,t)(U,i,t), if a customer jUj\in U cannot be visited by the deadline bjb_{j} even if we use the shortest path with travel time cijc^{*}_{ij}, then the state does not lead to a solution, so it is ignored. While this technique was algorithmically used in previous work, we can represent it declaratively using the value function and formulate it as a state constraint.

V(U,i,t)= if jU,t+cij>bj.V(U,i,t)=\infty\text{ if }\exists j\in U,t+c^{*}_{ij}>b_{j}. (3)
Example 6.

In our TSPTW example, Inequality (3) can be represented by a set of state constraints {SjS[U]S[t]+cS[i],jbjjN{0}}\{S\mapsto j\not\in S[U]\lor S[t]+c^{*}_{S[i],j}\leq b_{j}\mid j\in N\setminus\{0\}\}.
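A minimal sketch of this state constraint as a pruning test, assuming precomputed shortest-path travel times c* and deadlines b are given as plain dictionaries (hypothetical instance data):

```python
def satisfies_state_constraints(U, i, t, c_star, b):
    """Equation (3) as a pruning test: state (U, i, t) is kept only if every
    unvisited customer j can still be reached by its deadline b[j] via the
    shortest-path travel time c_star[i][j]."""
    return all(t + c_star[i][j] <= b[j] for j in U)
```

A state failing this test cannot lead to a solution and can be discarded by a solver.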

While we have represented Equation (3) using state constraints, we can also consider other types of redundant information. We introduce three new concepts to declaratively represent such information: state dominance, dual bound functions, and forced transitions.

3.3.1 State Dominance

Another technique used by Dumas et al. [23] is state dominance. If we have two states (U,i,t)(U,i,t) and (U,i,t)(U,i,t^{\prime}) with ttt\leq t^{\prime}, then (U,i,t)(U,i,t) leads to at least as good a solution as (U,i,t)(U,i,t^{\prime}), so the latter is ignored. In terms of the value function:

V(U,i,t)V(U,i,t) if tt.V(U,i,t)\leq V(U,i,t^{\prime})\text{ if }t\leq t^{\prime}. (4)

We define state dominance for DyPDL. Intuitively, one state SS dominates another state SS^{\prime} if SS always leads to an equally good or better solution, and thus SS^{\prime} can be ignored when we have SS. In addition, we require that SS leads to a solution that is no longer than that of SS^{\prime}. This condition ensures that a state SS does not dominate its successor S[[τ]]S[\![\tau]\!] on an optimal solution; S[[τ]]S[\![\tau]\!] should not be ignored even if we have SS since the optimal SS-solution cannot be found without considering S[[τ]]S[\![\tau]\!].

Definition 14 (State Dominance).

For a minimization DyPDL model, a state SS dominates another state SS^{\prime}, denoted by SSS^{\prime}\preceq S, iff, for any SS^{\prime}-solution σ\sigma^{\prime}, there exists an SS-solution σ\sigma such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S)\leq\mathsf{solution\_cost}(\sigma^{\prime},S^{\prime}) and σ\sigma has no more transitions than σ\sigma^{\prime}. For maximization, we replace \leq with \geq.

Our definition of dominance is inspired by simulation-based dominance in AI planning [87]. In that paradigm, SS dominates SS^{\prime} only if for each applicable transition τ\tau^{\prime} in SS^{\prime}, there exists an applicable transition τ\tau in SS such that S[[τ]]S[\![\tau]\!] dominates S[[τ]]S^{\prime}[\![\tau^{\prime}]\!]. Also, if SS dominates a base state SS^{\prime} (a goal state in planning terminology), SS is also a base state. In addition to the original transitions, simulation-based dominance adds a NOOP transition, which stays in the current state. In simulation-based dominance, if we have an SS^{\prime}-solution, we also have an SS-solution with an equal number of transitions (or possibly fewer transitions if we exclude NOOP). Therefore, intuitively, Definition 14 is a generalization of simulation-based dominance; a formal discussion is out of the scope of this paper.

In practice, it may be difficult to always detect whether one state dominates another, and thus an algorithm may focus on dominance that can be easily detected based on a sufficient condition, e.g., ttt\leq t^{\prime} for (U,i,t)(U,i,t) and (U,i,t)(U,i,t^{\prime}). We define an approximate dominance relation to represent such a strategy. First, we clarify the conditions that should be satisfied by an approximate dominance relation.

Theorem 4.

For a DyPDL model, the dominance relation is a preorder, i.e., the following conditions hold.

  • 1.

    SSS\preceq S for a state SS (reflexivity).

  • 2.

    S′′SSSS′′SS^{\prime\prime}\preceq S^{\prime}\land S^{\prime}\preceq S\rightarrow S^{\prime\prime}\preceq S for states SS, SS^{\prime}, and S′′S^{\prime\prime} (transitivity).

Proof.

The first condition holds by Definition 14. For the second condition, for any S′′S^{\prime\prime}-solution σ′′\sigma^{\prime\prime}, there exists an equally good or better SS^{\prime}-solution σ\sigma^{\prime} with no more transitions than σ′′\sigma^{\prime\prime}. In turn, there exists an equally good or better SS-solution σ\sigma with no more transitions than σ\sigma^{\prime}. Therefore, the cost of σ\sigma is equal to or better than that of σ′′\sigma^{\prime\prime}, and σ\sigma has no more transitions than σ′′\sigma^{\prime\prime}. ∎

Definition 15.

For a DyPDL model, an approximate dominance relation a\preceq_{a} is a preorder over two states such that SaSSSS^{\prime}\preceq_{a}S\rightarrow S^{\prime}\preceq S for reachable states SS and SS^{\prime}.

Example 7.

For our TSPTW example, Inequality (4) is represented by an approximate dominance relation such that SaSS^{\prime}\preceq_{a}S iff S[U]=S[U]S[U]=S^{\prime}[U], S[i]=S[i]S[i]=S^{\prime}[i], and S[t]S[t]S[t]\leq S^{\prime}[t].

  • 1.

    The reflexivity holds since S[U]=S[U]S[U]=S[U], S[i]=S[i]S[i]=S[i], and S[t]S[t]S[t]\leq S[t].

  • 2.

    The transitivity holds since S[U]=S[U]S[U]=S^{\prime}[U] and S[U]=S′′[U]S^{\prime}[U]=S^{\prime\prime}[U] imply S[U]=S′′[U]S[U]=S^{\prime\prime}[U], S[i]=S[i]S[i]=S^{\prime}[i] and S[i]=S′′[i]S^{\prime}[i]=S^{\prime\prime}[i] imply S[i]=S′′[i]S[i]=S^{\prime\prime}[i], and S[t]S[t]S[t]\leq S^{\prime}[t] and S[t]S′′[t]S^{\prime}[t]\leq S^{\prime\prime}[t] imply S[t]S′′[t]S[t]\leq S^{\prime\prime}[t].

  • 3.

    All SS- and SS^{\prime}-solutions have the same number of transitions since each solution has |S[U]|=|S[U]||S[U]|=|S^{\prime}[U]| transitions to visit all unvisited customers.

An approximate dominance relation is sound but not complete: it always detects dominance when two states are identical; otherwise, it may produce a false negative but never a false positive.
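The approximate dominance relation of Example 7 amounts to a three-way comparison. A minimal sketch, representing a state as a tuple (U, i, t):

```python
def dominates(S, S_prime):
    """Approximate dominance for TSPTW states (U, i, t), per Example 7:
    S dominates S_prime iff both have the same unvisited set and location,
    and S arrives no later. A sound but incomplete check."""
    U, i, t = S
    U2, i2, t2 = S_prime
    return U == U2 and i == i2 and t <= t2
```

The check is reflexive and transitive by construction, as required by Theorem 4.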

Similarly to Inequality (4), we can represent an approximate dominance relation in DyPDL using the value function. In what follows, we assume that <x<-\infty<x<\infty for any xx\in\mathbb{Q}.

Theorem 5.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation for minimization. Given an approximate dominance relation a\preceq_{a}, for reachable states SS and SS^{\prime},

V(S)V(S) if SaS.V(S)\leq V(S^{\prime})\text{ if }S^{\prime}\preceq_{a}S. (5)

For maximization, we replace \leq with \geq.

Proof.

For reachable states SS and SS^{\prime} with SaSS^{\prime}\preceq_{a}S, assume that there exist SS- and SS^{\prime}-solutions. Then, V(S)V(S) (V(S)V(S^{\prime})) is the cost of an optimal SS-solution (SS^{\prime}-solution). By Definitions 14 and 15, for minimization, an optimal SS-solution has an equal or smaller cost than any SS^{\prime}-solution, so V(S)V(S)V(S)\leq V(S^{\prime}). If there does not exist an SS-solution, by Definition 14, there does not exist an SS^{\prime}-solution, so V(S)=V(S)=V(S)=V(S^{\prime})=\infty. If there does not exist an SS^{\prime}-solution, V(S)V(S)=V(S)\leq V(S^{\prime})=\infty. The proof for the maximization is similar. ∎

3.3.2 Dual Bound Function

While not used by Dumas et al. [23], bounds on the value function have been used by previous problem-specific DP methods [48, 29]. When we know an upper bound on the optimal solution cost for minimization, and we can prove that a state cannot lead to a solution with its cost less than the upper bound, we can ignore the state. For our TSPTW example, we define a lower bound function based on the one used for a sequential ordering problem by previous work [88]. Since the minimum travel time to visit customer jj is cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj}, the cost of visiting all customers in UU and returning to the depot is at least the sum of cjinc^{\text{in}}_{j} for each jU{0}j\in U\cup\{0\}. Similarly, we also use the minimum travel time from jj, cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}, to underestimate the total travel time. Using the value function,

V(U,i,t)max{jU{0}cjin,jU{i}cjout}.V(U,i,t)\geq\max\left\{\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (6)

We formalize this technique as a dual bound function that underestimates (overestimates) the cost of a solution in minimization (maximization).

Definition 16.

For a DyPDL model, a function η:𝒮{,}\eta:\mathcal{S}\to\mathbb{Q}\cup\{\infty,-\infty\} is a dual bound function iff, for any reachable state SS and any SS-solution σ\sigma, η(S)\eta(S) is a dual bound on 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S), i.e., η(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\eta(S)\leq\mathsf{solution\_cost}(\sigma,S) for minimization. For maximization, we replace \leq with \geq.

Example 8.

For our TSPTW example, Inequality (6) is represented as η(S)=max{jS[U]{0}cjin,jS[U]{S[i]}cjout}\eta(S)=\max\left\{\sum\limits_{j\in S[U]\cup\{0\}}c^{\text{in}}_{j},\sum\limits_{j\in S[U]\cup\{S[i]\}}c^{\text{out}}_{j}\right\}.

A function that always returns -\infty (\infty) for minimization (maximization) is trivially a dual bound function. If there exists an SS-solution σ\sigma for minimization, η(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)<\eta(S)\leq\mathsf{solution\_cost}(\sigma,S)<\infty. Otherwise, η(S)\eta(S) can be any value, including \infty. Thus, if a dual bound function can detect that an SS-solution does not exist, the function should return \infty to tell a solver that there is no SS-solution. For maximization, a dual bound function should return -\infty in such a case.
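Inequality (6) can be sketched as follows, assuming the travel times c are given as a nested dictionary (hypothetical instance data). The incoming and outgoing minima are computed on the fly here, though in practice they would be precomputed:

```python
def dual_bound(U, i, c):
    """Inequality (6): a lower bound on the remaining tour cost from state
    (U, i, t), using the cheapest incoming and outgoing edge per customer.
    c[j][k] is the travel time from j to k."""
    nodes = set(c)
    c_in = {j: min(c[k][j] for k in nodes - {j}) for j in nodes}
    c_out = {j: min(c[j][k] for k in nodes - {j}) for j in nodes}
    return max(sum(c_in[j] for j in U | {0}),
               sum(c_out[j] for j in U | {i}))
```

On a symmetric three-node instance with edge costs 2, 3, and 4, the bound from the depot with both customers unvisited is 7, while the only tour costs 9, so the bound is valid but not tight.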

Theorem 6.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation for minimization. Given a dual bound function η\eta, for a reachable state SS,

V(S)η(S).V(S)\geq\eta(S). (7)

For maximization, we replace \geq with \leq.

Proof.

For a reachable state SS, if there exists an SS-solution, the cost of an optimal SS-solution is V(S)V(S). By Definition 16, η(S)\eta(S) is a lower bound of the cost of any SS-solution, so η(S)V(S)\eta(S)\leq V(S). Otherwise, η(S)V(S)=\eta(S)\leq V(S)=\infty. The proof for maximization is similar. ∎

3.3.3 Forced Transitions

We introduce yet another type of redundant information, forced transitions. Since forced transitions are not identified in our TSPTW example, we first present a motivating example, the talent scheduling problem [2]. In this problem, a set of actors AA and a set of scenes NN are given, and the objective is to find a sequence in which to shoot the scenes that minimizes the total cost paid to the actors. In a scene sNs\in N, a set of actors AsAA_{s}\subseteq A plays for dsd_{s} days. We incur the cost cac_{a} of actor aa for each day they are on location. If an actor plays on days ii and jj, they are on location on days i,i+1,,ji,i+1,...,j even if they do not play on days i+1i+1 to j1j-1.

Garcia de la Banda et al. [29] proposed a DP method for talent scheduling, where a state is represented by a set of unscheduled scenes QQ. The actors who played in scenes NQN\setminus Q have already arrived at the location and are staying there if they play in a scene in QQ. Therefore, the set of actors on location is

L(Q)=(sQAssNQAs).L(Q)=\left(\bigcup_{s\in Q}A_{s}\cap\bigcup_{s\in N\setminus Q}A_{s}\right).

At each step, we shoot a scene ss from QQ and pay the cost dsaAsL(Q)cad_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}. The Bellman equation is as follows:

V(Q)={0if Q=minsQdsaAsL(Q)ca+V(Q{s})if Q.V(Q)=\begin{cases}0&\text{if }Q=\emptyset\\ \min\limits_{s\in Q}d_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}+V(Q\setminus\{s\})&\text{if }Q\neq\emptyset.\end{cases} (8)

In this model, we can shoot a scene ss without paying an extra cost if As=L(Q)A_{s}=L(Q). Therefore, if such a scene exists, we should shoot it next, ignoring other scenes. Using bs=dsaAscab_{s}=d_{s}\sum_{a\in A_{s}}c_{a}, which can be precomputed,

V(Q)=bs+V(Q{s}) if sQ,As=L(Q).V(Q)=b_{s}+V(Q\setminus\{s\})\text{ if }\exists s\in Q,A_{s}=L(Q). (9)

We can exploit more redundant information in the preconditions of the transitions and a dual bound function, following Garcia de la Banda et al. [29]. We present the complete model with such information in Appendix B.
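To make the recursion concrete, the following is a minimal Python sketch of the talent scheduling value function, combining the general case of Equation (8) with the forced-transition shortcut of Equation (9). The three-scene, two-actor instance data is made up for illustration and is not from the paper.

```python
from functools import lru_cache

# Illustrative instance (not from the paper): three scenes, two actors.
actors = {0: {"x"}, 1: {"x", "y"}, 2: {"y"}}  # A_s for each scene s
days = {0: 1, 1: 2, 2: 1}                     # d_s
cost = {"x": 3, "y": 2}                       # c_a

def on_location(q: frozenset) -> set:
    """L(Q): actors who already arrived (played in N \\ Q) and still play in Q."""
    played = set().union(*(actors[s] for s in actors.keys() - q))
    needed = set().union(*(actors[s] for s in q))
    return played & needed

@lru_cache(maxsize=None)
def v(q: frozenset) -> int:
    """V(Q) following Equation (8), with the forced transition of Equation (9)."""
    if not q:
        return 0
    loc = on_location(q)
    for s in q:
        if actors[s] == loc:  # forced: shooting s next incurs no extra cost
            return days[s] * sum(cost[a] for a in actors[s]) + v(q - {s})
    return min(days[s] * sum(cost[a] for a in actors[s] | loc) + v(q - {s})
               for s in q)

print(v(frozenset(actors)))  # → 15
```

Note that whenever the forced-transition branch fires, the minimization over all scenes in QQ is skipped entirely, which is exactly the state space reduction the redundant information buys.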

We formalize Equation (9) as forced transitions in DyPDL.

Definition 17.

Given a set of applicable transitions 𝒯(S)\mathcal{T}(S) in a state SS, a transition τ𝒯(S)\tau\in\mathcal{T}(S) is a forced transition if an SS-solution does not exist, or if, for each SS-solution σ\sigma^{\prime}, there exists an SS-solution σ\sigma whose first transition is τ\tau and that satisfies 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S)\leq\mathsf{solution\_cost}(\sigma^{\prime},S) for minimization. For maximization, we replace \leq with \geq.

Theorem 7.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation. Let SS be a reachable state with V(S)<V(S)<\infty for minimization or V(S)>V(S)>-\infty for maximization, and let τ\tau be a forced transition applicable in SS. Then,

V(S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). (10)
Proof.

We assume minimization, and the proof for maximization is similar. Since V(S)<V(S)<\infty, there exists an SS-solution. Since the model is finite and acyclic, there are a finite number of SS-solutions, and we can find an optimal SS-solution. By Definition 17, there exists an optimal SS-solution τ,σ1,,σm\langle\tau,\sigma_{1},...,\sigma_{m}\rangle. Since V(S)V(S) is the cost of an optimal SS-solution,

V(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ1,,σm,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]]),S).V(S)=\mathsf{solution\_cost}(\langle\tau,\sigma_{1},...,\sigma_{m}\rangle,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma_{1},...,\sigma_{m}\rangle,S[\![\tau]\!]),S).

Since an S[[τ]]S[\![\tau]\!]-solution σ1,,σm\langle\sigma_{1},...,\sigma_{m}\rangle exists, there exists an optimal S[[τ]]S[\![\tau]\!]-solution σ1,,σm\langle\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle with cost V(S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]])V(S[\![\tau]\!])\leq\mathsf{solution\_cost}(\langle\sigma_{1},...,\sigma_{m}\rangle,S[\![\tau]\!]). By the Principle of Optimality (Definition 13),

V(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)\geq\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S).

Since τ,σ1,,σm\langle\tau,\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle is also an SS-solution,

V(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ1,,σm,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]]),S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)\leq\mathsf{solution\_cost}(\langle\tau,\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle,S[\![\tau]\!]),S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S).

Therefore, V(S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)V(S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). ∎

4 YAML-DyPDL: A Practical Modeling Language

As a practical modeling language for DyPDL, we propose YAML-DyPDL on top of a data serialization language, YAML 1.2 (https://yaml.org/). YAML-DyPDL is inspired by PDDL in AI planning [63]. However, in PDDL, a model typically contains only information necessary to define a problem, while YAML-DyPDL allows a user to explicitly model redundant information, i.e., implications of the definition. Modeling such redundant information is a standard convention in OR and is commonly exploited in problem-specific DP algorithms for combinatorial optimization (e.g., Dumas et al. [23]). In particular, in YAML-DyPDL, a user can explicitly define an approximate dominance relation (Definition 15), dual bound functions (Definition 16), and forced transitions (Definition 17). In PDDL, while forced transitions may be realized with preconditions of actions, an approximate dominance relation and dual bound functions cannot be modeled.

In the DyPDL formalism, expressions and conditions are defined as functions. In a practical implementation, the kinds of functions that can be used as expressions are defined by the syntax of a modeling language. In YAML-DyPDL, for example, arithmetic operations (e.g., addition, subtraction, multiplication, and division) and set operations (e.g., adding an element, removing an element, union, intersection, and difference) using state variables can be used. We give an example of YAML-DyPDL here. A detailed description of the syntax is given as software documentation in our repository (https://github.com/domain-independent-dp/didp-rs/blob/main/didp-yaml/docs/dypdl-guide.md).

4.1 Example

```yaml
cost_type: integer
reduce: min
objects:
  - customer
state_variables:
  - name: U
    type: set
    object: customer
  - name: i
    type: element
    object: customer
  - name: t
    type: integer
    preference: less
tables:
  - name: a
    type: integer
    args:
      - customer
  - name: b
    type: integer
    args:
      - customer
  - name: c
    type: integer
    args:
      - customer
      - customer
  - name: cstar
    type: integer
    args:
      - customer
      - customer
  - name: cin
    type: integer
    args:
      - customer
  - name: cout
    type: integer
    args:
      - customer
transitions:
  - name: visit
    parameters:
      - name: j
        object: U
    effect:
      U: (remove j U)
      i: j
      t: (max (+ t (c i j)) (a j))
    cost: (+ (c i j) cost)
    preconditions:
      - (<= (+ t (c i j)) (b j))
constraints:
  - condition: (<= (+ t (cstar i j)) (b j))
    forall:
      - name: j
        object: U
base_cases:
  - conditions:
      - (is_empty U)
    cost: (c i 0)
dual_bounds:
  - (+ (sum cin U) (cin 0))
  - (+ (sum cout U) (cout i))
```
Figure 1: YAML-DyPDL domain file for TSPTW.

We present how the DyPDL model in Example 4 is described by YAML-DyPDL. Following PDDL, we require two files, a domain file and a problem file, to define a DyPDL model. A domain file describes a class of problems by declaring state variables and constants and defining transitions, base cases, and dual bound functions using expressions. In contrast, a problem file describes one problem instance by defining information specific to that instance, e.g., the target state and the values of constants.

Figure 1 shows the domain file for the DyPDL model of TSPTW. The domain file is a map in YAML, which associates keys with values. In YAML, a key and a value are separated by :. Keys and values can be maps, lists of values, strings, integers, and floating-point numbers. A list is described by multiple lines starting with -, and each value after - is an element of the list. In YAML, we can also use a JSON-like syntax (https://www.json.org/json-en.html), where a map is described as { key_1: value_1, …, key_n: value_n }, and a list is described as [value_1, …, value_n].

4.1.1 Cost Type

The first line defines key cost_type and its value integer, meaning that the cost of the DyPDL model is computed in integers. While the DyPDL formalism considers numeric expressions that return a rational number, in a software implementation, it is beneficial to differentiate integer and continuous values. In YAML-DyPDL, we explicitly divide numeric expressions into integer and continuous expressions. The value of the key reduce is min, which means that we want to minimize the cost.

4.1.2 Object Types

The key objects, whose value is a list of strings, defines object types. In the example, the list only contains one value, customer. An object type is associated with a set of nn nonnegative integers {0,,n1}\{0,...,n-1\}, where nn is defined in a problem file. The customer object type represents a set of customers N={0,,n1}N=\{0,...,n-1\} in TSPTW. The object type is used later to define a set variable and constants.

4.1.3 State Variables

The key state_variables defines state variables. The value is a list of maps describing a state variable. For each state variable, we have key name defining the name and key type defining the type, which is either element, set, integer, or continuous.

The variable U is the set variable UU representing the set of unvisited customers. YAML-DyPDL requires associating a set variable with an object type. The variable UU is associated with the object type, customer, by object: customer. Then, the domain of UU is restricted to 2N2^{N}. This requirement arises from practical implementations of set variables; we want to know the maximum cardinality of a set variable to efficiently represent it in a computer program (e.g., using a fixed length bit vector).
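The fixed-length bit-vector idea mentioned above can be sketched in a few lines; this is an illustration of why the maximum cardinality must be known up front, not the actual DIDP implementation.

```python
# Sketch: a set variable over n objects stored as an n-bit mask.
# Knowing n in advance lets a fixed-width integer represent any subset of 2^N.
n = 4
U = 0b1110  # the set {1, 2, 3}: bit j is 1 iff object j is in the set

def remove(mask: int, j: int) -> int:
    """Set difference with a singleton: mask \\ {j}."""
    return mask & ~(1 << j)

def contains(mask: int, j: int) -> bool:
    """Membership test: j in mask."""
    return bool((mask >> j) & 1)

print(bin(remove(U, 2)), contains(U, 3))  # → 0b1010 True
```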

The variable i is the element variable ii representing the current location. YAML-DyPDL also requires associating an element variable with an object type for readability; by associating an element variable with an object type, it is easier to understand the meaning of the variable. However, the domain of the element variable is not restricted by the number of objects, nn; while objects are indexed from 0 to n1n-1, a user may want to use nn to represent none of them.

The variable t is the numeric variable tt representing the current time. For this variable, the preference is defined by preference: less, which means that a state having smaller tt dominates another state if UU and ii are the same. Such a variable is called a resource variable. Resource variables define an approximate dominance relation in Definition 15: given two states SS and SS^{\prime}, if S[v]S[v]S[v]\geq S^{\prime}[v] for each resource variable vv where greater is preferred (preference: greater), S[v]S[v]S[v]\leq S^{\prime}[v] for each resource variable vv where less is preferred (preference: less), and S[v]=S[v]S[v]=S^{\prime}[v] for each non-resource variable vv, then SS dominates SS^{\prime}. This relation trivially satisfies reflexivity and transitivity, and thus it is a preorder.
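The induced dominance check can be sketched in Python as follows; the dictionary-based state representation and the preference map are our own illustration, not YAML-DyPDL syntax.

```python
# Approximate dominance induced by resource variables (Definition 15):
# resource variables are compared according to their preference;
# all non-resource variables must be equal.
preference = {"t": "less"}  # as in the TSPTW model: smaller t is preferred

def dominates(s: dict, s_prime: dict) -> bool:
    """Return True if state s dominates s_prime."""
    for var, value in s.items():
        if var in preference:
            if preference[var] == "less" and value > s_prime[var]:
                return False
            if preference[var] == "greater" and value < s_prime[var]:
                return False
        elif value != s_prime[var]:
            return False
    return True

s1 = {"U": frozenset({1, 3}), "i": 2, "t": 6}
s2 = {"U": frozenset({1, 3}), "i": 2, "t": 9}
print(dominates(s1, s2), dominates(s2, s1))  # → True False
```

Since each variable is compared by equality or a total order, the relation is reflexive and transitive by construction, matching the preorder claim above.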

4.1.4 Tables

The value of the key tables is a list of maps declaring tables of constants. A table maps a tuple of objects to a constant. The table a represents the beginning of the time window aja_{j} at customer jj, so the values in the table are integers (type: integer). The concrete values are given in a problem file. The key args defines the object types associated with a table using a list. For a, one customer jj is associated with the value aja_{j}, so the list contains only one string customer. The tables b, cin, and cout are defined for the deadline bjb_{j}, the minimum travel time to a customer cjinc^{\text{in}}_{j}, and the minimum travel time from a customer cjoutc^{\text{out}}_{j}, respectively. The table c is for ckjc_{kj}, the travel time from customer kk to jj. This table maps a pair of customers to an integer value, so the value of args is a list equivalent to [customer, customer]. Similarly, the shortest travel time ckjc^{*}_{kj} is represented by the table cstar.

4.1.5 Transitions

The value of the key transitions is a list of maps defining transitions. Using parameters, we can define multiple transitions in the same scheme but associated with different objects. The key name defines the name of the parameter, j, and object defines the object type. Normally, the value of the key object is the name of an object type, e.g., customer. However, we can also use the name of a set variable. In the example, by using object: U, we state that the transition is defined for each object jNj\in N with a precondition jUj\in U.

The key preconditions defines preconditions by using a list of conditions. In YAML-DyPDL, conditions and expressions are described by arithmetic operations in a LISP-like syntax. In the precondition of the transition in our example, (c i j) corresponds to cijc_{ij}, so (<= (+ t (c i j)) (b j)) corresponds to t+cijbjt+c_{ij}\leq b_{j}. The key effect defines the effect by using a map, whose keys are names of the state variables. For set variable U, the value is a set expression (remove j U), corresponding to U{j}U\setminus\{j\}. For element variable i, the value is an element expression j, corresponding to jj. For integer variable t, the value is an integer expression (max (+ t (c i j)) (a j)), corresponding to max{t+cij,aj}\max\{t+c_{ij},a_{j}\}. The key cost defines the cost expression (+ (c i j) cost), corresponding to cij+xc_{ij}+x. In the example, the cost expression must be an integer expression since the cost_type is integer. In the cost expression, we can use cost to represent the cost of the successor state (xx). We can also have a key forced, whose value is Boolean, indicating that the transition is known to be a forced transition when it is applicable. We do not have it in the example, which means the transition is not known to be forced.
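The effect, precondition, and cost expression of visit can be mirrored in a few lines of Python; the successor function below is our own illustration using the instance data from Figure 2, not the DIDP API.

```python
# Applying transition `visit j` to a TSPTW state (U, i, t), following the
# effects and precondition in the domain file. Data is from the problem file.
a = {1: 5, 2: 0, 3: 8}        # time window openings a_j
b = {1: 16, 2: 10, 3: 14}     # deadlines b_j
c = {(0, 1): 3, (0, 2): 4, (0, 3): 5,
     (1, 0): 3, (1, 2): 5, (1, 3): 4,
     (2, 0): 4, (2, 1): 5, (2, 3): 3,
     (3, 0): 5, (3, 1): 4, (3, 2): 3}

def visit(state, j):
    """Return (successor, edge weight), or None if a precondition fails."""
    U, i, t = state
    if j not in U or t + c[i, j] > b[j]:  # precondition (<= (+ t (c i j)) (b j))
        return None
    successor = (U - {j}, j, max(t + c[i, j], a[j]))  # effects on U, i, and t
    return successor, c[i, j]                         # cost adds c_ij

target = (frozenset({1, 2, 3}), 0, 0)  # target state: all customers unvisited
succ, weight = visit(target, 2)
print(succ, weight)
```

Visiting customer 2 from the target state yields the state (U = {1, 3}, i = 2, t = max(0 + 4, 0) = 4) with edge weight 4.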

4.1.6 State Constraints

The value of the key constraints is a list of state constraints. In the DyPDL model, we have jU,t+cijbj\forall j\in U,t+c^{*}_{ij}\leq b_{j}. Similarly to the definition of transitions, we can define multiple state constraints with the same scheme associated with different objects using forall. The value of the key forall is a map defining the name of the parameter and the associated object type or set variable. The value of the key condition is a string describing the condition, (<= (+ t (cstar i j)) (b j)), which uses the parameter j.

4.1.7 Base Cases

The value of the key base_cases is a list of maps defining base cases. Each map has two keys, conditions and cost. The value of the key conditions is a list of conditions, and the value of the key cost is a numeric expression (must be an integer expression in the example since cost_type is integer). The condition (is_empty U) corresponds to U=U=\emptyset, and the cost (c i 0) corresponds to ci0c_{i0}.

4.1.8 Dual Bound Functions

The value of the key dual_bounds is a list of numeric expressions describing dual bound functions. In the example, we use (+ (sum cin U) (cin 0)) and (+ (sum cout U) (cout i)) corresponding to jUcjin+c0in=jU{0}cjin\sum_{j\in U}c^{\text{in}}_{j}+c^{\text{in}}_{0}=\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j} and jUcjout+ciout=jU{i}cjout\sum_{j\in U}c^{\text{out}}_{j}+c^{\text{out}}_{i}=\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}, respectively. Since cost_type is integer, they are integer expressions.
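These expressions are straightforward to mirror directly. Since each expression listed under dual_bounds is a valid lower bound on the remaining cost, their maximum is also a valid lower bound; the sketch below returns that maximum (our own illustration with the Figure 2 data, not the DIDP API).

```python
# Evaluating the two dual bound expressions on a TSPTW state (minimization).
# Data is from the problem file in Figure 2.
cin = {0: 3, 1: 3, 2: 3, 3: 3}   # minimum travel time into each customer
cout = {0: 3, 1: 3, 2: 3, 3: 3}  # minimum travel time out of each customer

def eta(U, i):
    """Tightest of the two dual bounds for state (U, i, t); t is irrelevant."""
    bound_in = sum(cin[j] for j in U) + cin[0]     # (+ (sum cin U) (cin 0))
    bound_out = sum(cout[j] for j in U) + cout[i]  # (+ (sum cout U) (cout i))
    return max(bound_in, bound_out)

print(eta(frozenset({1, 2, 3}), 0))  # → 12
```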

4.1.9 Problem File

```yaml
object_numbers:
  customer: 4
target:
  U: [1, 2, 3]
  i: 0
  t: 0
table_values:
  a: { 1: 5, 2: 0, 3: 8 }
  b: { 1: 16, 2: 10, 3: 14 }
  c:
    {
      [0, 1]: 3, [0, 2]: 4, [0, 3]: 5,
      [1, 0]: 3, [1, 2]: 5, [1, 3]: 4,
      [2, 0]: 4, [2, 1]: 5, [2, 3]: 3,
      [3, 0]: 5, [3, 1]: 4, [3, 2]: 3,
    }
  cstar:
    {
      [0, 1]: 3, [0, 2]: 4, [0, 3]: 5,
      [1, 0]: 3, [1, 2]: 5, [1, 3]: 4,
      [2, 0]: 4, [2, 1]: 5, [2, 3]: 3,
      [3, 0]: 5, [3, 1]: 4, [3, 2]: 3,
    }
  cin: { 0: 3, 1: 3, 2: 3, 3: 3 }
  cout: { 0: 3, 1: 3, 2: 3, 3: 3 }
```

Figure 2: YAML-DyPDL problem file for TSPTW.

Turning to the problem file (Figure 2), the value of object_numbers is a map defining the number of objects for each object type. The value of target is a map defining the values of the state variables in the target state. For the set variable U, a list of nonnegative integers is used to define a set of elements in the set. The value of table_values is a map defining the values of the constants in the tables. For a, b, cin, and cout, a key is the index of an object, and a value is an integer. For c and cstar, a key is a list of the indices of objects.

4.2 Complexity

In Section 3, we showed that finding a solution for a DyPDL model is undecidable in general by reducing a numeric planning task to a DyPDL model. YAML-DyPDL has several restrictions compared to Definition 6. A set variable is associated with an object type, restricting its domain to a subset of a given finite set. In addition, expressions are limited by the syntax. However, these restrictions do not prevent the reduction.

Theorem 8.

Finding a solution for a finitely defined DyPDL model is undecidable even with the following restrictions.

  • 1.

    The domain of each set variable vv is restricted to 2Nv2^{N_{v}} where Nv={0,,nv1}N_{v}=\{0,...,n_{v}-1\}, and nvn_{v} is a positive integer.

  • 2.

    Numeric expressions and element expressions are functions represented by arithmetic operations {+,,,/}\{+,-,\cdot,/\}.

  • 3.

    Set expressions are functions constructed by a set of constants, set variables, and the intersection, union, and difference of two set expressions.

  • 4.

    A condition compares two numeric expressions, compares two element expressions, or checks if a set expression is a subset of another set expression.

Proof.

We can follow the proof of Theorem 1 even with the restrictions. Since the number of propositional variables in the set VPV_{P} in a numeric planning task is finite, we can use nP=|VP|n_{P^{\prime}}=|V_{P}| for the set variable PP^{\prime} representing propositional variables. Arithmetic operations {+,,,/}\{+,-,\cdot,/\} are sufficient for numeric expressions by Definition 8. Similarly, if we consider a condition iS[P]i\in S[P^{\prime}], which checks if ii is included in a set variable PP^{\prime}, as {i}S[P]\{i\}\subseteq S[P^{\prime}], the last two restrictions do not prevent the compilation of the numeric planning task to the DyPDL model. ∎

With the above reduction, a system that solves YAML-DyPDL models can also be used to solve numeric planning tasks in the formalism of Theorem 1.

5 State Space Search for DyPDL

We use state space search [31], which finds a path in an implicitly defined graph, to solve a DyPDL model. In particular, we focus on heuristic search algorithms [89, 33], which estimate path costs using a heuristic function. Once we interpret the state transition system defined by a DyPDL model as a graph, it is intuitive that we can use state space search to solve the model. However, in practice, state space search is not always applicable; a DyPDL model needs to satisfy particular conditions. Therefore, in what follows, we formally present state space search algorithms and the conditions with which they can be used to solve DyPDL models.

Definition 18.

Given a DyPDL model, the state transition graph is a directed graph where nodes are reachable states and there is an edge from SS to SS^{\prime} labeled with τ\tau, (S,S,τ)(S,S^{\prime},\tau), iff τ𝒯(S)\tau\in\mathcal{T}(S) and S=S[[τ]]S^{\prime}=S[\![\tau]\!].

We use the term path to refer to both a sequence of edges in the state transition graph and a sequence of transitions as they are equivalent. A state SS^{\prime} is reachable from SS iff there exists a path from SS to SS^{\prime} in the state transition graph, and an SS-solution corresponds to a path from SS to a base state. Trivially, if a model is acyclic, the state transition graph is also acyclic.

5.1 Cost Algebras

For a DyPDL model, we want to find a solution that minimizes or maximizes the cost. Shortest path algorithms such as Dijkstra’s algorithm [90] and A* [91] find the path minimizing the sum of the weights associated with the edges. In DyPDL, the cost of a solution can be more general, defined by cost expressions of the transitions. Edelkamp et al. [92] extended the shortest path algorithms to cost-algebraic heuristic search algorithms, which can handle more general cost structures. They introduced the notion of cost algebras, which define the cost of a path using a binary operator to combine edge weights and an operation to select the best value. Following their approach, first, we define a monoid.

Definition 19.

Let AA be a set, ×:A×AA\times:A\times A\rightarrow A be a binary operator, and 𝟏A\mathbf{1}\in A. A tuple A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a monoid if the following conditions are satisfied.

  • 1.

    x×yAx\times y\in A for x,yAx,y\in A (closure).

  • 2.

x×(y×z)=(x×y)×zx\times(y\times z)=(x\times y)\times z for x,y,zAx,y,z\in A (associativity).

  • 3.

    x×𝟏=𝟏×x=xx\times\mathbf{1}=\mathbf{1}\times x=x for xAx\in A (identity).

Next, we define isotonicity, a property of a set and a binary operator with regard to comparison. Since minimization or maximization over rational numbers is sufficient for our use case, we restrict the set AA to rational numbers, and the comparison operator to \leq. The original paper by Edelkamp et al. [92] is more general.

Definition 20 (Isotonicity).

Given a set A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and a binary operator ×:A×AA\times:A\times A\rightarrow A, AA is isotone if xyx×zy×zx\leq y\rightarrow x\times z\leq y\times z and xyz×xz×yx\leq y\rightarrow z\times x\leq z\times y for x,y,zAx,y,z\in A.

With a monoid and isotonicity, we define a cost algebra.

Definition 21.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} is isotone. The monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a cost algebra if xA,𝟏x\forall x\in A,\mathbf{1}\leq x for minimization or xA,𝟏x\forall x\in A,\mathbf{1}\geq x for maximization.

5.2 Cost-Algebraic DyPDL Models

To apply cost-algebraic heuristic search, we focus on DyPDL models where cost expressions satisfy particular conditions. First, we define a monoidal DyPDL model, where cost expressions are represented by a binary operator in a monoid.

Definition 22.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\}. A DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle is monoidal with A,×,𝟏\langle A,\times,\mathbf{1}\rangle if the cost expression of every transition τ𝒯\tau\in\mathcal{T} is represented as 𝖼𝗈𝗌𝗍τ(x,S)=wτ(S)×x\mathsf{cost}_{\tau}(x,S)=w_{\tau}(S)\times x where wτ:𝒮A{,}w_{\tau}:\mathcal{S}\to A\setminus\{-\infty,\infty\} is a numeric expression, and the cost 𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B\mathsf{base\_cost}_{B} of each base case BB\in\mathcal{B} returns a value in A{,}A\setminus\{-\infty,\infty\}.

We also define a cost-algebraic DyPDL model, which requires stricter conditions.

Definition 23.

A monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle is cost-algebraic if A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a cost algebra.

For example, the DP model for TSPTW is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}_{0}^{+},+,0\rangle since the cost expression of each transition is defined as (x,S)cS[i],j+x(x,S)\mapsto c_{S[i],j}+x with cS[i],j0c_{S[i],j}\geq 0.
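The cost-algebra conditions can be spot-checked numerically. The sketch below samples nonnegative values and verifies associativity, identity, isotonicity, and the least-identity condition for minimization; it is a finite sanity check over samples, not a proof.

```python
import itertools
import math

samples = [0.0, 0.5, 1.0, 2.5, 7.0]  # sampled nonnegative rationals

def is_cost_algebra_for_min(op, identity, xs) -> bool:
    """Check Definitions 19-21 on samples: monoid axioms, isotonicity, 1 <= x."""
    for x, y, z in itertools.product(xs, repeat=3):
        if not math.isclose(op(x, op(y, z)), op(op(x, y), z)):
            return False  # associativity fails
        if x <= y and not (op(x, z) <= op(y, z) and op(z, x) <= op(z, y)):
            return False  # isotonicity fails
    if not all(op(x, identity) == x == op(identity, x) for x in xs):
        return False  # identity fails
    return all(identity <= x for x in xs)  # cost-algebra condition for min

# <Q0+, +, 0> passes all conditions, matching the TSPTW model:
print(is_cost_algebra_for_min(lambda x, y: x + y, 0.0, samples))  # → True
# Multiplication with identity 1 is a monoid and isotone on these samples,
# but 1 <= x fails (e.g., for 0.5), so it is not a cost algebra for min:
print(is_cost_algebra_for_min(lambda x, y: x * y, 1.0, samples))  # → False
```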

When a model is monoidal, we can associate a weight to each edge in the state transition graph. The weight of a path can be computed by repeatedly applying the binary operator to the weights of the edges in the path.

Definition 24.

Given a monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, the weight of an edge (S,S,τ)(S,S^{\prime},\tau) is wτ(S)w_{\tau}(S). The weight of a path (S,S1,σ1),(S1,S2,σ2),,(Sm1,Sm,σm)\langle(S,S^{1},\sigma_{1}),(S^{1},S^{2},\sigma_{2}),...,(S^{m-1},S^{m},\sigma_{m})\rangle defined by a sequence of transitions σ\sigma is

wσ(S)=wσ1(S)×wσ2(S1)××wσm(Sm1).w_{\sigma}(S)=w_{\sigma_{1}}(S)\times w_{\sigma_{2}}(S^{1})\times...\times w_{\sigma_{m}}(S^{m-1}).

For an empty path \langle\rangle, the weight is 𝟏\mathbf{1}.

The order of applications of the binary operator ×\times does not matter due to the associativity. Differently from the original cost-algebraic heuristic search, the weight of a path corresponding to an SS-solution may not be equal to the cost of the SS-solution in Definition 6 due to our inclusion of the cost of a base state. In the following lemma, we associate the weight of a path with the cost of a solution.
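Computationally, Definition 24 is just a fold of the edge weights under the monoid's binary operator; a minimal sketch:

```python
from functools import reduce

# The weight of a path is the edge weights combined with the monoid's binary
# operator, starting from the identity element (so an empty path has weight 1).
def path_weight(edge_weights, op, identity):
    return reduce(op, edge_weights, identity)

# With the cost algebra <Q0+, +, 0> of the TSPTW model:
print(path_weight([3, 5, 4], lambda x, y: x + y, 0))  # → 12
print(path_weight([], lambda x, y: x + y, 0))         # → 0 (the identity)
```

By associativity, folding left to right gives the same result as any other grouping of the operator applications.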

Lemma 1.

Given a monoidal DyPDL model with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle and a state SS, let σ\sigma be an SS-solution. For minimization, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=wσ(S)×minB:S[[σ]]CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S[[σ]])\mathsf{solution\_cost}(\sigma,S)=w_{\sigma}(S)\times\min_{B\in\mathcal{B}:S[\![\sigma]\!]\models C_{B}}\mathsf{base\_cost}_{B}(S[\![\sigma]\!]). For maximization, we replace min\min with max\max.

Proof.

If σ\sigma is an empty sequence, since wσ(S)=𝟏w_{\sigma}(S)=\mathbf{1} and S[[σ]]=SS[\![\sigma]\!]=S,

wσ(S)×minB:S[[σ]]CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S[[σ]])=minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)w_{\sigma}(S)\times\min_{B\in\mathcal{B}:S[\![\sigma]\!]\models C_{B}}\mathsf{base\_cost}_{B}(S[\![\sigma]\!])=\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathsf{solution\_cost}(\sigma,S)

by Definition 6. Otherwise, let σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle, S1=S[[σ1]]S^{1}=S[\![\sigma_{1}]\!], and Si+1=Si[[σi+1]]S^{i+1}=S^{i}[\![\sigma_{i+1}]\!] for i=1,,m1i=1,...,m-1. Following Definitions 6 and 22,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=𝖼𝗈𝗌𝗍σ1(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S1),S)=wσ1(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S1).\mathsf{solution\_cost}(\sigma,S)=\mathsf{cost}_{\sigma_{1}}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S^{1}),S)=w_{\sigma_{1}}(S)\times\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S^{1}).

For 2im2\leq i\leq m, we get

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi,,σm,Si1)=𝖼𝗈𝗌𝗍σi(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi+1,,σm,Si),Si1)=wσi(Si1)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi+1,,σm,Si).\begin{split}\mathsf{solution\_cost}(\langle\sigma_{i},...,\sigma_{m}\rangle,S^{i-1})&=\mathsf{cost}_{\sigma_{i}}(\mathsf{solution\_cost}(\langle\sigma_{i+1},...,\sigma_{m}\rangle,S^{i}),S^{i-1})\\ &=w_{\sigma_{i}}(S^{i-1})\times\mathsf{solution\_cost}(\langle\sigma_{i+1},...,\sigma_{m}\rangle,S^{i}).\end{split}

For i=m+1i=m+1, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(,Sm)=minB:SmCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(Sm)\mathsf{solution\_cost}(\langle\rangle,S^{m})=\min_{B\in\mathcal{B}:S^{m}\models C_{B}}\mathsf{base\_cost}_{B}(S^{m}). Thus, we get the equation in the lemma by Definition 24. The proof for maximization is similar. ∎

We show that isotonicity is sufficient for the Principle of Optimality in Definition 13. First, we prove its generalized version in Theorem 9. In what follows, we denote the concatenation of sequences of transitions σ\sigma and σ\sigma^{\prime} by σ;σ\langle\sigma;\sigma^{\prime}\rangle.

Theorem 9.

Consider a monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle such that A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Let SS^{\prime} and S′′S^{\prime\prime} be states reachable from SS with sequences of transitions σ\sigma^{\prime} and σ′′\sigma^{\prime\prime}, respectively, with wσ(S)wσ′′(S)w_{\sigma^{\prime}}(S)\leq w_{\sigma^{\prime\prime}}(S). For minimization, if there exist SS^{\prime}- and S′′S^{\prime\prime}-solutions σ1\sigma^{1} and σ2\sigma^{2} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S′′)\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\leq\mathsf{solution\_cost}(\sigma^{2},S^{\prime\prime}), then σ;σ1\langle\sigma^{\prime};\sigma^{1}\rangle and σ′′;σ2\langle\sigma^{\prime\prime};\sigma^{2}\rangle, are SS-solutions with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ′′;σ2,S)\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)\leq\mathsf{solution\_cost}(\langle\sigma^{\prime\prime};\sigma^{2}\rangle,S). For maximization, we replace \leq with \geq.

Proof.

The sequences σ;σ1\langle\sigma^{\prime};\sigma^{1}\rangle and σ′′;σ2\langle\sigma^{\prime\prime};\sigma^{2}\rangle are SS-solutions by Definition 6. Since 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)=wσ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)=w_{\sigma^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime}) and AA is isotone,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)=wσ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)wσ′′(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)wσ′′(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S′′)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ′′;σ2,S).\begin{split}\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)&=w_{\sigma^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\leq w_{\sigma^{\prime\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\\ &\leq w_{\sigma^{\prime\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{2},S^{\prime\prime})=\mathsf{solution\_cost}(\langle\sigma^{\prime\prime};\sigma^{2}\rangle,S).\end{split}

The proof for maximization is similar. ∎

Corollary 1.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. A monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle satisfies the Principle of Optimality in Definition 13.

Intuitively, the Principle of Optimality or its sufficient condition, isotonicity, ensures that an optimal path can be constructed by extending an optimal subpath. Thus, a state space search algorithm can discard suboptimal paths to reach each node in the state transition graph. Without this property, a state space search algorithm may need to enumerate suboptimal paths to a node since they may lead to an optimal path to a goal node.

5.3 Formalization of Heuristic Search for DyPDL

A state space search algorithm searches for a path between nodes in a graph. In particular, we focus on unidirectional search algorithms, which visit nodes by traversing edges from one node (the initial node) to find a path to one of the nodes satisfying particular conditions (goal nodes). Moreover, we focus on heuristic search algorithms, which estimate the path cost from a state to a goal node using a heuristic function hh. For a node SS, a unidirectional heuristic search algorithm maintains g(S)g(S) (the gg-value), the best path cost found so far from the initial node to SS, and h(S)h(S) (the hh-value), the estimated path cost from SS to a goal node. These values are used in two ways: search guidance and pruning.

For search guidance, typically, the priority of a node SS is computed from g(S)g(S) and h(S)h(S), and the node to visit next is selected based on it. For pruning, a heuristic function needs to be admissible: h(S)h(S) is a lower bound of the shortest path weight from a node SS to a goal node. In the conventional shortest path problem, if a heuristic function hh is admissible, g(S)+h(S)g(S)+h(S) is a lower bound on the weight of a path from the initial node to a goal node via SS. Therefore, when we have found a path from the initial node to a goal node with weight γ¯\overline{\gamma}, we can prune the path to SS if g(S)+h(S)γ¯g(S)+h(S)\geq\overline{\gamma}. With this pruning, a heuristic search algorithm can be considered a branch-and-bound algorithm [72, 93].
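The pruning rule can be sketched on a toy weighted graph; the graph, heuristic values, and function below are our own illustration of the g(S) + h(S) >= primal bound test, not one of the DIDP solvers.

```python
import heapq

# Toy minimization instance: h is admissible (never overestimates the
# remaining path weight), so nodes with g + h >= incumbent can be discarded.
edges = {"s": [("a", 2), ("b", 5)], "a": [("t", 4)], "b": [("t", 1)]}
h = {"s": 5, "a": 4, "b": 1, "t": 0}

def best_first(start, goal):
    incumbent = float("inf")          # cost of the best solution found so far
    open_list = [(h[start], 0, start)]  # entries are (f = g + h, g, node)
    while open_list:
        f, g, node = heapq.heappop(open_list)
        if f >= incumbent:
            continue  # prune: this path cannot improve the incumbent
        if node == goal:
            incumbent = g  # new incumbent solution
            continue
        for succ, w in edges.get(node, []):
            heapq.heappush(open_list, (g + w + h[succ], g + w, succ))
    return incumbent

print(best_first("s", "t"))  # → 6
```

Because every pruned node provably cannot lead to a solution cheaper than the incumbent, the search remains exact, which is why such an algorithm can be viewed as branch and bound.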

While the above two functionalities of a heuristic function are fundamentally different, it is common that a single admissible heuristic function is used for both purposes. In particular, A* [91] visits the node that minimizes the ff-value, f(S)=g(S)+h(S)f(S)=g(S)+h(S). While A* does not explicitly prune paths, if the weights of edges are nonnegative, it never discovers a path σ(S)\sigma(S) such that g(S)+h(S)>γg(S)+h(S)>\gamma^{*}, where γ\gamma^{*} is the shortest path weight from the initial node to a goal node (while ff^{*} is conventionally used to represent the optimal path weight, we use γ\gamma^{*} to explicitly distinguish it from ff-values). Thus, A* implicitly prunes non-optimal paths while guiding the search with the ff-values. However, in general, we can use different functions for the two purposes, and the one used for search guidance need not be admissible. Such multi-heuristic search algorithms have been developed particularly for the bounded-suboptimal setting, where we want to find a solution whose suboptimality is bounded by a constant factor, and the anytime setting, where we want to find increasingly better solutions until proving optimality [32, 94, 95, 96, 97, 98].

In DyPDL, a dual bound function can be used for search guidance, but we may use a different heuristic function. In this section, we do not introduce heuristic functions for search guidance and do not specify how to select the next node to visit. Instead, we provide a generic heuristic search algorithm that uses a dual bound function only for pruning and discuss its completeness and optimality. To explicitly distinguish pruning from search guidance, for a dual bound function, we use η\eta as in Definition 16 instead of hh and do not use ff.

We show generic pseudo-code of a heuristic search algorithm for a monoidal DyPDL model in Algorithm 1. The algorithm starts from the target state S0S^{0} and searches for a path to a base state by traversing edges in the state transition graph. The open list OO stores candidate states to expand. The set GG stores generated states to detect duplicate or dominated states. If the model satisfies isotonicity, by Theorem 9, we only need to consider the best path to each state in terms of the weight. The sequence of transitions σ(S)\sigma(S) represents the best path found so far from the target state S0S^{0} to SS. The gg-value of SS, g(S)g(S), is the weight of the path σ(S)\sigma(S). The function η\eta is a dual bound function, which underestimates the cost of an SS-solution by the η\eta-value of SS, η(S)\eta(S). The best solution found so far, σ¯\overline{\sigma}, and its cost γ¯\overline{\gamma} (i.e., the primal bound) are also maintained. Algorithm 1 is stated for minimization; for maximization, \infty is replaced with -\infty, min\min with max\max, << with >>, and \geq and \leq are swapped. All the theoretical results shown later can be easily adapted to maximization.

Algorithm 1 Heuristic search for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a} and a dual bound function η\eta are given as input.
1:if S0⊧̸𝒞S^{0}\not\models\mathcal{C} then return NULL
2:γ¯,σ¯NULL\overline{\gamma}\leftarrow\infty,\overline{\sigma}\leftarrow\text{NULL} \triangleright Initialize the solution.
3:σ(S0)\sigma(S^{0})\leftarrow\langle\rangle, g(S0)𝟏g(S^{0})\leftarrow\mathbf{1} \triangleright Initialize the gg-value.
4:G,O{S0}G,O\leftarrow\{S^{0}\} \triangleright Initialize the open list.
5:while OO\neq\emptyset do
6:     Let SOS\in O \triangleright Select a state.
7:     OO{S}O\leftarrow O\setminus\{S\} \triangleright Remove the state.
8:     if B,SCB\exists B\in\mathcal{B},S\models C_{B} then
9:         current_costg(S)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)\text{current\_cost}\leftarrow g(S)\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S) \triangleright Compute the solution cost.
10:         if current_cost<γ¯\text{current\_cost}<\overline{\gamma} then
11:              γ¯current_cost\overline{\gamma}\leftarrow\text{current\_cost}, σ¯σ(S)\overline{\sigma}\leftarrow\sigma(S) \triangleright Update the best solution.
12:              O{SOg(S)×η(S)<γ¯}O\leftarrow\{S^{\prime}\in O\mid g(S^{\prime})\times\eta(S^{\prime})<\overline{\gamma}\} \triangleright Prune states in the open list.          
13:     else
14:         for all τ𝒯(S):S[[τ]]𝒞\tau\in\mathcal{T}^{*}(S):S[\![\tau]\!]\models\mathcal{C} do
15:              gcurrentg(S)×wτ(S)g_{\text{current}}\leftarrow g(S)\times w_{\tau}(S) \triangleright Compute the gg-value.
16:              if SG\not\exists S^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrentg(S)g_{\text{current}}\geq g(S^{\prime}) then
17:                  if gcurrent×η(S[[τ]])<γ¯g_{\text{current}}\times\eta(S[\![\tau]\!])<\overline{\gamma} then
18:                       if SG\exists S^{\prime}\in G such that SaS[[τ]]S^{\prime}\preceq_{a}S[\![\tau]\!] and gcurrentg(S)g_{\text{current}}\leq g(S^{\prime}) then
19:                           GG{S}G\leftarrow G\setminus\{S^{\prime}\}, OO{S}O\leftarrow O\setminus\{S^{\prime}\} \triangleright Remove a dominated state.                        
20:                       σ(S[[τ]])σ(S);τ\sigma(S[\![\tau]\!])\leftarrow\langle\sigma(S);\tau\rangle, g(S[[τ]])gcurrentg(S[\![\tau]\!])\leftarrow g_{\text{current}}
21:                       GG{S[[τ]]}G\leftarrow G\cup\{S[\![\tau]\!]\}, OO{S[[τ]]}O\leftarrow O\cup\{S[\![\tau]\!]\} \triangleright Insert the successor state.                                               
22:return σ¯\overline{\sigma} \triangleright Return the solution.

If the target state S0S^{0} violates the state constraints, the model does not have a solution, so we return NULL (line 1). Otherwise, the open list OO and GG are initialized with S0S^{0} (line 4). The gg-value of S0S^{0} is initialized to 𝟏\mathbf{1} following Definition 24 (line 3). Initially, the solution cost γ¯=\overline{\gamma}=\infty, and σ¯=NULL\overline{\sigma}=\text{NULL} (line 2). When OO is empty, σ¯\overline{\sigma} is returned (line 22). In such a case, the state transition graph is exhausted, and the current solution σ¯\overline{\sigma} is an optimal solution, or the model does not have a solution if σ¯=NULL\overline{\sigma}=\text{NULL}.

When OO is not empty, a state SOS\in O is selected and removed from OO (lines 6 and 7). We do not specify how to select SS in Algorithm 1 as it depends on the concrete heuristic search algorithm implemented. If SS is a base state, σ(S)\sigma(S) is a solution, so we update the best solution if σ(S)\sigma(S) is better (lines 8–11). If the best solution is updated, we prune a state SS^{\prime} in OO such that g(S)×η(S)g(S^{\prime})\times\eta(S^{\prime}) is not better than the new solution cost since the currently found paths to such states do not lead to a better solution (line 12).

If SS is not a base state, SS is expanded. We define a set of applicable transitions considering forced transitions as

𝒯(S)={{τ}if τ𝒯(S),τ is identified to be a forced transition𝒯(S)otherwise.\mathcal{T}^{*}(S)=\begin{cases}\{\tau\}&\text{if }\exists\tau\in\mathcal{T}(S),\tau\text{ is identified to be a forced transition}\\ \mathcal{T}(S)&\text{otherwise.}\end{cases} (11)

While identifying all forced transitions is typically impractical, identifying some of them is feasible, e.g., based on sufficient conditions defined by a user, as in the talent scheduling example in Section 3.3.3. In the first case, when multiple forced transitions are identified, we assume that one of them is selected, e.g., the one defined first in the model. A successor state S[[τ]]S[\![\tau]\!] is generated for each transition τ𝒯(S)\tau\in\mathcal{T}^{*}(S) (line 14), and successor states violating state constraints are discarded. For each successor state, we check whether a state SS^{\prime} that dominates S[[τ]]S[\![\tau]\!] and has a better or equal gg-value has already been generated (line 16). In such a case, σ(S)\sigma(S^{\prime}) leads to a better or equal solution, so we prune S[[τ]]S[\![\tau]\!]. Since S[[τ]]S[\![\tau]\!] dominates itself, this check also works as duplicate detection. If no dominating state in GG is detected, and gcurrent×η(S[[τ]])g_{\text{current}}\times\eta(S[\![\tau]\!]) is better than the primal bound (line 17), we insert the successor state into GG and OO (line 21). The best path to S[[τ]]S[\![\tau]\!] is updated to σ(S);τ\langle\sigma(S);\tau\rangle, which is an extension of σ(S)\sigma(S) with τ\tau (line 20). Before doing so, we remove an existing state SS^{\prime} from GG and OO if SS^{\prime} is dominated by S[[τ]]S[\![\tau]\!] and has a worse or equal gg-value (line 18).
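The loop structure of Algorithm 1 can be sketched in Python for a toy minimization model. This is an illustrative simplification, not the DIDP implementation: it assumes ++ as the monoid operator, restricts dominance to state equality, and omits forced transitions; all names (`heuristic_search`, `successors`, `eta`) are ours.

```python
import heapq

def heuristic_search(target, successors, is_base, base_cost, eta):
    # Generic shape of Algorithm 1 for minimization, with + as the
    # monoid operator and dominance restricted to state equality.
    g = {target: 0}               # best-known path weight to each state
    path = {target: []}           # best-known transition sequence
    open_list = [(eta(target), target)]
    best_cost, best_path = float("inf"), None
    while open_list:
        f, state = heapq.heappop(open_list)
        if f >= best_cost:
            continue              # pruned by the primal bound (line 12)
        if is_base(state):        # base state reached (lines 8-11)
            cost = g[state] + base_cost(state)
            if cost < best_cost:
                best_cost, best_path = cost, path[state]
            continue
        for label, succ, weight in successors(state):
            g_new = g[state] + weight
            if g_new >= g.get(succ, float("inf")):
                continue          # a better or equal path is known (line 16)
            if g_new + eta(succ) >= best_cost:
                continue          # pruned by the dual bound (line 17)
            g[succ] = g_new
            path[succ] = path[state] + [label]
            heapq.heappush(open_list, (g_new + eta(succ), succ))
    return best_cost, best_path

# Toy DP: reduce n to 0 using "-1", or "/2" when n is even, at unit cost.
def successors(n):
    yield ("-1", n - 1, 1)
    if n % 2 == 0:
        yield ("/2", n // 2, 1)

cost, plan = heuristic_search(
    10, successors, is_base=lambda n: n == 0,
    base_cost=lambda n: 0, eta=lambda n: 0)
```

With the trivial dual bound η=0\eta=0, this toy run returns the optimal cost 5 for n=10n=10 (e.g., 10, 5, 4, 2, 1, 0) together with the corresponding transition sequence.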

Algorithm 1 terminates in finite time if the model is finite and cost-algebraic. Even if the model is not cost-algebraic, the algorithm still terminates in finite time as long as the model is finite and acyclic. First, we show termination for a finite and acyclic model. Intuitively, with such a model, Algorithm 1 enumerates a finite number of paths from the target state, so it eventually terminates.

Theorem 10.

Given a finite, acyclic, and monoidal DyPDL model, Algorithm 1 terminates in finite time.

Proof.

Unless the target state violates the state constraints, the algorithm terminates when OO becomes an empty set. In each iteration of the loop in lines 5–21, at least one state is removed from OO by line 7. However, multiple successor states can be added to OO in each iteration by line 21. We prove that the number of iterations that reach line 21 is finite. With this property, OO becomes an empty set after finitely many iterations.

A successor state S[[τ]]S[\![\tau]\!] is inserted into OO if it is not dominated by a state in GG with a better or equal gg-value, and gcurrent×η(S[[τ]])g_{\text{current}}\times\eta(S[\![\tau]\!]) is less than the current solution cost. Suppose that S[[τ]]S[\![\tau]\!] was inserted into OO and GG in line 21, and now the algorithm generates S[[τ]]S[\![\tau]\!] again in line 14. Suppose that gcurrent=g(S)×wτ(S)g(S[[τ]])g_{\text{current}}=g(S)\times w_{\tau}(S)\geq g(S[\![\tau]\!]). If S[[τ]]GS[\![\tau]\!]\in G, then we do not add S[[τ]]S[\![\tau]\!] to OO due to line 16. If S[[τ]]GS[\![\tau]\!]\not\in G, then S[[τ]]S[\![\tau]\!] was removed from GG, so we should have generated a state SS^{\prime} such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and g(S)g(S[[τ]])g(S^{\prime})\leq g(S[\![\tau]\!]) (lines 18 and 19). It is possible that SS^{\prime} was also removed from GG, but in such a case, we have another state S′′GS^{\prime\prime}\in G such that S[[τ]]aSaS′′S[\![\tau]\!]\preceq_{a}S^{\prime}\preceq_{a}S^{\prime\prime} and g(S′′)g(S)g(S[[τ]])g(S^{\prime\prime})\leq g(S^{\prime})\leq g(S[\![\tau]\!]), so S[[τ]]S[\![\tau]\!] is not inserted into OO again. Thus, if S[[τ]]S[\![\tau]\!] was ever inserted into GG, then S[[τ]]S[\![\tau]\!] is inserted into OO in line 21 only if gcurrent<g(S[[τ]])g_{\text{current}}<g(S[\![\tau]\!]). That is, each new insertion requires a strictly better path from S0S^{0} to S[[τ]]S[\![\tau]\!]. Since the model is finite and acyclic, the number of paths from S0S^{0} to each state is finite. Therefore, each state is inserted into OO finitely many times. Since the model is finite, the number of reachable states is finite. By line 14, we only generate reachable states. Thus, we reach line 21 finitely many times. ∎

When the state transition graph contains cycles, there can be an infinite number of paths even if the graph is finite. However, if the model is cost-algebraic, the cost monotonically changes along a path, so having a cycle does not improve a solution. Thus, the algorithm terminates in finite time by enumerating a finite number of acyclic paths. We start with the following lemma, which confirms that the gg-value is the weight of the path from the target state.

Lemma 2.

After line 4 of Algorithm 1, for each state SOS\in O, SS is the target state S0S^{0}, or SS is reachable from S0S^{0} with σ(S)\sigma(S) such that g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}) at all lines except for lines 20–21.

Proof.

Assume that the following condition holds at the beginning of the current iteration: for each state SOS\in O, SS is the target state S0S^{0} with g(S0)=𝟏g(S^{0})=\mathbf{1}, or SS is reachable from S0S^{0} with σ(S)\sigma(S) and g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}). In the first iteration, O={S0}O=\{S^{0}\}, so the assumption holds. When the assumption holds, the condition continues to hold until reaching lines 20–21, where the gg-value is updated, and a new state is added to OO. If we reach these lines, a non-base state SS was removed from OO in line 7. Each successor state S[[τ]]S[\![\tau]\!] is reachable from SS with τ\langle\tau\rangle. By the assumption, S=S0S=S^{0}, or SS is reachable from S0S^{0} with σ(S)\sigma(S). Therefore, S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S);τ\langle\sigma(S);\tau\rangle. If S[[τ]]S[\![\tau]\!] is inserted into OO, then σ(S[[τ]])=σ(S);τ\sigma(S[\![\tau]\!])=\langle\sigma(S);\tau\rangle. If S=S0S=S^{0},

g(S[[τ]])=g(S0)×wτ(S0)=𝟏×wτ(S0)=wτ(S0)=wσ(S[[τ]])(S0).g(S[\![\tau]\!])=g(S^{0})\times w_{\tau}(S^{0})=\mathbf{1}\times w_{\tau}(S^{0})=w_{\tau}(S^{0})=w_{\sigma(S[\![\tau]\!])}(S^{0}).

If SS is not the target state, since g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}), by Definition 24,

g(S[[τ]])=g(S)×wτ(S)=wσ(S)(S0)×wτ(S)=wσ(S[[τ]])(S0).g(S[\![\tau]\!])=g(S)\times w_{\tau}(S)=w_{\sigma(S)}(S^{0})\times w_{\tau}(S)=w_{\sigma(S[\![\tau]\!])}(S^{0}).

Thus, S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S[[τ]])\sigma(S[\![\tau]\!]) and g(S[[τ]])=wσ(S[[τ]])(S0)g(S[\![\tau]\!])=w_{\sigma(S[\![\tau]\!])}(S^{0}), so the condition holds after line 21. By mathematical induction, the lemma is proved. ∎

Theorem 11.

Given a finite and cost-algebraic DyPDL model, Algorithm 1 terminates in finite time.

Proof.

The proof is almost the same as the proof of Theorem 10. However, now, there may be an infinite number of paths to a state since the state transition graph may contain cycles. We show that the algorithm never considers a path containing cycles when the model is cost-algebraic. Assume that for each state SS, the best-found path σ(S)\sigma(S) is acyclic up to the current iteration. This condition holds at the beginning since σ(S0)=\sigma(S^{0})=\langle\rangle is acyclic. Suppose that the algorithm generates a successor state S[[τ]]S[\![\tau]\!] that is already included in the path σ(S)\sigma(S). Then, S[[τ]]S[\![\tau]\!] was generated before. In addition, S[[τ]]S[\![\tau]\!] is not a base state since it has a successor state on σ(S)\sigma(S). Since σ(S)\sigma(S) is acyclic, S[[τ]]S[\![\tau]\!] is included only once. Let σ(S)=σ1;σ2\sigma(S)=\langle\sigma^{1};\sigma^{2}\rangle where σ1\sigma^{1} is the path from S0S^{0} to S[[τ]]S[\![\tau]\!]. By Lemma 2, we have

gcurrent=g(S)×wτ(S)=wσ1(S0)×wσ2(S[[τ]])×wτ(S).g_{\text{current}}=g(S)\times w_{\tau}(S)=w_{\sigma^{1}}(S^{0})\times w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S).

If g(S[[τ]])g(S[\![\tau]\!]) and σ(S[[τ]])\sigma(S[\![\tau]\!]) were updated after S[[τ]]S[\![\tau]\!] was generated with σ(S[[τ]])=σ1\sigma(S[\![\tau]\!])=\sigma^{1}, then a path from S0S^{0} to S[[τ]]S[\![\tau]\!] with a smaller weight was found by line 16. Thus, g(S[[τ]])wσ1(S0)=wσ1(S0)×𝟏g(S[\![\tau]\!])\leq w_{\sigma^{1}}(S^{0})=w_{\sigma^{1}}(S^{0})\times\mathbf{1}. By Definition 21, 𝟏wσ2(S[[τ]])×wτ(S)\mathbf{1}\leq w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S). Since AA is isotone,

gcurrent=wσ1(S0)×wσ2(S[[τ]])×wτ(S)wσ1(S0)×𝟏g(S[[τ]]).g_{\text{current}}=w_{\sigma^{1}}(S^{0})\times w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S)\geq w_{\sigma^{1}}(S^{0})\times\mathbf{1}\geq g(S[\![\tau]\!]).

Therefore, S[[τ]]S[\![\tau]\!] is not inserted into OO, and σ(S[[τ]])\sigma(S[\![\tau]\!]) remains acyclic. Thus, by mathematical induction, for each state, the number of insertions into OO is at most the number of acyclic paths to that state, which is finite. ∎

We confirm that σ¯\overline{\sigma} is a solution for the model whenever it is not NULL, even during execution. In other words, Algorithm 1 is an anytime algorithm that can return a solution before proving optimality.

Theorem 12.

After line 11 of Algorithm 1, if σ¯NULL\overline{\sigma}\neq\text{NULL}, then σ¯\overline{\sigma} is a solution for the DyPDL model with γ¯=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\mathsf{solution\_cost}(\overline{\sigma}).

Proof.

The solution σ¯\overline{\sigma} is updated in line 11 when a base state SS is removed from OO in line 7. If S=S0S=S^{0}, then σ¯=\overline{\sigma}=\langle\rangle, which is a solution. Since g(S0)=𝟏g(S^{0})=\mathbf{1}, γ¯=minB:S0CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S0)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\min_{B\in\mathcal{B}:S^{0}\models C_{B}}\mathsf{base\_cost}_{B}(S^{0})=\mathsf{solution\_cost}(\overline{\sigma}) by Definition 6. If SS is not the target state, σ¯=σ(S)\overline{\sigma}=\sigma(S), which is a solution since SS is reachable from S0S^{0} with σ(S)\sigma(S) by Lemma 2, and SS is a base state. Since g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}), it holds that γ¯=wσ(S)(S0)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=w_{\sigma(S)}(S^{0})\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathsf{solution\_cost}(\overline{\sigma}) by Lemma 1. ∎

Finally, we prove the optimality of Algorithm 1. Our proof is based on the following lemma, whose proof is presented in A.

Lemma 3.

In Algorithm 1, suppose that a solution exists for the DyPDL model, and let γ^\hat{\gamma} be its cost. When reaching line 5, at least one of the following two conditions is satisfied:

  1. γ¯γ^\overline{\gamma}\leq\hat{\gamma}.

  2. The open list OO contains a state S^\hat{S} such that an S^\hat{S}-solution σ^\hat{\sigma} exists, and σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is a solution for the model with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.

Theorem 13.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model, and Algorithm 1 returns a solution that is not NULL, then the solution is optimal. If Algorithm 1 returns NULL, then the model is infeasible.

Proof.

Suppose that a solution exists, and let γ^\hat{\gamma} be its cost. By Lemma 3, when we reach line 5 with O=O=\emptyset, γ¯γ^\overline{\gamma}\leq\hat{\gamma}. Since γ¯\overline{\gamma}\neq\infty, it holds that σ¯NULL\overline{\sigma}\neq\text{NULL} by line 11. Therefore, if a solution exists, NULL is never returned, i.e., NULL is returned only if the model is infeasible. Suppose that an optimal solution exists, and let γ\gamma^{*} be its cost. Now, consider the above discussion with γ^=γ\hat{\gamma}=\gamma^{*}. When we reach line 5 with O=O=\emptyset, γ¯γ\overline{\gamma}\leq\gamma^{*}. By Theorem 12, σ¯\overline{\sigma} is a solution with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)=γ¯\mathsf{solution\_cost}(\overline{\sigma})=\overline{\gamma}. Since 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)γ\mathsf{solution\_cost}(\overline{\sigma})\geq\gamma^{*}, γ¯=γ\overline{\gamma}=\gamma^{*} and σ¯\overline{\sigma} is an optimal solution. Therefore, if an optimal solution exists, and the algorithm returns a solution, the solution is optimal. ∎

Corollary 2.

Given a finite and cost-algebraic DyPDL model, the model either has an optimal solution or is infeasible. Moreover, the problem of deciding whether there exists a solution whose cost is less (greater) than a given rational number for minimization (maximization) is decidable.

Note that Theorem 13 does not require a model to be finite, acyclic, or cost-algebraic. While the algorithm terminates in finite time if the model is finite and acyclic or cost-algebraic, there is no guarantee in general due to the undecidability in Theorem 1. However, even for such a model, if the algorithm terminates, the optimality or infeasibility is proved.

As shown in the proof of Lemma 3, when an optimal solution exists, a state S^\hat{S} such that there exists an optimal solution extending σ(S^)\sigma(\hat{S}) is included in the open list. For minimization (maximization), by taking the minimum (maximum) g(S)×η(S)g(S)\times\eta(S) value in the open list, we can obtain a dual bound, i.e., a lower (upper) bound on the optimal solution cost.
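This dual bound can be computed directly from the open list. A minimal sketch for minimization, assuming ++ as the operator; the function name and tuple layout are ours:

```python
def dual_bound(open_list):
    """Lower bound on the optimal cost: the minimum of g(S) + eta(S)
    over the (g, eta) pairs of states in the open list."""
    return min(g + eta for g, eta in open_list)

# Three open states with (g, eta) = (3, 4), (5, 1), (2, 6):
assert dual_bound([(3, 4), (5, 1), (2, 6)]) == 6  # attained by (5, 1)
```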

Theorem 14.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model and has the cost γ\gamma^{*}, and OO is not empty in line 5, for minimization,

minSOg(S)×η(S)γ.\min_{S\in O}g(S)\times\eta(S)\leq\gamma^{*}.
Proof.

By Lemma 3, if OO\neq\emptyset, then γ¯=γ\overline{\gamma}=\gamma^{*}, or there exists a state S^O\hat{S}\in O on an optimal solution, i.e., there exists an S^\hat{S}-solution σ^\hat{\sigma} such that σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is an optimal solution. If γ¯=γ\overline{\gamma}=\gamma^{*}, by lines 12 and 17, minSOg(S)×η(S)<γ\min_{S\in O}g(S)\times\eta(S)<\gamma^{*}. Otherwise, S^O\hat{S}\in O. Since η(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\eta(\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}) and AA is isotone,

minSOg(S)×η(S)g(S^)×η(S^)g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=γ.\min_{S\in O}g(S)\times\eta(S)\leq g(\hat{S})\times\eta(\hat{S})\leq g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})=\gamma^{*}.

5.4 Heuristic Search Algorithms for DyPDL

We introduce existing heuristic search algorithms as instantiations of Algorithm 1 so that we can use them for DyPDL. In particular, each algorithm differs in how to select a state SS to remove from the open list OO in line 6. In addition to A*, which is the most fundamental heuristic search algorithm, we select anytime algorithms that have been applied to combinatorial optimization problems in problem-specific settings. For detailed descriptions of the algorithms, please refer to the papers that proposed them. Similar to A*, in our configuration, these algorithms use a heuristic function hh and guide the search with the ff-value, which is computed as f(S)=g(S)×h(S)f(S)=g(S)\times h(S), where ×\times is a binary operator such as ++. As we discussed in Section 5.3, hh is not necessarily a dual bound function and not necessarily admissible.

5.4.1 CAASDy: Cost-Algebraic A* Solver for DyPDL

A* selects a state with the best ff-value in line 6 (i.e., the minimum ff-value for minimization and the maximum ff-value for maximization). If there are multiple states with the best ff-value, one is selected according to a tie-breaking strategy. Among states with the same ff-value, we select a state with the best hh-value (with “best” defined analogously to the best ff-value). In what follows, when other algorithms select a state according to the ff-values, we likewise assume that a state with the best ff-value is selected, and ties are broken by the hh-values. If there are multiple states with the best ff- and hh-values, we use another tie-breaking strategy, which is not specified here and is discussed later when we describe the implementation. We call our solver cost-algebraic A* solver for DyPDL (CAASDy) as we originally proposed it only for cost-algebraic models [74]. However, as shown in Theorems 12 and 13, CAASDy is applicable to monoidal and acyclic models with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle if AA is isotone.
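This selection rule can be sketched with a binary heap ordered by the pair (f,h)(f,h); a minimal illustration for minimization assuming ++ as the operator, with variable names of our own choosing:

```python
import heapq

# Order states by f = g + h, breaking ties by h (minimization).
open_list = []
for g, h, state in [(3, 4, "a"), (5, 2, "b"), (6, 1, "c")]:
    heapq.heappush(open_list, (g + h, h, state))

# All three states have f = 7, so ties are broken by the h-value.
order = [heapq.heappop(open_list)[2] for _ in range(3)]
```

Here `order` is `["c", "b", "a"]`: among states with equal ff-values, the one with the smallest hh-value is expanded first.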

In the original A*, if hh is admissible, the first solution found is optimal. Previous work has generalized A* to cost-algebraic heuristic search with this property [92]. In our case, if a model is not cost-algebraic, the first solution found may not be optimal. In addition, even if a model is cost-algebraic, our problem setting is slightly different from that of Edelkamp et al. [92]: a base case has a cost, so the cost of a solution can differ from the weight of the corresponding path. If a model is cost-algebraic and the costs of the base cases do not matter, we can prove that the first solution found by CAASDy is optimal.

Theorem 15.

Given a cost-algebraic DyPDL model with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle, let hh be an admissible heuristic function, i.e., given any reachable state SS and any SS-solution σ\sigma, h(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)h(S)\leq\mathsf{solution\_cost}(\sigma,S) for minimization. If an optimal solution exists for the model, the costs of base cases are 𝟏\mathbf{1}, i.e., B,𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\forall B\in\mathcal{B},\mathsf{base\_cost}_{B}(S)=\mathbf{1}, and h(S)𝟏h(S)\geq\mathbf{1} for any reachable state SS, then the first solution found by CAASDy is optimal.

Proof.

Let σ¯=σ(S)\overline{\sigma}=\sigma(S) be the first found solution with the cost γ¯\overline{\gamma} in line 11. Since minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathbf{1}, γ¯=g(S)\overline{\gamma}=g(S). Since 𝟏h(S)minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\mathbf{1}\leq h(S)\leq\min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathbf{1}, f(S)=g(S)×h(S)=g(S)=γ¯f(S)=g(S)\times h(S)=g(S)=\overline{\gamma}. If σ(S)\sigma(S) is not an optimal solution, by Lemma 3, OO contains a state S^O\hat{S}\in O such that there exists an S^\hat{S}-solution σ^\hat{\sigma} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)=γ<γ¯\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)=\gamma^{*}<\overline{\gamma}. Since AA is isotone, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)=g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)g(S^)×h(S^)\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)=g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\geq g(\hat{S})\times h(\hat{S}). Therefore,

f(S^)=g(S^)×h(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)<γ¯=f(S).f(\hat{S})=g(\hat{S})\times h(\hat{S})\leq\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)<\overline{\gamma}=f(S).

Thus, S^\hat{S} should have been expanded before SS, which is a contradiction. ∎

5.4.2 Depth-First Branch-and-Bound (DFBnB)

Theorem 15 indicates a potential disadvantage of A*: it does not find any feasible solution before proving optimality, which may take a long time. In practice, having a suboptimal solution is better than having no solution. In some combinatorial optimization problems, we can find a solution by applying a fixed number of transitions; for example, in talent scheduling, any sequence of scenes is a solution. With this observation, prioritizing depth in the state transition graph can lead to finding a solution quickly.

Depth-first branch-and-bound (DFBnB) expands states in the depth-first order, i.e., a state SS that maximizes |σ(S)||\sigma(S)| is selected in line 6. Concretely, the open list OO is implemented as a stack, and the state on the top of the stack (the state added to OO most recently) is selected in line 6. Successor states of the same state have the same priority, so ties are broken by the ff-values. DFBnB has been applied to state-based formulations of combinatorial optimization problems such as the sequential ordering problem (SOP) [88], the traveling salesperson problem (TSP), and single machine scheduling [99, 100].
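The stack discipline with ff-value tie-breaking can be sketched as follows; this is an assumed illustration (our names), where successors of one state are pushed in worst-ff-first order so that the best-ff successor is popped first:

```python
# DFBnB open list as a stack of (f, state) pairs.
def push_successors(stack, successors):
    """Push successors so that the best f-value ends up on top."""
    for f, state in sorted(successors, reverse=True):
        stack.append((f, state))

stack = []
push_successors(stack, [(5, "x"), (3, "y"), (4, "z")])
first = stack.pop()  # the successor with the smallest f-value
```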

5.4.3 Cyclic-Best First Search (CBFS)

In DFBnB, in terms of search guidance, ff-values are used to break ties between successor states of the same state. Cyclic best-first search (CBFS) [101] is similar to DFBnB, but it relies more on ff-values to guide the search and can be viewed as a hybridization of A* and DFBnB. CBFS partitions the open list OO into layers OiO_{i} for each depth ii. A state SS is inserted into OiO_{i} if σ(S)\sigma(S) has ii transitions. At the beginning, O0={S0}O_{0}=\{S^{0}\} and Oi=O_{i}=\emptyset for i>0i>0. Starting with i=0i=0, if OiO_{i}\neq\emptyset, CBFS selects a state having the best priority from OiO_{i} in line 6 and inserts successor states into Oi+1O_{i+1} in line 21. We use the ff-value as the priority. After that, CBFS increases ii by 11. However, when ii is the maximum depth, CBFS resets ii to 0 instead of incrementing it. The maximum depth is usually known in a problem-specific setting, but we do not use a fixed parameter in our setting. Instead, we set ii to 0 when a new best solution is found after line 11, or when Oj=O_{j}=\emptyset for all jij\geq i. In problem-specific settings, CBFS was used in single machine scheduling [101] and the simple assembly line balancing problem (SALBP-1) [102, 103].
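One CBFS cycle over the depth layers can be sketched as follows; a minimal illustration with (f,state)(f,\text{state}) pairs and assumed names, not the solver's API:

```python
import heapq

def cbfs_cycle(layers):
    """One CBFS pass: pop the best-f state from each nonempty depth
    layer in increasing depth order (successors of a popped state
    would be pushed into the next layer)."""
    expanded = []
    for layer in layers:
        if layer:
            expanded.append(heapq.heappop(layer))
    return expanded

layers = [[(2, "root")], [(4, "a"), (1, "b")], []]
for layer in layers:
    heapq.heapify(layer)  # each layer is a min-heap ordered by f
picked = cbfs_cycle(layers)
```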

5.4.4 Anytime Column Search (ACS)

Anytime column search (ACS) [99] can be considered a generalized version of CBFS, expanding bb states at each depth. ACS also partitions the open list OO into OiO_{i} for each depth ii and selects a state from OiO_{i} in line 6, starting with i=0i=0 and O0={S0}O_{0}=\{S^{0}\}. ACS increases ii by 11 after removing bb states from OiO_{i} or when OiO_{i} becomes empty, where bb is a parameter. We remove the best bb states according to the ff-values.

Anytime column progressive search (ACPS) [99] is a non-parametric version of ACS, which starts from b=1b=1 and increases bb by 11 when it reaches the maximum depth. Similarly to CBFS, we set ii to 0 and increase bb by 11 when a new best solution is found or ji,Oj=\forall j\geq i,O_{j}=\emptyset. For combinatorial optimization, ACS and ACPS were evaluated on TSP [99].
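The per-depth selection of ACS can be sketched as follows (ACPS would simply grow bb by one per sweep); an assumed illustration with our own names:

```python
def acs_take(layer, b):
    """Select the b best states (by f-value) from one depth layer,
    returning them together with the states left in the layer."""
    ranked = sorted(layer)
    return ranked[:b], ranked[b:]

taken, rest = acs_take([(5, "p"), (2, "q"), (4, "r")], b=2)
```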

5.4.5 Anytime Pack Search (APS)

Anytime pack search (APS) [100] hybridizes A* and DFBnB in a different way from CBFS and ACS. It maintains the set of the best states ObOO_{b}\subseteq O, initialized with {S0}\{S^{0}\}, the set of the best successor states OcOO_{c}\subseteq O, and a suspend list OsOO_{s}\subseteq O. APS expands all states from ObO_{b} in line 6 and inserts the best bb successor states according to a priority into OcO_{c} and other successor states into OsO_{s}. When there are fewer than bb successor states, all of them are inserted into OcO_{c}. After expanding all states in ObO_{b}, APS swaps ObO_{b} and OcO_{c} and continues the procedure. If ObO_{b} and OcO_{c} are empty, the best bb states are moved from OsO_{s} to ObO_{b}. We use the ff-value as the priority to select states.

Anytime pack progressive search (APPS) [100] starts from b=1b=1 and increases bb by δ\delta if b<b¯b<\overline{b} when the best bb states are moved from OsO_{s} to ObO_{b}, where δ\delta and b¯\overline{b} are parameters. We use δ=1\delta=1 and b¯=\overline{b}=\infty following the configuration in TSP and single machine scheduling [100].
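The APS bookkeeping can be sketched as follows; a minimal illustration under our own naming, showing how the successors of the current best states are split between the next best set OcO_{c} and the suspend list OsO_{s}:

```python
def aps_partition(successors, b):
    """Split successors of the current best states: the b best
    (by f-value) form the next best set O_c; the rest are suspended
    in O_s until both O_b and O_c are exhausted."""
    ranked = sorted(successors)
    return ranked[:b], ranked[b:]

best, suspended = aps_partition([(9, "u"), (3, "v"), (6, "w")], b=1)
```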

5.4.6 Discrepancy-Based Search

We consider a search strategy originating from the CP community, discrepancy-based search [104], which assumes that successor states of the same state are assigned priorities based on some estimation. In our case, we use the ff-value as the priority. The idea behind discrepancy-based search is that the estimation may make mistakes, but the number of mistakes made by a strong guidance heuristic is limited. For each path, the discrepancy, i.e., the number of deviations from the estimated best path, is maintained. The target state has a discrepancy of 0. When a state SS has a discrepancy of dd, the successor state with the best priority also has a discrepancy of dd, while the other successor states have a discrepancy of d+1d+1. Discrepancy-based search algorithms explore states whose discrepancy is smaller than an upper bound and iteratively increase the upper bound.
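The discrepancy bookkeeping can be sketched as follows; an assumed illustration (our names), taking a parent's discrepancy and its successors ranked best-first by priority:

```python
def assign_discrepancies(d, ranked_successors):
    """The best-ranked successor keeps the parent's discrepancy d;
    every other successor gets d + 1."""
    return [(s, d if i == 0 else d + 1)
            for i, s in enumerate(ranked_successors)]

out = assign_discrepancies(2, ["best", "second", "third"])
```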

Discrepancy-bounded depth-first search (DBDFS) [105] performs depth-first search that only expands states having the discrepancy between (i1)k(i-1)k and ik1ik-1 inclusive, where ii starts from 11 and increases by 11 when all states within the range are expanded, and kk is a parameter. The open list is partitioned into two sets O0O_{0} and O1O_{1}, and O0={S0}O_{0}=\{S^{0}\} and O1=O_{1}=\emptyset at the beginning. A state is selected from O0O_{0} in line 6. Successor states with the discrepancy between (i1)k(i-1)k and ik1ik-1 are added to O0O_{0}, and other states are added to O1O_{1}. When O0O_{0} becomes empty, it is swapped with O1O_{1}, and ii is increased by 11. The discrepancy of states in O1O_{1} is ikik because the discrepancy is increased by at most 11 at a successor state. Therefore, after swapping O0O_{0} with O1O_{1}, the discrepancy of states in O0O_{0} falls between the new bounds, ikik and (i+1)k1(i+1)k-1. For depth-first search, when selecting a state to remove from O0O_{0}, we break ties by the ff-values. We use k=1k=1 in our configuration. Discrepancy-based search was originally proposed as tree search [104, 105] and later applied to state space search for SOP [88].

5.4.7 Complete Anytime Beam Search (CABS)

While the above algorithms except for A* are similar to depth-first search, we consider beam search, a heuristic search algorithm that searches the state transition graph layer by layer, similar to breadth-first search. Previous work in heuristic search has shown that breadth-first search can be beneficial in saving memory by discarding states in previous layers [106]. However, breadth-first search may take a long time to find a solution, particularly in our setting. In DP models for combinatorial optimization problems such as TSPTW and talent scheduling, all solutions have the same length, and an algorithm needs to reach the last layer to find a solution. For such settings, breadth-first search may need to expand many states before reaching the last layer. Beam search mitigates this issue by expanding at most bb states at each layer, where bb is a parameter called a beam width, while losing completeness.

Since beam search cannot be considered an instantiation of Algorithm 1, we provide dedicated pseudo-code in Algorithm 2. Beam search maintains states in the same layer, i.e., states that are reached with the same number of transitions, in the open list OO, which is initialized with the target state. Beam search expands all states in OO, inserts the best bb successor states into OO, and discards the remaining successor states. Beam search may discard all successor states leading to solutions, so it is incomplete, i.e., it may not find a solution.

Algorithm 2 Beam search for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a}, a dual bound function η\eta, a primal bound γ¯\overline{\gamma}, and a beam width bb are given as input.
1:if S0⊧̸𝒞S^{0}\not\models\mathcal{C} then return NULL, \top
2:σ¯NULL\overline{\sigma}\leftarrow\text{NULL}, complete\text{complete}\leftarrow\top \triangleright Initialize the solution.
3:l0l\leftarrow 0, σl(S0)\sigma^{l}(S^{0})\leftarrow\langle\rangle, gl(S0)𝟏g^{l}(S^{0})\leftarrow\mathbf{1} \triangleright Initialize the gg-value.
4:O{S0}O\leftarrow\{S^{0}\} \triangleright Initialize the open list.
5:while OO\neq\emptyset and σ¯=NULL\overline{\sigma}=\text{NULL} do
6:     GG\leftarrow\emptyset \triangleright Initialize the set of states.
7:     for all SOS\in O do
8:         if B,SCB\exists B\in\mathcal{B},S\models C_{B} then
9:              current_costgl(S)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)\text{current\_cost}\leftarrow g^{l}(S)\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S) \triangleright Compute the solution cost.
10:              if current_cost<γ¯\text{current\_cost}<\overline{\gamma} then
11:                  γ¯current_cost\overline{\gamma}\leftarrow\text{current\_cost}, σ¯σl(S)\overline{\sigma}\leftarrow\sigma^{l}(S) \triangleright Update the best solution.               
12:         else
13:              for all τ𝒯(S):S[[τ]]𝒞\tau\in\mathcal{T}^{*}(S):S[\![\tau]\!]\models\mathcal{C} do
14:                  gcurrentgl(S)×wτ(S)g_{\text{current}}\leftarrow g^{l}(S)\times w_{\tau}(S) \triangleright Compute the gg-value.
15:                  if SG\not\exists S^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrentgl+1(S)g_{\text{current}}\geq g^{l+1}(S^{\prime}) then
16:                       if gcurrent×η(S[[τ]])<γ¯g_{\text{current}}\times\eta(S[\![\tau]\!])<\overline{\gamma} then
17:                           if SG\exists S^{\prime}\in G such that SaS[[τ]]S^{\prime}\preceq_{a}S[\![\tau]\!] and gcurrentgl+1(S)g_{\text{current}}\leq g^{l+1}(S^{\prime}) then
18:                                GG{S}G\leftarrow G\setminus\{S^{\prime}\} \triangleright Remove a dominated state.                            
19:                           σl+1(S[[τ]])σl(S);τ\sigma^{l+1}(S[\![\tau]\!])\leftarrow\langle\sigma^{l}(S);\tau\rangle, gl+1(S[[τ]])gcurrentg^{l+1}(S[\![\tau]\!])\leftarrow g_{\text{current}} \triangleright Update the gg-value.
20:                           GG{S[[τ]]}G\leftarrow G\cup\{S[\![\tau]\!]\} \triangleright Insert the successor state.                                                                      
21:     ll+1l\leftarrow l+1 \triangleright Proceed to the next layer.
22:     O{SGgl(S)×η(S)<γ¯}O\leftarrow\{S\in G\mid g^{l}(S)\times\eta(S)<\overline{\gamma}\} \triangleright Prune states by the bound.
23:     if |O|>b|O|>b then
24:         OO\leftarrow the best bb states in OO, complete\text{complete}\leftarrow\bot \triangleright Keep the best bb states.      
25:if complete and OO\neq\emptyset then
26:     complete\text{complete}\leftarrow\bot \triangleright A better solution may exist.
27:return σ¯\overline{\sigma}, complete \triangleright Return the solution.
Algorithm 3 CABS for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a} and a dual bound function η\eta are given as input.
1:γ¯\overline{\gamma}\leftarrow\infty, σ¯NULL\overline{\sigma}\leftarrow\text{NULL}, b1b\leftarrow 1 \triangleright Initialization.
2:loop
3:     σ,complete\sigma,\text{complete}\leftarrow BeamSearch(𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle, a\preceq_{a}, η\eta, γ¯\overline{\gamma}, bb) \triangleright Execute Algorithm 2.
4:     if 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ)<γ¯\mathsf{solution\_cost}(\sigma)<\overline{\gamma} then
5:         γ¯𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ)\overline{\gamma}\leftarrow\mathsf{solution\_cost}(\sigma), σ¯σ\overline{\sigma}\leftarrow\sigma \triangleright Update the solution.      
6:     if complete then
7:         return σ¯\overline{\sigma} \triangleright Return the solution.      
8:     b2bb\leftarrow 2b \triangleright Double the beam width.

Complete anytime beam search (CABS) [107] is a complete version of beam search. In our configuration, CABS iteratively executes beam search with a beam width bb starting from b=1b=1 and doubles bb after each iteration until finding an optimal solution or proving infeasibility. With this strategy, CABS tries to quickly find a solution with small bb and eventually converges to breadth-first search when bb is large enough. Note that this configuration follows Libralesso et al. [88, 108], who applied CABS to SOP and permutation flowshop. Originally, Zhang [107] considered a generalized version of beam search, i.e., successor states inserted into OO are decided by a user-provided pruning rule, which can be different from selecting the best bb states. In this generalized version, CABS repeats beam search while relaxing the pruning rule and terminates when it finds a satisfactory solution according to some criterion.

In Algorithm 2, beam search maintains the set of states in the current layer using the open list OO, and the set of states in the next layer using GG. The open list OO is updated to GG after all successor states are generated, pruning states based on the bound (line 22). If OO contains more than bb states, only the best bb states are kept. This operation may prune optimal solutions, so the flag complete, which indicates the completeness of beam search, becomes \bot. Beam search terminates when OO becomes empty or a solution is found (line 5). Even if complete=\text{complete}=\top and a solution is found, a better solution may be missed when OO is not empty (line 25). Therefore, we update complete to \bot in such a case (line 26). The maximization version of Algorithm 2 can be derived in a similar way to Algorithm 1.
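The interaction between beam search and CABS can be sketched in Python. The following minimal illustration (not our implementation; `beam_search` and `cabs` are hypothetical names) omits dominance detection, the dual bound function, and the primal bound of Algorithms 2 and 3, and searches a layered DAG given as a list of edge-weight dictionaries:

```python
def beam_search(layers, b):
    """One beam-search pass over a layered DAG starting at node 0.

    layers: list of dicts mapping (u, v) -> edge weight between
    consecutive layers. Returns (best_cost, complete), where complete
    is False if any layer was truncated to the beam width b."""
    g = {0: 0.0}              # g-values of the current layer
    complete = True
    for edges in layers:
        nxt = {}              # g-values of the next layer
        for u, gu in g.items():
            for (s, v), w in edges.items():
                if s == u and gu + w < nxt.get(v, float("inf")):
                    nxt[v] = gu + w
        if len(nxt) > b:      # keep only the best b states
            nxt = dict(sorted(nxt.items(), key=lambda kv: kv[1])[:b])
            complete = False  # optimal solutions may have been pruned
        g = nxt
    return (min(g.values()) if g else float("inf")), complete


def cabs(layers):
    """Complete anytime beam search: double b until a pass is complete."""
    b, best = 1, float("inf")
    while True:
        cost, complete = beam_search(layers, b)
        best = min(best, cost)
        if complete:
            return best
        b *= 2
```

With b = 1 a pass is greedy and may be truncated (complete = False); CABS doubles b until a pass finishes without truncation, at which point the pass is an exhaustive layer-by-layer sweep and the returned cost is optimal.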

Beam search in Algorithm 2 has several properties that are different from Algorithm 1.

1. The set GG, which is used to detect dominance, contains only states in the next layer.

2. A state SS may be dropped from the open list OO even if g(S)×η(S)<γ¯g(S)\times\eta(S)<\overline{\gamma} (line 24).

3. Beam search may terminate when a solution is found even if OO\neq\emptyset.

With Property (1), beam search can potentially save memory compared to Algorithm 1. This method can be considered layered duplicate detection as proposed in previous work [106]. With this strategy, we do not detect duplicates when the same states appear in different layers. When a generated successor state S[[τ]]S[\![\tau]\!] in the next layer is the same as a state SS^{\prime} in the current layer, in line 19, we do not want to update g(S)g(S^{\prime}) and σ(S)\sigma(S^{\prime}) since we do not check if a better path to SS^{\prime} is found. Thus, we maintain ll, which is incremented by 1 after each layer (line 21), and use glg^{l} and σl\sigma^{l} to differentiate gg and σ\sigma for different layers.

Our layered duplicate detection mechanism prevents us from using beam search when the state transition graph contains cycles; beam search cannot store states found in the previous layers, so it continues to expand states in a cycle. This issue can be addressed by initializing GG with {S0}\{S^{0}\} outside the while loop, e.g., just after line 4, and removing line 6. With this modification, beam search can be used for a cyclic but cost-algebraic DyPDL model.

By Properties (2) and (3), beam search is not guaranteed to prove optimality or infeasibility unless complete=\text{complete}=\top. However, CABS (Algorithm 3) guarantees optimality as it repeats beam search until complete becomes \top. In what follows, we formalize the above points. Once again, we present the theoretical results for minimization, but they can be easily adapted to maximization.

Theorem 16.

Given a finite, acyclic, and monoidal DyPDL model, beam search terminates in finite time.

Proof.

Suppose that we have generated a successor state S[[τ]]S[\![\tau]\!], which was generated before. The difference from Algorithm 1 that we need to consider is Property (1). If S[[τ]]S[\![\tau]\!] was generated before as a successor state of a state in the current layer, by the proof of Theorem 10, there exists a state SGS^{\prime}\in G with S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime}. The successor state S[[τ]]S[\![\tau]\!] is inserted into GG again only if we find a better path to S[[τ]]S[\![\tau]\!]. If S[[τ]]S[\![\tau]\!] was generated before as a successor state of a state in a previous layer, the path to S[[τ]]S[\![\tau]\!] at that time was shorter (in terms of the number of transitions) than the current path. Thus, the current path is different from the previous path. The successor state S[[τ]]S[\![\tau]\!] may be inserted into GG since it is not included in GG. In either case, S[[τ]]S[\![\tau]\!] is inserted into GG again only if we find a new path to it. Since the number of paths to S[[τ]]S[\![\tau]\!] is finite, we insert S[[τ]]S[\![\tau]\!] into GG only finitely many times. The rest of the proof follows that of Theorem 10. ∎

As we discussed above, with a slight modification, we can remove Property (1) and prove the termination of beam search for a cost-algebraic DyPDL model.

Since Properties (1)–(3) do not affect the proofs of Lemma 2 and Theorem 12, the following theorem holds.

Theorem 17.

After line 11 of Algorithm 2, if σ¯NULL\overline{\sigma}\neq\text{NULL}, then σ¯\overline{\sigma} is a solution for the model with γ¯=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\mathsf{solution\_cost}(\overline{\sigma}).

We also prove the optimality of beam search when complete=\text{complete}=\top.

Theorem 18.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle and γ¯A\overline{\gamma}\in A, if an optimal solution exists for the model, and beam search returns σ¯NULL\overline{\sigma}\neq\text{NULL} and complete=\text{complete}=\top, then σ¯\overline{\sigma} is an optimal solution. If beam search returns σ¯=NULL\overline{\sigma}=\text{NULL} and complete=\text{complete}=\top, then there does not exist a solution whose cost is less than γ¯\overline{\gamma}.

Proof.

When complete=\text{complete}=\top is returned, during the execution, beam search never reached lines 24 and 26. Therefore, we can ignore Properties (2) and (3). If we modify Algorithm 2 so that GG contains states in all layers as discussed above, we can also ignore Property (1). By ignoring Properties (1)–(3), we can consider beam search as an instantiation of Algorithm 1. If the model is infeasible, or an optimal solution exists with the cost γ\gamma^{*} and γ¯>γ\overline{\gamma}>\gamma^{*} at the beginning, the proof is exactly the same as that of Theorem 13. If γ¯γ\overline{\gamma}\leq\gamma^{*} was given as input, beam search has never updated σ¯\overline{\sigma} and γ¯\overline{\gamma}, and NULL is returned if it terminates. In such a case, indeed, no solution has a cost less than γ¯γ\overline{\gamma}\leq\gamma^{*}.

The above proof is for beam search with the modification. We confirm that it is also valid with beam search in Algorithm 2 without modification, i.e., we consider Property (1). The proof of Theorem 13 depends on Lemma 3, which claims that when a solution with a cost γ^\hat{\gamma} exists and γ¯>γ^\overline{\gamma}>\hat{\gamma}, the open list contains a state S^\hat{S} such that there exists an S^\hat{S}-solution σ^=σ^1,,σ^m\hat{\sigma}=\langle\hat{\sigma}_{1},...,\hat{\sigma}_{m}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}. At the beginning, S^=S0\hat{S}=S^{0} exists in OO. When such a state S^\hat{S} exists in the current layer, it is expanded. First, we show that a successor state of S^\hat{S} satisfying the condition is generated. If no applicable forced transitions are identified in S^\hat{S}, a successor state S^[[σ^1]]\hat{S}[\![\hat{\sigma}_{1}]\!] with σ(S^[[σ^1]])=σ(S^);σ^1\sigma(\hat{S}[\![\hat{\sigma}_{1}]\!])=\langle\sigma(\hat{S});\hat{\sigma}_{1}\rangle is generated, and

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^[[σ^1]]);σ^2,,σ^m)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\mathsf{solution\_cost}(\langle\sigma(\hat{S}[\![\hat{\sigma}_{1}]\!]);\hat{\sigma}_{2},...,\hat{\sigma}_{m}\rangle)=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.

If an applicable forced transition is identified, only one successor S^[[τ]]\hat{S}[\![\tau]\!] is generated with a forced transition τ\tau. By Definition 17, there exists an S^\hat{S}-solution τ;σ^\langle\tau;\hat{\sigma}^{\prime}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ^,S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\mathsf{solution\_cost}(\langle\tau,\hat{\sigma}^{\prime}\rangle,\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}). Since AA is isotone,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^[[τ]]);σ^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);τ;σ^)=wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ^,S^)wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(\hat{S}[\![\tau]\!]);\hat{\sigma}^{\prime}\rangle)&=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\tau;\hat{\sigma}^{\prime}\rangle)=w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\langle\tau,\hat{\sigma}^{\prime}\rangle,\hat{S})\\ &\leq w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.\end{split}

If the successor state S^[[τ]]\hat{S}[\![\tau]\!] (or S^[[σ^1]]\hat{S}[\![\hat{\sigma}_{1}]\!]) is not inserted into GG in line 20, another state SGS^{\prime}\in G dominates S^[[τ]]\hat{S}[\![\tau]\!] with a better or equal gg-value, so there exists a solution extending σ(S)\sigma(S^{\prime}) with the cost at most γ^\hat{\gamma}. Thus, SS^{\prime} can be considered a new S^\hat{S}. When S^[[τ]]\hat{S}[\![\tau]\!] or SS^{\prime} is removed from GG by line 18, another state S′′S^{\prime\prime} that dominates S^[[τ]]\hat{S}[\![\tau]\!] or SS^{\prime} with a better or equal gg-value is inserted into GG, and there exists a solution extending σ(S′′)\sigma(S^{\prime\prime}) with the cost at most γ^\hat{\gamma}. ∎

For CABS, since beam search returns an optimal solution or proves the infeasibility when complete=\text{complete}=\top by Theorem 18, the optimality is straightforward by line 6 of Algorithm 3.

Corollary 3.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model, and CABS returns a solution that is not NULL, then it is an optimal solution. If CABS returns NULL, then the model is infeasible.

We prove that CABS terminates when a DyPDL model is finite, monoidal, and acyclic.

Theorem 19.

Given a finite, acyclic, and monoidal DyPDL model, CABS terminates in finite time.

Proof.

When the beam width bb is sufficiently large, e.g., equal to the number of reachable states in the model, beam search never reaches line 24. Since the number of reachable states is finite, bb eventually becomes such a large number with finite iterations. Suppose that we call beam search with sufficiently large bb. If complete=\text{complete}=\top is returned, we are done. Otherwise, beam search should have found a new solution whose cost is better than γ¯\overline{\gamma} in line 11 and reached line 26. In this case, there exists a solution for the model. Since the state transition graph is finite and acyclic, there are a finite number of solutions, and there exists an optimal solution with the cost γ\gamma^{*}. Since γ¯\overline{\gamma} decreases after each call if complete=\text{complete}=\bot, eventually, γ¯\overline{\gamma} becomes γ\gamma^{*}, and complete=\text{complete}=\top is returned with finite iterations. By Theorem 16, each call of beam search terminates in finite time. Therefore, CABS terminates in finite time. ∎

To obtain a dual bound from beam search, we need to slightly modify Theorem 14; since beam search may discard states leading to optimal solutions, we need to keep track of the minimum (or maximum for maximization) gl(S)×η(S)g^{l}(S)\times\eta(S) value for all discarded states in addition to states in OO.

Theorem 20.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle and γ¯A\overline{\gamma}\in A, let DmD_{m} be the set of states dropped in layer ml1m\leq l-1 by line 24 of Algorithm 2. If an optimal solution for the model exists and has the cost γ\gamma^{*}, just after line 22,

min{γ¯,minSOgl(S)×η(S),minm=1,,l1minSDmgm(S)×η(S)}γ\min\left\{\overline{\gamma},\min_{S\in O}g^{l}(S)\times\eta(S),\min_{m=1,...,l-1}\min_{S\in D_{m}}g^{m}(S)\times\eta(S)\right\}\leq\gamma^{*}

where we assume minSOgl(S)×η(S)=\min_{S\in O}g^{l}(S)\times\eta(S)=\infty if O=O=\emptyset and minSDmgm(S)×η(S)=\min_{S\in D_{m}}g^{m}(S)\times\eta(S)=\infty if Dm=D_{m}=\emptyset.

Proof.

If γ¯γ\overline{\gamma}\leq\gamma^{*}, the inequality holds trivially, so we assume γ¯>γ\overline{\gamma}>\gamma^{*}. We prove that there exists a state S^Om=1l1Dm\hat{S}\in O\cup\bigcup_{m=1}^{l-1}D_{m} on an optimal path, i.e., there exists an S^\hat{S}-solution σ^\hat{\sigma} such that σm(S^);σ^\langle\sigma^{m}(\hat{S});\hat{\sigma}\rangle is an optimal solution where m{0,,l}m\in\{0,...,l\}. Initially, O={S0}O=\{S^{0}\}, so the condition is satisfied. Suppose that a state S^\hat{S} on an optimal path is included in OO just before line 7. If S^\hat{S} is a base state, we reach line 9, and current_cost=γ\text{current\_cost}=\gamma^{*}. Since γ¯>γ\overline{\gamma}>\gamma^{*}, γ¯\overline{\gamma} is updated to γ\gamma^{*} in line 11. Then, γ¯=γγ\overline{\gamma}=\gamma^{*}\leq\gamma^{*} will hold after line 22. If S^\hat{S} is not a base state, by a similar argument to the proof of Theorem 18 (or Theorem 13 if we consider the modified version where states in all layers are kept in GG), a state on an optimal path, SS^{\prime}, will be included in GG just before line 22. Since gl(S)×η(S)γ<γ¯g^{l}(S^{\prime})\times\eta(S^{\prime})\leq\gamma^{*}<\overline{\gamma}, SOS^{\prime}\in O holds after line 22, and minSOgl(S)×η(S)gl(S)×η(S)γ\min_{S\in O}g^{l}(S)\times\eta(S)\leq g^{l}(S^{\prime})\times\eta(S^{\prime})\leq\gamma^{*}. After line 24, SS^{\prime} will be included in either OO or DlD_{l}, which can be considered a new S^\hat{S}. Suppose that S^Dm\hat{S}\in D_{m} just before line 7. Since S^\hat{S} is never removed from DmD_{m}, minSDmgm(S)×η(S)gm(S^)×η(S^)γ\min_{S\in D_{m}}g^{m}(S)\times\eta(S)\leq g^{m}(\hat{S})\times\eta(\hat{S})\leq\gamma^{*} always holds. By mathematical induction, the theorem is proved. ∎

6 DyPDL Models for Combinatorial Optimization Problems

To show the flexibility of DyPDL, in addition to TSPTW and talent scheduling, we formulate DyPDL models for NP-hard combinatorial optimization problems from different application domains such as routing, packing, scheduling, and manufacturing. We select problem classes whose DyPDL models can be solved by our heuristic search solvers. The models are monoidal with isotonicity, which guarantees the optimality of the heuristic search solvers, as shown in Theorem 13 and Corollary 3. While some models are cost-algebraic and others are not, all of them are acyclic, so the heuristic search solvers terminate in finite time, as shown in Theorems 10 and 19. We diversify the problem classes with the following criteria:

• Both minimization and maximization problems are included.

• DyPDL models with different binary operators for the cost expressions are included: addition (++) and taking the maximum (max\max).

• Each of DIDP, MIP, and CP outperforms the others in at least one problem class, as shown in Section 7.

We present DyPDL models for six problem classes satisfying the above criteria in this section and three in Appendix B. For some of the problem classes, problem-specific DP approaches were previously proposed, and our DyPDL models are based on them. To concisely represent the models, we only present the Bellman equations since all models are finite and acyclic and satisfy the Principle of Optimality in Definition 13. The YAML-DyPDL files for the models are publicly available in our repository (https://github.com/Kurorororo/didp-models).

6.1 Capacitated Vehicle Routing Problem (CVRP)

In the capacitated vehicle routing problem (CVRP) [109], customers N={0,,n1}N=\{0,...,n-1\}, where 0 is the depot, are given, and each customer iN{0}i\in N\setminus\{0\} has the demand di0d_{i}\geq 0. A solution is a tour to visit each customer in N{0}N\setminus\{0\} exactly once using mm vehicles, which start from and return to the depot. The sum of demands of customers visited by a single vehicle must be less than or equal to the capacity qq. We assume diqd_{i}\leq q for each iNi\in N. Visiting customer jj from ii requires the travel time cij0c_{ij}\geq 0, and the objective is to minimize the total travel time. CVRP is strongly NP-hard since it generalizes TSP [110].

We formulate the DyPDL model based on the giant-tour representation [24]. We sequentially construct tours for the mm vehicles. Let UU be a set variable representing unvisited customers, ii be an element variable representing the current location, ll be a numeric variable representing the current load, and kk be a numeric variable representing the number of used vehicles. Both ll and kk are resource variables where less is preferred. At each step, one customer jj is visited by the current vehicle or a new vehicle. When a new vehicle is used, jj is visited via the depot, ll is reset, and kk is increased. Similar to TSPTW, let cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj} and cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}.

compute V(N{0},0,0,1)\displaystyle\text{compute }V(N\setminus\{0\},0,0,1) (12)
V(U,i,l,k)={if (mk+1)q<l+jUdjci0else if U=min{minjU:l+djqcij+V(U{j},j,l+dj,k)minjUci0+c0j+V(U{j},j,dj,k+1)else if jU,l+djqk<mminjU:l+djqcij+V(U{j},j,l+dj,k)else if jU:l+djqminjUci0+c0j+V(U{j},j,dj,k+1)else if k<melse\displaystyle V(U,i,l,k)=\begin{cases}\infty&\text{if }(m-k+1)q<l+\sum\limits_{j\in U}d_{j}\\ c_{i0}&\text{else if }U=\emptyset\\ \min\left\{\begin{array}[]{l}\min\limits_{j\in U:l+d_{j}\leq q}c_{ij}+V(U\setminus\{j\},j,l+d_{j},k)\\ \min\limits_{j\in U}c_{i0}+c_{0j}+V(U\setminus\{j\},j,d_{j},k+1)\end{array}\right.&\text{else if }\exists j\in U,l+d_{j}\leq q\land k<m\\ \min\limits_{j\in U:l+d_{j}\leq q}c_{ij}+V(U\setminus\{j\},j,l+d_{j},k)&\text{else if }\exists j\in U:l+d_{j}\leq q\\ \min\limits_{j\in U}c_{i0}+c_{0j}+V(U\setminus\{j\},j,d_{j},k+1)&\text{else if }k<m\\ \infty&\text{else}\end{cases} (13)
V(U,i,l,k)V(U,i,l,k)if llkk\displaystyle V(U,i,l,k)\leq V(U,i,l^{\prime},k^{\prime})~{}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\text{if }l\leq l^{\prime}\land k\leq k^{\prime} (14)
V(U,i,l,k)max{jU{0}cjin,jU{i}cjout}.\displaystyle V(U,i,l,k)\geq\max\left\{\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (15)

The first line of Equation (13) represents a state constraint: in a state, if the sum of the capacities of the remaining vehicles ((mk+1)q(m-k+1)q) is less than the sum of the current load (ll) and the demands of the unvisited customers (jUdj\sum_{j\in U}d_{j}), the state does not lead to a solution. The second line is a base case where all customers are visited. The model has two types of transitions: directly visiting customer jj, which is applicable when the current vehicle has sufficient space (l+djql+d_{j}\leq q), and visiting jj with a new vehicle from the depot, which is applicable when there is an unused vehicle (k<mk<m). The third line is active when both are possible, and the fourth and fifth lines are active when only one of them is possible. Recall from Definition 14 that a state SS dominates another state SS^{\prime} iff for any SS^{\prime}-solution, there exists an equal or better SS-solution with an equal or shorter length. If lll\leq l^{\prime} and kkk\leq k^{\prime}, any (U,i,l,k)(U,i,l^{\prime},k^{\prime})-solution is also a (U,i,l,k)(U,i,l,k)-solution, so the dominance implied by Inequality (14) satisfies this condition. Inequality (15) is a dual bound function defined in the same way as Inequality (6) of the DyPDL model for TSPTW. Similar to the DyPDL model for TSPTW, this model is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}^{+}_{0},+,0\rangle. However, since the base cost ci0c_{i0} is not necessarily zero, the first solution found by CAASDy may not be optimal (Theorem 15).
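To make the recursion concrete, the following Python sketch evaluates Equation (13) directly by memoizing V over (U, i, l, k), with U encoded as a frozenset. This is a hypothetical brute-force evaluator meant only to illustrate the Bellman equation; it is exponential in the number of customers and is not how our solvers work:

```python
from functools import lru_cache


def cvrp_value(c, d, q, m):
    """Evaluate the CVRP Bellman equation: c is the travel-time matrix
    (customer 0 is the depot), d the demands, q the vehicle capacity,
    and m the number of vehicles."""
    n = len(c)
    INF = float("inf")

    @lru_cache(maxsize=None)
    def V(U, i, l, k):
        # State constraint: remaining vehicle capacity must cover the
        # current load plus all unvisited demand.
        if (m - k + 1) * q < l + sum(d[j] for j in U):
            return INF
        if not U:
            return c[i][0]      # base case: return to the depot
        best = INF
        for j in U:
            rest = U - {j}
            if l + d[j] <= q:   # continue with the current vehicle
                best = min(best, c[i][j] + V(rest, j, l + d[j], k))
            if k < m:           # dispatch a new vehicle via the depot
                best = min(best, c[i][0] + c[0][j] + V(rest, j, d[j], k + 1))
        return best

    return V(frozenset(range(1, n)), 0, 0, 1)
```

On a small instance, reducing m below the number of vehicles needed makes the state constraint fire immediately, so the value is infinite.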

6.2 Multi-Commodity Pickup and Delivery TSP (m-PDTSP)

A one-to-one multi-commodity pickup and delivery traveling salesperson problem (m-PDTSP) [111] is to pick up and deliver commodities using a single vehicle. Similar to CVRP, m-PDTSP is a generalization of TSP and is strongly NP-hard. In this problem, customers N={0,,n1}N=\{0,...,n-1\}, edges AN×NA\subseteq N\times N, and commodities M={0,,m1}M=\{0,...,m-1\} are given. The vehicle can visit customer jj directly from customer ii with the travel time cij0c_{ij}\geq 0 if (i,j)A(i,j)\in A. Each commodity kMk\in M is picked up at customer pkNp_{k}\in N and delivered to customer dkNd_{k}\in N. The load increases (decreases) by wkw_{k} at pkp_{k} (dkd_{k}) and must not exceed the capacity qq. The vehicle starts from 0, visits each customer once, and stops at n1n-1. We assume that cyclic dependencies between commodities, e.g., pk=dkp_{k}=d_{k^{\prime}} and pk=dkp_{k^{\prime}}=d_{k}, do not exist.

We propose a DyPDL model based on the 1-PDTSP reduction [112] and the DP model by Castro et al. [113]. In a state, a set variable UU represents the set of unvisited customers, an element variable ii represents the current location, and a numeric resource variable ll represents the current load. The net change of the load at customer jj is represented by δj=kM:pk=jwkkM:dk=jwk\delta_{j}=\sum_{k\in M:p_{k}=j}w_{k}-\sum_{k\in M:d_{k}=j}w_{k}, and the customers that must be visited before jj is represented by Pj={pkkM:dk=j}P_{j}=\{p_{k}\mid k\in M:d_{k}=j\}, both of which can be precomputed. The set of customers that can be visited next is X(U,i,l)={jU(i,j)Al+δjqPjU=}X(U,i,l)=\{j\in U\mid(i,j)\in A\land l+\delta_{j}\leq q\land P_{j}\cap U=\emptyset\}. Let cjin=minkN:(k,j)Ackjc^{\text{in}}_{j}=\min_{k\in N:(k,j)\in A}c_{kj} and cjout=minkN:(j,k)Acjkc^{\text{out}}_{j}=\min_{k\in N:(j,k)\in A}c_{jk}.

compute V(N{0,n1},0,0)\displaystyle\text{compute }V(N\setminus\{0,n-1\},0,0) (16)
V(U,i,l)={ci,n1if U=(i,n1)AminjX(U,i,l)cij+V(U{j},j,l+δj)else if X(U,i,l)else\displaystyle V(U,i,l)=\begin{cases}c_{i,n-1}&\text{if }U=\emptyset\land(i,n-1)\in A\\ \min\limits_{j\in X(U,i,l)}c_{ij}+V(U\setminus\{j\},j,l+\delta_{j})&\text{else if }X(U,i,l)\neq\emptyset\\ \infty&\text{else}\end{cases} (17)
V(U,i,l)V(U,i,l)if ll\displaystyle V(U,i,l)\leq V(U,i,l^{\prime})\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\ \ \text{if }l\leq l^{\prime} (18)
V(U,i,l)max{jU{n1}cjin,jU{i}cjout}.\displaystyle V(U,i,l)\geq\max\left\{\sum_{j\in U\cup\{n-1\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (19)

As with CVRP, Inequality (18) represents the dominance implied by the resource variable, and Inequality (19) gives a dual bound function. The model is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}^{+}_{0},+,0\rangle, and the base cost ci,n1c_{i,n-1} is not necessarily zero.
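The precomputed quantities δ_j and P_j and the successor set X(U, i, l) translate directly into code. A minimal Python sketch with hypothetical function names, where commodities are given as (pickup, delivery, weight) triples:

```python
def precompute_mpdtsp(n, commodities):
    """Net load change delta_j and predecessor sets P_j from
    (pickup, delivery, weight) triples."""
    delta = [0] * n
    pred = [set() for _ in range(n)]
    for p, dl, w in commodities:
        delta[p] += w    # pick up w units at p
        delta[dl] -= w   # deliver w units at dl
        pred[dl].add(p)  # p must be visited before dl
    return delta, pred


def successors(U, i, l, arcs, q, delta, pred):
    """X(U, i, l): customers that can feasibly be visited next."""
    return {j for j in U
            if (i, j) in arcs          # an edge exists
            and l + delta[j] <= q      # capacity is respected
            and not (pred[j] & U)}     # all pickups for j are done
```

For example, a customer whose pickup location is still unvisited is excluded from X even if the edge and capacity conditions hold.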

6.3 Orienteering Problem with Time Windows (OPTW)

In the orienteering problem with time windows (OPTW) [114], customers N={0,,n1}N=\{0,...,n-1\} are given, where 0 is the depot. Visiting customer jj from ii requires the travel time cij>0c_{ij}>0 while producing the integer profit pj0p_{j}\geq 0. Each customer jj can be visited only in the time window [aj,bj][a_{j},b_{j}], and the vehicle needs to wait until aja_{j} upon earlier arrival. The objective is to maximize the total profit while starting from the depot at time t=0t=0 and returning to the depot by b0b_{0}. OPTW is strongly NP-hard since it is a generalization of the orienteering problem, which is NP-hard [115].

Our DyPDL model is similar to the DP model by Righini and Salani [26] but designed for DIDP with forced transitions and a dual bound function. A set variable UU represents the set of customers to visit, an element variable ii represents the current location, and a numeric resource variable tt represents the current time, where less is preferred. We visit customers one by one using transitions. Customer jj can be visited next if it can be visited and the depot can be reached by the deadline after visiting jj. Let cijc^{*}_{ij} be the shortest travel time from ii to jj. Then, the set of customers that can be visited next is X(U,i,t)={jUt+cijbjt+cij+cj0b0}X(U,i,t)=\{j\in U\mid t+c_{ij}\leq b_{j}\land t+c_{ij}+c^{*}_{j0}\leq b_{0}\}. In addition, we remove a customer that can no longer be visited using a forced transition. If t+cij>bjt+c^{*}_{ij}>b_{j}, then we can no longer visit customer jj. If t+cij+cj0>b0t+c^{*}_{ij}+c^{*}_{j0}>b_{0}, then we can no longer return to the depot after visiting jj. Thus, the set of unvisited customers that can no longer be visited is represented by Y(U,i,t)={jUt+cij>bjt+cij+cj0>b0}Y(U,i,t)=\{j\in U\mid t+c^{*}_{ij}>b_{j}\lor t+c^{*}_{ij}+c^{*}_{j0}>b_{0}\}. The set Y(U,i,t)Y(U,i,t) is not necessarily equivalent to UX(U,i,t)U\setminus X(U,i,t) since it is possible that jj cannot be visited directly from ii but can be visited via another customer when the triangle inequality does not hold.

If we take the sum of profits over UY(U,i,t)U\setminus Y(U,i,t), we can compute an upper bound on the value of the current state. In addition, we use another upper bound considering the remaining time limit b0tb_{0}-t. We consider a relaxed problem, where the travel time to customer jj is always cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj}. This problem can be viewed as the well-known 0-1 knapsack problem [116, 117], which is to maximize the total profit of items included in a knapsack such that the total weight of the included items does not exceed the capacity of the knapsack. Each customer jUY(U,i,t)j\in U\setminus Y(U,i,t) is an item with the profit pjp_{j} and the weight cjinc^{\text{in}}_{j}, and the capacity of the knapsack is b0tc0inb_{0}-t-c^{\text{in}}_{0} since we need to return to the depot. Then, we can use the Dantzig upper bound [118], which sorts items in descending order of the efficiency ejin=pj/cjine^{\text{in}}_{j}=p_{j}/c^{\text{in}}_{j} and includes as many items as possible. When an item kk exceeds the remaining capacity qq, it is included fractionally, i.e., the profit is increased by qekin\lfloor qe^{\text{in}}_{k}\rfloor. This procedural upper bound is difficult to represent efficiently with the current YAML-DyPDL due to its declarative nature. Therefore, we further relax the problem by using maxjUY(U,i,t)ejin\max_{j\in U\setminus Y(U,i,t)}e^{\text{in}}_{j} as the efficiencies of all items, i.e., we use (b0tc0in)maxjUY(U,i,t)ejin\lfloor(b_{0}-t-c^{\text{in}}_{0})\max_{j\in U\setminus Y(U,i,t)}e^{\text{in}}_{j}\rfloor as an upper bound. Similarly, based on cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}, the minimum travel time from jj, we also use (b0tciout)maxjUY(U,i,t)ejout\lfloor(b_{0}-t-c^{\text{out}}_{i})\max_{j\in U\setminus Y(U,i,t)}e^{\text{out}}_{j}\rfloor where ejout=pj/cjoute^{\text{out}}_{j}=p_{j}/c^{\text{out}}_{j}.
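The resulting three-term bound of Inequality (23) is cheap to evaluate. A hypothetical Python sketch, where `cands` stands for U ∖ Y(U, i, t) and the profits and travel-time minima are passed in as dictionaries:

```python
import math


def optw_bound(cands, p, c_in, c_out, b0, t, c_in_depot, c_out_i):
    """Three-term dual bound of Inequality (23): the total remaining
    profit and two knapsack-style relaxations using the best
    profit/travel-time efficiency among the remaining customers."""
    if not cands:
        return 0
    total = sum(p[j] for j in cands)
    e_in = max(p[j] / c_in[j] for j in cands)    # e_j^in = p_j / c_j^in
    e_out = max(p[j] / c_out[j] for j in cands)  # e_j^out = p_j / c_j^out
    return min(total,
               math.floor((b0 - t - c_in_depot) * e_in),
               math.floor((b0 - t - c_out_i) * e_out))
```

When the remaining time b0 - t is the binding restriction, one of the two efficiency terms is smaller than the total remaining profit and tightens the bound.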

compute $V(N \setminus \{0\}, 0, 0)$ (20)

$$V(U,i,t) = \begin{cases} 0 & \text{if } t + c_{i0} \leq b_0 \land U = \emptyset \\ V(U \setminus \{j\}, i, t) & \text{else if } \exists j \in Y(U,i,t) \\ V(U \setminus \{j\}, i, t) & \text{else if } \exists j \in U \land X(U,i,t) = \emptyset \\ \max\limits_{j \in X(U,i,t)} p_j + V(U \setminus \{j\}, j, \max\{t + c_{ij}, a_j\}) & \text{else if } X(U,i,t) \neq \emptyset \\ -\infty & \text{else} \end{cases}$$ (21)

$$V(U,i,t) \geq V(U,i,t') \quad \text{if } t \leq t'$$ (22)

$$V(U,i,t) \leq \min\left\{ \sum_{j \in U \setminus Y(U,i,t)} p_j, \left\lfloor (b_0 - t - c^{\text{in}}_0) \max_{j \in U \setminus Y(U,i,t)} e^{\text{in}}_j \right\rfloor, \left\lfloor (b_0 - t - c^{\text{out}}_i) \max_{j \in U \setminus Y(U,i,t)} e^{\text{out}}_j \right\rfloor \right\}.$$ (23)

The second line of Equation (21) removes an arbitrary customer $j$ in $Y(U,i,t)$, which is a forced transition. The third line also defines a forced transition to remove a customer $j$ in $U$ when no customer can be visited directly ($X(U,i,t) = \emptyset$); in such a case, even if $j \in U \setminus Y(U,i,t)$, i.e., $t + c^*_{ij} \leq b_j$, the shortest path to customer $j$ is not available. The base case (the first line of Equation (21)) becomes active when all customers are visited or removed. This condition forces the vehicle to visit as many customers as possible. Since each transition removes one customer from $U$, and all customers must be removed in a base state, all $(U,i,t)$- and $(U,i,t')$-solutions have the same length. If $t \leq t'$, more customers can potentially be visited, so $(U,i,t)$ leads to an equal or better solution than $(U,i,t')$. Thus, the dominance implied by Inequality (22) satisfies Definition 14. The cost expressions are represented by the addition of nonnegative values, so the model is monoidal with a monoid $\langle \mathbb{Q}^+_0, +, 0 \rangle$, and $\mathbb{Q}^+_0$ is isotone. However, the model is not cost-algebraic since it is a maximization problem and $\forall x \in \mathbb{Q}^+_0, 0 \geq x$ does not hold. Thus, the first solution found by CAASDy may not be optimal.
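The recurrence in Equation (21) can be evaluated directly by memoized recursion. A small Python sketch (our own encoding, with the set variable $U$ as a bitmask; not the didp-rs API) that follows the case order of the equation:

```python
import functools

def solve_optw(n, c, cstar, p, a, b):
    """Evaluate the OPTW recurrence (maximize collected profit).

    c[i][j]: travel time, cstar[i][j]: shortest travel time,
    p[j]: profit, a[j]/b[j]: time-window start/deadline; 0 is the depot."""

    @functools.lru_cache(maxsize=None)
    def V(U, i, t):
        if U == 0:  # base case: return to the depot by its deadline
            return 0 if t + c[i][0] <= b[0] else float("-inf")
        Y = [j for j in range(1, n) if U >> j & 1
             and (t + cstar[i][j] > b[j] or t + cstar[i][j] + cstar[j][0] > b[0])]
        if Y:  # forced transition: drop a customer that can no longer be visited
            return V(U & ~(1 << Y[0]), i, t)
        X = [j for j in range(1, n) if U >> j & 1
             and t + c[i][j] <= b[j] and t + c[i][j] + cstar[j][0] <= b[0]]
        if not X:  # forced transition: no customer is directly reachable
            j = next(j for j in range(1, n) if U >> j & 1)
            return V(U & ~(1 << j), i, t)
        return max(p[j] + V(U & ~(1 << j), j, max(t + c[i][j], a[j])) for j in X)

    return V(sum(1 << j for j in range(1, n)), 0, 0)
```

This sketch enumerates all reachable states, whereas the DIDP solvers additionally exploit the dominance (22) and the dual bound (23) to prune them.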

6.4 Bin Packing

In the bin packing problem [116], items $N = \{0, \dots, n-1\}$ are given, and each item $i$ has weight $w_i$. The objective is to pack items in bins with capacity $q$ while minimizing the number of bins. We assume $q \geq w_i$ for each $i \in N$. Bin packing is strongly NP-hard [119].

In our DyPDL model, we pack items one by one. A set variable $U$ represents the set of unpacked items, and a numeric resource variable $r$ represents the remaining space in the current bin, where more is preferred. In addition, we use an element resource variable $k$ representing the number of used bins, where less is preferred. The model breaks symmetry by packing item $i$ in the $i$-th or an earlier bin. Thus, $X(U,r,k) = \{ i \in U \mid r \geq w_i \land i + 1 \geq k \}$ represents the items that can be packed in the current bin. When $\forall j \in U, r < w_j$, a new bin is opened, and any item in $Y(U,k) = \{ i \in U \mid i \geq k \}$ can be packed; this is a forced transition.

For a dual bound function, we use the lower bounds LB1, LB2, and LB3 used by Johnson [120]. The first lower bound, LB1, is $\lceil (\sum_{i \in U} w_i - r) / q \rceil$, which relaxes the problem by allowing an item to be split across multiple bins. The second lower bound, LB2, only considers items in $\{ i \in U \mid w_i \geq q/2 \}$. If $w_i > q/2$, item $i$ cannot be packed with any other considered item. If $w_i = q/2$, at most one additional item $j$ with $w_j = q/2$ can be packed. Let $a_i = 1$ if $w_i > q/2$ and $a_i = 0$ otherwise. Let $b_i = 1/2$ if $w_i = q/2$ and $b_i = 0$ otherwise. The number of bins is lower bounded by $\sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2})$, where $\mathbbm{1}$ is an indicator function that returns 1 if the given condition is true and 0 otherwise. The last term covers the case where an item with $w_i \geq q/2$ can be packed in the current bin. Similarly, LB3 only considers items in $\{ i \in U \mid w_i \geq q/3 \}$. Let $c_i = 1$ if $w_i > 2q/3$, $c_i = 2/3$ if $w_i = 2q/3$, $c_i = 1/2$ if $q/3 < w_i < 2q/3$, $c_i = 1/3$ if $w_i = q/3$, and $c_i = 0$ otherwise. The number of bins is lower bounded by $\lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3})$.
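The three bounds are straightforward to compute. A Python sketch (our own helper, not part of didp-rs) evaluating $\max\{\text{LB1}, \text{LB2}, \text{LB3}\}$ for a state $(U, r)$:

```python
import math

def bpp_dual_bound(weights, U, r, q):
    """Lower bound on the number of additional bins for unpacked items U,
    with remaining space r in the current bin and bin capacity q."""
    lb1 = math.ceil((sum(weights[i] for i in U) - r) / q)
    # LB2: items heavier than q/2 pairwise conflict; two exact halves may share.
    a = sum(1 for i in U if weights[i] > q / 2)
    b = sum(0.5 for i in U if weights[i] == q / 2)
    lb2 = a + math.ceil(b) - (1 if r >= q / 2 else 0)
    # LB3: at most two items of weight >= q/3 fit in one bin.
    def c(w):
        if w > 2 * q / 3:
            return 1.0
        if w == 2 * q / 3:
            return 2 / 3
        if q / 3 < w < 2 * q / 3:
            return 0.5
        if w == q / 3:
            return 1 / 3
        return 0.0
    lb3 = math.ceil(sum(c(weights[i]) for i in U)) - (1 if r >= q / 3 else 0)
    return max(lb1, lb2, lb3)
```

For instance, three items of weight 6 with $q = 10$ and $r = 0$ give LB1 = 2 but LB2 = 3, showing why the maximum of the three bounds is used.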

compute $V(N, 0, 0)$ (24)

$$V(U,r,k) = \begin{cases} 0 & \text{if } U = \emptyset \\ 1 + V(U \setminus \{i\}, q - w_i, k + 1) & \text{else if } \exists i \in Y(U,k) \land \forall j \in U, r < w_j \\ \min\limits_{i \in X(U,r,k)} V(U \setminus \{i\}, r - w_i, k) & \text{else if } X(U,r,k) \neq \emptyset \\ \infty & \text{else} \end{cases}$$ (25)

$$V(U,r,k) \leq V(U,r',k') \quad \text{if } r \geq r' \land k \leq k'$$ (26)

$$V(U,r,k) \geq \max\left\{ \begin{array}{l} \lceil (\sum_{i \in U} w_i - r) / q \rceil \\ \sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2}) \\ \lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3}). \end{array} \right.$$ (30)

Since each transition packs one item, any $(U,r,k)$- and $(U,r',k')$-solutions have the same length. It is easy to see that $(U,r,k)$ leads to an equal or better solution than $(U,r',k')$ if $r \geq r'$ and $k \leq k'$, so the dominance implied by Inequality (26) is valid. This model is cost-algebraic with a cost algebra $\langle \mathbb{Z}^+_0, +, 0 \rangle$. Since the base cost is always zero, the first solution found by CAASDy is optimal by Theorem 15.

6.5 Simple Assembly Line Balancing Problem (SALBP-1)

The variant of the simple assembly line balancing problem (SALBP) called SALBP-1 [121, 122] is the same as bin packing except for precedence constraints. In SALBP-1, we are given a set of tasks $N = \{0, \dots, n-1\}$, and each task $i$ has a processing time $w_i$. A task is scheduled in a station, and the sum of the processing times of the tasks in a station must not exceed the cycle time $q$. Stations are ordered, and each task must be scheduled in the same or a later station than its predecessors $P_i \subseteq N$. SALBP-1 is strongly NP-hard since it is a generalization of bin packing [123].

We formulate a DyPDL model based on that of bin packing and inspired by a problem-specific heuristic search method for SALBP-1 [102, 103]. Due to the precedence constraints, unlike bin packing, we cannot schedule an arbitrary task when we open a station. Thus, we do not use an element resource variable $k$. Now, the set of tasks that can be scheduled in the current station is represented by $X(U,r) = \{ i \in U \mid r \geq w_i \land P_i \cap U = \emptyset \}$. We introduce a transition to open a new station only when $X(U,r) = \emptyset$, which is called a maximum load pruning rule in the literature [124, 125]. Since bin packing is a relaxation of SALBP-1, we can use the dual bound function for bin packing.

compute $V(N, 0)$ (31)

$$V(U,r) = \begin{cases} 0 & \text{if } U = \emptyset \\ 1 + V(U, q) & \text{else if } X(U,r) = \emptyset \\ \min\limits_{i \in X(U,r)} V(U \setminus \{i\}, r - w_i) & \text{else} \end{cases}$$ (32)

$$V(U,r) \leq V(U,r') \quad \text{if } r \geq r'$$ (33)

$$V(U,r) \geq \max\left\{ \begin{array}{l} \lceil (\sum_{i \in U} w_i - r) / q \rceil \\ \sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2}) \\ \lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3}). \end{array} \right.$$ (37)

The length of a $(U,r)$-solution is the sum of $|U|$ and the number of stations opened, which is the cost of that solution. Therefore, if $r \geq r'$, then state $(U,r)$ leads to an equal or better and shorter solution than $(U,r')$, so the dominance implied by Inequality (33) is valid. Similar to bin packing, this model is cost-algebraic, and the base cost is zero, so the first solution found by CAASDy is optimal.

6.6 Graph-Clear

In the graph-clear problem [126], an undirected graph $(N, E)$ with node weights $a_i$ for $i \in N$ and edge weights $b_{ij}$ for $\{i,j\} \in E$ is given. In the beginning, all nodes are contaminated. In each step, one node can be made clean by sweeping it using $a_i$ robots and blocking each edge $\{i,j\}$ using $b_{ij}$ robots. However, while a node is swept, an already swept node becomes contaminated again if it is connected to a contaminated node by a path of unblocked edges. The optimal solution minimizes the maximum number of robots used in a single step to make all nodes clean. This optimization problem is NP-hard since finding a solution whose cost is smaller than a given value is NP-complete [126].

Previous work [19] proved that there exists an optimal solution in which a swept node is never contaminated again. Based on this observation, the authors developed a state-based formula as the basis for MIP and CP models. We use the state-based formula directly as a DyPDL model. A set variable $C$ represents the swept nodes, and one node $c \in N \setminus C$ is swept at each step. We block all edges connected to $c$ and all edges from contaminated nodes to already swept nodes. We assume that $b_{ij} = 0$ if $\{i,j\} \notin E$.

compute $V(\emptyset)$ (38)

$$V(C) = \begin{cases} 0 & \text{if } C = N \\ \min\limits_{c \in N \setminus C} \max\left\{ a_c + \sum\limits_{i \in N} b_{ci} + \sum\limits_{i \in C} \sum\limits_{j \in (N \setminus C) \setminus \{c\}} b_{ij}, V(C \cup \{c\}) \right\} & \text{else} \end{cases}$$ (39)

$$V(C) \geq 0.$$ (40)

Viewing the maximum of two values ($\max$) as a binary operator, $\langle \mathbb{Z}^+_0, \max, 0 \rangle$ is a monoid since $\max\{x,y\} \in \mathbb{Z}^+_0$, $\max\{x, \max\{y,z\}\} = \max\{\max\{x,y\}, z\}$, and $\max\{x, 0\} = \max\{0, x\} = x$ for $x, y, z \in \mathbb{Z}^+_0$. It is isotone since $x \leq y \rightarrow \max\{x,z\} \leq \max\{y,z\}$ and $x \leq y \rightarrow \max\{z,x\} \leq \max\{z,y\}$. Since $\forall x \in \mathbb{Z}^+_0, 0 \leq x$, $\langle \mathbb{Z}^+_0, \max, 0 \rangle$ is a cost algebra, so the DyPDL model is cost-algebraic. Since the base cost is always zero, the first solution found by CAASDy is optimal.
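Equation (39) composes step costs with $\max$ rather than $+$, so it can be evaluated by memoized recursion over subsets of swept nodes. A Python sketch (bitmask encoding of $C$; names are ours, not the didp-rs API):

```python
import functools

def solve_graph_clear(n, a, b):
    """Evaluate the graph-clear recurrence: minimize the maximum number of
    robots per step. a[i]: node weights; b[i][j]: edge weights (0 if no edge)."""
    full = (1 << n) - 1

    @functools.lru_cache(maxsize=None)
    def V(C):
        if C == full:
            return 0
        best = float("inf")
        for c in range(n):
            if C >> c & 1:
                continue
            # Robots to sweep c, block its incident edges, and block every
            # edge between a contaminated node and an already swept node.
            step = a[c] + sum(b[c][i] for i in range(n)) + sum(
                b[i][j]
                for i in range(n) if C >> i & 1
                for j in range(n) if not (C >> j & 1) and j != c)
            best = min(best, max(step, V(C | (1 << c))))
        return best

    return V(0)
```

Because the composition operator is $\max$, extending a sweep sequence can never decrease its cost, which is the isotonicity property discussed above.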

7 Experimental Evaluation

We implement and experimentally evaluate DIDP solvers using the heuristic search algorithms described in Section 5.4. We compare our DIDP solvers with commercial MIP and CP solvers, Gurobi 11.0.2 [127] and IBM ILOG CP Optimizer 22.1.0 [20]. We select state-of-the-art MIP and CP models in the literature when multiple models exist and develop a new model when we do not find an existing one. We also compare DIDP with existing state-based general-purpose solvers, domain-independent AI planners, a logic programming language, and a decision diagram-based (DD-based) solver.

7.1 Software Implementation of DIDP

We develop didp-rs v0.7.0 (https://github.com/domain-independent-dp/didp-rs/releases/tag/v0.7.0), a software implementation of DIDP in Rust. It has four components: dypdl (https://crates.io/crates/dypdl), dypdl-heuristic-search (https://crates.io/crates/dypdl-heuristic-search), didp-yaml (https://crates.io/crates/didp-yaml), and DIDPPy (https://didppy.readthedocs.io). The library dypdl is for modeling, and dypdl-heuristic-search is a library for heuristic search solvers. The command-line interface didp-yaml takes YAML-DyPDL domain and problem files and a YAML file specifying a solver as input and returns the result. DIDPPy is a Python interface whose modeling capability is equivalent to didp-yaml. In our experiments, we use didp-yaml.

As DIDP solvers, dypdl-heuristic-search implements CAASDy, DFBnB, CBFS, ACPS, APPS, DBDFS, and CABS. These solvers can handle monoidal DyPDL models with a monoid $\langle A, \times, \mathbf{1} \rangle$ where $A \subseteq \mathbb{Q} \cup \{-\infty, \infty\}$, $\times \in \{+, \max\}$, and $\mathbf{1} = 0$ if $\times = +$ or $\mathbf{1}$ is the minimum value in $A$ if $\times = \max$.

In all solvers, we use the dual bound function provided with a DyPDL model as a heuristic function. Thus, $f(S) = g(S) \times h(S) = g(S) \times \eta(S)$. By Theorem 14, the best $f$-value in the open list is a dual bound. In CAASDy, states in the open list are ordered by their $f$-values in a binary heap, so a dual bound can be obtained by checking the top of the binary heap. Similarly, in DFBnB, CBFS, and ACPS, since the states at each depth are ordered by their $f$-values, we can compute a dual bound by keeping track of the best $f$-value at each depth. In APPS, when the set of the best states $O_b$ and the set of the best successor states $O_c$ become empty, the best $f$-value of the states in the suspend list $O_s$ is a dual bound, where states are ordered by their $f$-values. In DBDFS, we keep track of the best $f$-value of states inserted into $O_1$ and use it as a dual bound when $O_0$ becomes empty. In CABS, based on Theorem 20, the best $f$-value of discarded states is maintained, and a dual bound is computed after generating all successor states in a layer. In CAASDy, CBFS, ACPS, APPS, and CABS, when the $f$- and $h$-values of two states are the same, the tie is broken according to the implementation of the binary heap used for the open list. In DFBnB and DBDFS, the open list is implemented with a stack, and successor states are sorted before being pushed to the stack, so tie-breaking depends on the implementation of the sorting algorithm. While a dual bound function is provided in each DyPDL model used in our experiments, it is not required in general; when no dual bound function is provided, the DIDP solvers use the $g$-value instead of the $f$-value to guide the search and do not perform pruning.
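For a minimizing additive model, these mechanics can be illustrated in a few lines of Python. This is a simplified sketch of CAASDy-style best-first search (names and interfaces are ours, not the dypdl-heuristic-search API); when $\eta$ underestimates the remaining cost, the $f$-value at the top of the heap is a valid dual bound at every expansion:

```python
import heapq

def best_first(target, successors, is_base, eta):
    """Expand states in order of f = g + eta(S) (minimization).

    successors(S) yields (weight, successor) pairs; is_base(S) marks base
    states with base cost 0. Returns (cost, dual bounds observed)."""
    open_list = [(eta(target), 0, target)]
    best_g = {target: 0}
    dual_bounds = []
    while open_list:
        f, g, state = heapq.heappop(open_list)
        if g > best_g.get(state, float("inf")):
            continue  # a cheaper path to this state was already found
        dual_bounds.append(f)  # the top of the heap is a dual bound
        if is_base(state):
            return g, dual_bounds
        for w, succ in successors(state):
            if g + w < best_g.get(succ, float("inf")):
                best_g[succ] = g + w
                heapq.heappush(open_list, (g + w + eta(succ), g + w, succ))
    return float("inf"), dual_bounds
```

The recorded dual bounds are non-decreasing with a consistent $\eta$, so the gap between the incumbent and the last bound shrinks monotonically as the search proceeds.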

As explained in Section 4.1.5, forced transitions can be explicitly defined in a DyPDL model. If such transitions are applicable in a state, our solvers keep only the first defined one, $\tau$, in the set of applicable transitions, i.e., $\mathcal{T}^*(S) = \{\tau\}$ in Algorithms 1 and 2. Otherwise, no forced transitions are considered, i.e., $\mathcal{T}^*(S) = \mathcal{T}(S)$.

7.2 Benchmarks

We describe the benchmark instances and the MIP and CP models for each problem class. Except for TSPTW, the DyPDL models are presented in Section 6 and B. All benchmark instances are in text format, so they are converted to YAML-DyPDL problem files by a Python script. All instances in one problem class share the same YAML-DyPDL domain file except for the multi-dimensional knapsack problem, where the number of state variables depends on the instance, and thus a domain file is generated for each instance by the Python script. All instances generated by us, the MIP and CP models, the YAML-DyPDL domain files, and the Python scripts are available from our repository (https://github.com/Kurorororo/didp-models).

7.2.1 TSPTW

For TSPTW, we use 340 instances from Dumas et al. [23], Gendreau et al. [128], Ohlmann and Thomas [129], and Ascheuer [130], where travel times are integers; while didp-rs can handle floating-point numbers, the CP solver we use, CP Optimizer, does not. In these instances, the deadline to return to the depot, $b_0$, is defined, but $\forall i \in N, b_i + c_{i0} \leq b_0$ holds, i.e., we can always return to the depot after visiting the final customer. Thus, in our DyPDL model (Equation (1) with redundant information in Inequalities (3), (4), and (6)), $b_0$ is not considered. For MIP, we use Formulation (1) proposed by Hungerländer and Truden [10]. When there are zero-cost edges, flow-based subtour elimination constraints [131] are added. We adapt a CP model for a single machine scheduling problem with time windows and sequence-dependent setup times [11] to TSPTW, where an interval variable represents the time to visit a customer. We change the objective to the sum of travel costs (setup times in their model) and add a $\mathsf{First}$ constraint ensuring that the depot is visited first.

7.2.2 CVRP

We use 207 instances in A, B, D, E, F, M, P, and X sets from CVRPLIB [132]. We use the DyPDL model in Section 6.1, a MIP model proposed by Gadegaard and Lysgaard [12], and a CP model proposed by Rabbouch et al. [13].

7.2.3 m-PDTSP

We use 1178 instances from Hernández-Pérez and Salazar-González [111], which are divided into class1, class2, and class3 sets. We use the DyPDL model in Section 6.2, the MCF2C+IP formulation for MIP [14], and the CP model proposed by Castro et al. [113]. In all models, unnecessary edges are removed by a preprocessing method [14].

7.2.4 OPTW

We use 144 instances from Righini and Salani [133, 25], Montemanni and Gambardella [134], and Vansteenwegen et al. [135]. In these instances, a service time $s_i$ spent at each customer $i$ is defined, so we incorporate it into the travel time, i.e., we use $s_i + c_{ij}$ as the travel time from $i$ to $j$. We use the MIP model described in Vansteenwegen et al. [136]. For CP, we develop a model similar to that of TSPTW, described in B.5.

7.2.5 Multi-Dimensional Knapsack Problem (MDKP)

We use 276 instances of the multi-dimensional knapsack problem (MDKP) [116, 117] from OR-Library [137], excluding one instance that has fractional item weights; while the DIDP solvers can handle fractional weights, the CP solver does not. We use the DyPDL model in B.1.1 and the MIP model described in Cacchiani et al. [138]. For CP, we develop a model using the $\mathsf{Pack}$ global constraint [139] for each dimension (see B.1.2).

7.2.6 Bin Packing

We use 1615 instances in BPPLIB [140], proposed by Falkenauer [141] (Falkenauer U and Falkenauer T), Scholl et al. [142] (Scholl 1, Scholl 2, and Scholl 3), Wäscher and Gau [143] (Wäscher), Schwerin and Wäscher [144] (Schwerin 1 and Schwerin 2), and Schoenfield [145] (Hard28). We use the DyPDL model in Section 6.4 and the MIP model by Martello and Toth [116], extended with inequalities ensuring that bins are used in order of index and that item $j$ is packed in the $j$-th bin or earlier, as described in Delorme et al. [146]. We implement a CP model using $\mathsf{Pack}$ while ensuring that item $j$ is packed in bin $j$ or earlier. For the MIP and CP models, the upper bound on the number of bins is computed by the first-fit decreasing heuristic. We show the CP model in B.6.

7.2.7 SALBP-1

We use 2100 instances proposed by Morrison et al. [103]. We use the DyPDL model in Section 6.5 and the NF4 formulation for MIP [15]. Our CP model is based on Bukchin and Raviv [16] but is implemented with the global constraint $\mathsf{Pack}$ in CP Optimizer as it performs better than the original model (see B.7). In addition, the upper bound on the number of stations is computed in the same way as in the MIP model instead of using a heuristic.

7.2.8 Single Machine Total Weighted Tardiness

We use 375 instances of single machine scheduling to minimize the total weighted tardiness ($1||\sum w_i T_i$) [147] in OR-Library [137] with 40, 50, and 100 jobs. We use the DyPDL model in B.2.1 and the formulation with assignment and positional date variables (F4) for MIP [17]. For CP, we formulate a model using interval variables, as described in B.2.2. We extract precedence relations between jobs using the method proposed by Kanet [148] and incorporate them into the DyPDL and CP models but not into the MIP model, as its performance is not improved.

7.2.9 Talent Scheduling

Garcia de la Banda and Stuckey [28] considered instances with 8, 10, 12, 14, 16, 18, 20, and 22 actors and 16, 18, …, 64 scenes, resulting in 200 configurations in total. For each configuration, they randomly generated 100 instances. We use the first five instances of each configuration, resulting in 1000 instances in total. We use an extended version of the DyPDL model presented in Section 3.3.3 (see B.3.1) and the MIP model described in Qin et al. [149]. For CP, we extend the model used in Chu and Stuckey [150] with the $\mathsf{AllDifferent}$ global constraint [151], which is redundant but slightly improves performance in practice, as described in B.3.2. In all models, a problem is simplified by preprocessing as described in Garcia de la Banda and Stuckey [28].

7.2.10 Minimization of Open Stacks Problem (MOSP)

We use 570 instances of the minimization of open stacks problem (MOSP) [152] from four sets: the Constraint Modelling Challenge [153], the SCOOP Project (https://cordis.europa.eu/project/id/32998), Faggioli and Bentivoglio [154], and Chu and Stuckey [155]. We use a DyPDL model based on a problem-specific algorithm [155], presented in B.4. The MIP and CP models are proposed by Martin et al. [18]. From their two MIP models, we select MOSP-ILP-I as it solves more instances optimally in their paper.

7.2.11 Graph-Clear

We generated 135 instances using planar and random graphs in the same way as Morin et al. [19], where the number of nodes in a graph is 20, 30, or 40. For planar instances, we use a planar graph generator [156] with the input parameter of 1000. We use the DyPDL model in Section 6.6 and MIP and CP models proposed by Morin et al. [19]. From the two proposed CP models, we select CPN as it solves more instances optimally.

7.3 Comparison with MIP and CP

We use Rust 1.70.0 for didp-rs and Python 3.10.2 for the Python scripts to convert instances to YAML-DyPDL files and the Python interfaces of Gurobi and CP Optimizer. All experiments are performed on an Intel Xeon Gold 6418 processor with a single thread, an 8 GB memory limit, and a 30-minute time limit using GNU Parallel [157].

7.3.1 Coverage

Table 1: Coverage (c.) and the number of instances where the memory limit is reached (m.) in each problem class. The coverage of a DIDP solver is in bold if it is higher than MIP and CP, and the higher of MIP and CP is in bold if there is no better DIDP solver. The highest coverage is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m.
TSPTW (340) 224 0 47 0 257 83 242 34 257 81 257 82 257 83 256 83 259 0 259 0
CVRP (207) 28 5 0 0 6 201 6 187 6 201 6 201 6 201 6 201 6 0 5 3
m-PDTSP (1178) 940 0 1049 0 952 226 985 193 988 190 988 190 988 190 987 191 1035 0 988 15
OPTW (144) 16 0 49 0 64 79 64 60 64 80 64 80 64 80 64 78 64 0 - -
MDKP (276) 168 0 6 0 4 272 4 272 5 271 5 271 5 271 4 272 5 1 - -
Bin Packing (1615) 1160 0 1234 0 922 632 526 1038 1115 431 1142 405 1037 520 426 1118 1167 4 242 14
SALBP-1 (2100) 1431 250 1584 0 1657 406 1629 470 1484 616 1626 474 1635 465 1404 696 1802 0 1204 1
$1||\sum w_i T_i$ (375) 107 0 150 0 270 105 233 8 272 103 272 103 265 110 268 107 288 0 - -
Talent Scheduling (1000) 0 0 0 0 207 793 189 388 214 786 214 786 206 794 205 795 239 0 231 0
MOSP (570) 241 14 437 0 483 87 524 46 523 47 524 46 523 47 522 48 527 0 - -
Graph-Clear (135) 26 0 4 0 78 57 99 36 101 34 101 34 99 36 82 53 103 19 - -

Since all solvers are exact, we evaluate coverage: the number of instances where an optimal solution is found and its optimality is proved within the time and memory limits. We include the number of instances where infeasibility is proved in coverage. We show the coverage of each method in each problem class in Table 1. In the table, if a DIDP solver has higher coverage than MIP and CP, it is emphasized in bold. If MIP or CP is better than all DIDP solvers, its coverage is in bold. The highest coverage is underlined. We explain CABS/0 in Tables 1–3 later in Section 7.5.

CAASDy, ACPS, APPS, and CABS outperform both MIP and CP in seven problem classes: TSPTW, OPTW, SALBP-1, $1||\sum w_i T_i$, talent scheduling, MOSP, and graph-clear. In addition, the DIDP solvers except for CAASDy have higher coverage than MIP and CP on the class1 instances of m-PDTSP (145 (CABS) and 144 (others) vs. 128 (MIP and CP)). Comparing the DIDP solvers, CABS has the highest coverage in all problem classes. As shown, each DIDP solver except for CABS reaches the memory limit in most of the instances it is unable to solve, while CABS rarely reaches the memory limit. This difference is possibly because CABS needs to store only the states in the current and next layers, while the other solvers need to store all generated and non-dominated states in the open list.

MIP has the highest coverage in CVRP and MDKP, and CP in m-PDTSP and bin packing. MIP runs out of memory in some instances while CP never does. In particular, in the MIP model for SALBP-1, the number of decision variables and constraints is quadratic in the number of tasks in the worst case, and MIP reaches the memory limit in 250 instances with 1000 tasks.

7.3.2 Optimality Gap

Table 2: Average optimality gap in each problem class. The optimality gap of a DIDP solver is in bold if it is lower than MIP and CP, and the lower of MIP and CP is in bold if there is no better DIDP solver. The lowest optimality gap is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
TSPTW (340) 0.2200 0.7175 0.2441 0.1598 0.1193 0.1194 0.1217 0.1408 0.1151 0.2085
CVRP (207) 0.8647 0.9868 0.9710 0.7484 0.7129 0.7123 0.7164 0.7492 0.6912 0.9111
m-PDTSP (1178) 0.1838 0.1095 0.2746 0.2097 0.1807 0.1807 0.1840 0.2016 0.1599 0.1878
OPTW (144) 0.6650 0.2890 0.5556 0.3583 0.2683 0.2683 0.2778 0.3359 0.2696 -
MDKP (276) 0.0008 0.4217 0.9855 0.4898 0.4745 0.4745 0.4742 0.4854 0.4676 -
Bin Packing (1615) 0.0438 0.0043 0.4291 0.0609 0.0083 0.0075 0.0105 0.0651 0.0049 0.7386
SALBP-1 (2100) 0.2712 0.0108 0.2100 0.0257 0.0115 0.0096 0.0094 0.0273 0.0057 0.3695
$1||\sum w_i T_i$ (375) 0.4981 0.3709 0.2800 0.3781 0.2678 0.2679 0.2878 0.2845 0.2248 -
Talent Scheduling (1000) 0.8867 0.9509 0.7930 0.2368 0.1884 0.1884 0.2003 0.2462 0.1697 0.6667
MOSP (570) 0.3169 0.1931 0.1526 0.0713 0.0362 0.0359 0.0392 0.0655 0.0200 -
Graph-Clear (135) 0.4465 0.4560 0.4222 0.2359 0.0995 0.0996 0.1089 0.2636 0.0607 -

We also evaluate the optimality gap, the relative difference between the primal and dual bounds. The optimality gap measures how close a solver comes to proving optimality on instances that are not optimally solved. Let $\overline{\gamma}$ be a primal bound and $\underline{\gamma}$ be a dual bound found by a solver. We define the optimality gap, $\delta(\overline{\gamma}, \underline{\gamma})$, as follows:

$$\delta(\overline{\gamma}, \underline{\gamma}) = \begin{cases} 0 & \text{if } \overline{\gamma} = \underline{\gamma} = 0 \\ \frac{|\overline{\gamma} - \underline{\gamma}|}{\max\{|\overline{\gamma}|, |\underline{\gamma}|\}} & \text{else.} \end{cases}$$

The optimality gap is 0 when optimality is proved and positive otherwise. We also use 0 as the optimality gap when infeasibility is proved. In the second line, when the signs of the primal and dual bounds are the same, since $|\overline{\gamma} - \underline{\gamma}| \leq \max\{|\overline{\gamma}|, |\underline{\gamma}|\}$, the optimality gap never exceeds 1. In practice, we observe that the primal and dual bounds found are always nonnegative in our experiments. Therefore, we use 1 as the optimality gap when either a primal or dual bound is not found.
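The definition translates directly into Python (our own helper; `None` stands for a missing bound, per the convention above):

```python
def optimality_gap(primal, dual):
    """Relative primal-dual gap in [0, 1]; 0 means optimality is proved."""
    if primal is None or dual is None:
        return 1.0  # a missing bound counts as the worst gap
    if primal == dual == 0:
        return 0.0
    return abs(primal - dual) / max(abs(primal), abs(dual))
```

For example, a primal bound of 10 with a dual bound of 5 gives a gap of 0.5, and equal bounds give 0.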

We show the average optimality gap in Table 2. Similar to Table 1, the optimality gap of a DIDP solver is in bold if it is better than MIP and CP, and the best value is underlined. ACPS, APPS, and CABS achieve a better optimality gap than MIP and CP in the seven problem classes where they have higher coverage. In addition, the DIDP solvers except for CAASDy outperform MIP and CP in CVRP, where MIP has the highest coverage; on large instances of CVRP, MIP fails to find feasible solutions, which results in a high average optimality gap. Comparing the DIDP solvers, CABS is the best in all problem classes except for OPTW, where CBFS and ACPS are marginally better. CAASDy is the worst among the DIDP solvers in all problem classes except for $1||\sum w_i T_i$; similar to MIP in CVRP, CAASDy does not provide a primal bound for any unsolved instance except for eleven instances of m-PDTSP.

7.3.3 Primal Integral

Table 3: Average primal integral in each problem class. The primal integral of a DIDP solver is in bold if it is lower than MIP and CP, and the lower of MIP and CP is in bold if there is no better DIDP solver. The lowest primal integral is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
TSPTW (340) 479.03 48.97 458.26 46.31 9.49 10.06 29.36 56.65 9.25 13.71
CVRP (207) 1127.55 482.89 1748.23 420.66 423.45 418.29 440.59 523.42 333.68 335.75
m-PDTSP (1178) 177.60 26.04 333.53 23.37 6.51 6.49 9.23 17.87 5.24 5.31
OPTW (144) 438.06 15.58 1018.23 175.49 54.05 54.29 74.37 139.64 57.95 -
MDKP (276) 0.65 15.86 1773.92 236.12 211.69 211.62 211.84 237.99 201.72 -
Bin Packing (1615) 88.07 8.05 778.60 104.46 9.98 8.40 13.82 111.85 5.04 11.56
SALBP-1 (2100) 538.79 28.43 383.35 35.59 10.83 7.28 6.74 38.80 1.92 19.42
$1||\sum w_i T_i$ (375) 64.89 3.49 513.24 136.99 111.19 103.76 105.97 97.34 71.21 -
Talent Scheduling (1000) 106.10 18.91 1435.12 119.03 40.72 40.39 60.12 143.45 25.41 50.78
MOSP (570) 95.20 13.01 275.48 4.41 1.39 1.20 1.37 7.72 0.31 -
Graph-Clear (135) 334.87 83.49 764.00 4.63 0.70 0.74 3.45 87.90 0.37 -

To evaluate the performance of anytime solvers, we use the primal integral [158], which considers the balance between solution quality and computation time. For an optimization problem, let $\sigma^t$ be a solution found by a solver at time $t$, $\sigma^*$ be an optimal (or best-known) solution, and $\gamma$ be a function that returns the solution cost. The primal gap function $p$ is

$$p(t) = \begin{cases} 0 & \text{if } \gamma(\sigma^t) = \gamma(\sigma^*) = 0 \\ 1 & \text{if no } \sigma^t \text{ or } \gamma(\sigma^t)\gamma(\sigma^*) < 0 \\ \frac{|\gamma(\sigma^*) - \gamma(\sigma^t)|}{\max\{|\gamma(\sigma^*)|, |\gamma(\sigma^t)|\}} & \text{else.} \end{cases}$$

The primal gap takes a value in $[0, 1]$, and lower is better. Let $t_i \in [0, T]$ for $i = 1, \dots, l-1$ be the time points when a new better solution is found by a solver, with $t_0 = 0$ and $t_l = T$. The primal integral is defined as $P(T) = \sum_{i=1}^{l} p(t_{i-1}) \cdot (t_i - t_{i-1})$. It takes a value in $[0, T]$, and lower is better. $P(T)$ decreases if the same solution cost is achieved faster or a better solution is found within the same computation time. When an instance is proved to be infeasible at time $t$, we use $p(t) = 0$, so $P(T)$ corresponds to the time to prove infeasibility. For TSPTW, CVRP, and $1||\sum w_i T_i$, we use the best-known solutions provided with the instances to compute the primal gap. For the other problems, we use the best solutions found by the evaluated solvers.
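The following Python sketch computes $P(T)$ from a solver's trace of improving solutions (our own helper; the event-list format is an assumption, and the infeasibility convention is omitted for brevity):

```python
def primal_integral(events, best, T):
    """P(T) from events = [(time, cost), ...] of successively better
    solutions sorted by time; best = optimal or best-known cost; T = limit."""
    def gap(cost):
        if cost is None or cost * best < 0:
            return 1.0  # no solution yet, or the costs have opposite signs
        if cost == best == 0:
            return 0.0
        return abs(best - cost) / max(abs(best), abs(cost))

    # Before the first event, no solution exists (gap 1); after the last
    # event, the final incumbent's gap applies until the time limit T.
    times = [0.0] + [t for t, _ in events] + [T]
    costs = [None] + [c for _, c in events]
    return sum(gap(costs[i]) * (times[i + 1] - times[i]) for i in range(len(costs)))
```

For instance, finding cost 20 at 10 s and cost 10 (the best known) at 100 s within a 1800 s limit yields $1 \cdot 10 + 0.5 \cdot 90 + 0 \cdot 1700 = 55$.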

We show the average primal integral in Table 3. Similar to Tables 1 and 2, the primal integral of a DIDP solver is in bold if it is better than MIP and CP, and the best value is underlined. CBFS, ACPS, and APPS outperform MIP and CP in six problem classes (TSPTW, CVRP, m-PDTSP, SALBP-1, MOSP, and graph-clear). In addition to these problem classes, CABS achieves a better primal integral than CP in bin packing. Comparing the DIDP solvers, CABS is the best in all problem classes except for OPTW, where CBFS and ACPS are better. As mentioned above, CAASDy does not find feasible solutions for almost all unsolved instances, resulting in the worst primal integral in all problem classes.

In CVRP, while MIP solves more instances optimally, the DIDP solvers except for CAASDy achieve a better average primal integral since MIP fails to find primal bounds for large instances as mentioned. In m-PDTSP, the DIDP solvers except for CAASDy have a lower primal integral than CP, which has the highest coverage. In contrast, in OPTW, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling, where the DIDP solvers solve more instances, CP has a better primal integral.

7.4 Performance of DIDP Solvers and Problem Characteristics

In CVRP, MIP solves more instances than the DIDP solvers even though the DyPDL model is similar to other routing problems (i.e., TSPTW, m-PDTSP, and OPTW) where the DIDP solvers are better. Similarly, in bin packing, CP solves more instances even though the DyPDL model is similar to that of SALBP-1, where CABS is the best. One common feature in the subset of the problem classes where DIDP is better than MIP or CP is a sequential dependency. In TSPTW and OPTW, a solution is a sequence of visited customers, the time when a customer is visited depends on the partial sequence of customers visited before, and time window constraints restrict possible sequences. Similarly, in m-PDTSP and SALBP-1, precedence constraints restrict possible sequences. In the DyPDL models, these constraints restrict possible paths in the state transition graphs and thus reduce the number of generated states. In contrast, in CVRP, as long as the capacity constraint is satisfied for each vehicle, customers can be visited in any order. In the DyPDL model for bin packing, while item ii must be packed in the ii-th or earlier bin and an arbitrary item is packed in a new bin by a forced transition, the remaining items can be packed in any order. We conjecture that this difference, whether a sequential dependency exists or not, may be important for the performance of the DIDP solvers observed in our experiments. However, sequential dependencies do not appear to be the only factor: DIDP also outperforms the other solvers in talent scheduling, MOSP, and Graph-Clear which do not exhibit such sequential dependencies. Detailed analysis of model characteristics that affect the performance of DIDP solvers is left for future work.

7.5 Evaluating the Importance of Dual Bound Functions

As we described above, our DIDP solvers use the dual bound function defined in a DyPDL model as an admissible heuristic function, which is used for both search guidance and state pruning. In 1||wiTi1||\sum w_{i}T_{i}, MOSP, and graph-clear, we use a trivial dual bound function, which always returns 0. Nevertheless, the DIDP solvers show better performance than MIP and CP. This result suggests that representing these problems using state transition systems provides a fundamental advantage while raising the question of the impact of non-trivial dual bound functions on problem solving performance. To investigate, we evaluate the performance of CABS with DyPDL models where the dual bound function is replaced with a function that always returns 0. In other words, beam search keeps the best bb states according to the gg-values and prunes a state SS if g(S)γ¯g(S)\geq\overline{\gamma} in minimization, where γ¯\overline{\gamma} is a primal bound. Since the zero dual bound function is not valid for OPTW and MDKP, where the DyPDL models maximize the nonnegative total profit, we use only TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, and talent scheduling. We call this configuration CABS/0 and show results in Tables 1–3. CABS/0 has a lower coverage than CABS in all problem classes except for TSPTW, where it achieves the same coverage. In terms of the optimality gap and primal integral, CABS is better than CABS/0 in all problem classes, and the difference is particularly large in bin packing and SALBP-1. The result confirms the importance of a non-trivial dual bound function for the current DIDP solvers.

7.6 Comparison with Other State-Based Approaches

We compare DIDP with domain-independent AI planning and Picat, a logic programming language that has an AI planning module. For domain-independent AI planning, we formulate numeric planning models for TSPTW, CVRP, m-PDTSP, bin packing, and SALBP-1 based on the DyPDL models using PDDL 2.1. We use NLM-CutPlan Orbit [159], the winner of the optimal numeric track in the International Planning Competition (IPC) 2023 (https://ipc2023-numeric.github.io/), to solve the models. For MOSP, we use a PDDL model for classical planning that was previously used in IPC from 2006 to 2008 with Ragnarok [160], the winner of the optimal classical track in IPC 2023 (https://ipc2023-classical.github.io/). In these PDDL models, we are not able to model redundant information represented by state constraints, resource variables, dual bound functions, and forced transitions. For Picat, we formulate models for TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling using the AI planning module with the best_plan_bb predicate, which performs a branch-and-bound algorithm. These models are equivalent to the DyPDL models except that they do not have resource variables. For OPTW, MDKP, MOSP, and graph-clear, we use tabling, a feature to cache the evaluation results of predicates, and the models do not include resource variables and dual bound functions. We provide more details for the PDDL and Picat models in C, and the implementations are available in our repository (https://github.com/Kurorororo/didp-models).

For NLM-CutPlan and Ragnarok, a problem instance is translated to PDDL files by a Python script. For Picat, a problem instance of CVRP, m-PDTSP, OPTW, SALBP-1, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling is preprocessed and formatted by a Python script so that Picat can easily parse it. We use GCC 12.3 for NLM-CutPlan and Ragnarok, IBM ILOG CPLEX 22.1.1 as a linear programming solver for Ragnarok, and Picat 3.6.

Table 4: Coverage of MIP, CP, PDDL planners, Picat, and CABS. For PDDL, Ragnarok is used for MOSP, and NLM-CutPlan Orbit is used for the other problem classes. The coverage of a solver is in bold if it is higher than MIP and CP, and the higher of MIP and CP is in bold if there is no better solver. The highest coverage is underlined.
MIP CP PDDL Picat CABS
TSPTW (340) 224 47 61 210 259
CVRP (207) 28 0 1 6 6
m-PDTSP (1178) 940 1049 1031 804 1035
OPTW (144) 16 49 - 26 64
MDKP (276) 168 6 - 3 5
Bin Packing (1615) 1160 1234 18 895 1167
SALBP-1 (2100) 1431 1584 871 1590 1802
1||wiTi1||\sum w_{i}T_{i} (375) 107 150 - 199 288
Talent Scheduling (1000) 0 0 - 84 239
MOSP (570) 241 437 193 162 527
Graph-Clear (135) 26 4 - 45 103

Table 4 compares MIP, CP, the PDDL planners, Picat, and CABS. Since the PDDL planners and Picat return only an optimal solution, we evaluate only coverage for them. CABS has coverage higher than or equal to that of the planners and Picat in all problem classes. This result is not surprising since neither the planners nor Picat's AI planning module and tabling are designed for combinatorial optimization. It might be possible to improve the PDDL and Picat models so that they are better suited to these approaches. Moreover, different PDDL planners might be better for combinatorial optimization. However, our point is to show that the performance achieved by the DIDP solvers is not a trivial consequence of the state-based modeling approach, and DIDP is doing something that existing approaches are not able to easily do.

We also compare DIDP with ddo, a DD-based solver [58], using TSPTW and talent scheduling, for which previous work developed models for ddo [161, 162, 163, 164]. For TSPTW, while we minimize the total travel time, which does not include the waiting time, the model for ddo minimizes the makespan, which is the time spent until returning to the depot. Therefore, we adapt our DyPDL model to minimize the makespan: when visiting customer jj from the current location ii with time tt, we increase the cost by max{cij,ajt}\max\{c_{ij},a_{j}-t\} instead of cijc_{ij}. To avoid confusion, in what follows, we call TSPTW to minimize the makespan TSPTW-M.
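As a minimal sketch of this adaptation (the function and variable names are ours, not part of the model definitions), the cost added when visiting customer j from location i at time t differs between the two objectives: total travel time counts only c[i][j], while makespan also counts waiting until the window opening a[j].

```python
def edge_cost(c, a, i, j, t, makespan=False):
    """Cost increment for visiting customer j from location i at time t.

    c[i][j]: travel time; a[j]: opening of j's time window.
    """
    travel = c[i][j]
    if not makespan:
        return travel  # total-travel-time objective ignores waiting
    # Makespan objective: we arrive at max(t + travel, a[j]), so the
    # increment over the current time t is max(travel, a[j] - t).
    return max(travel, a[j] - t)
```

For instance, with travel time 3 and a window opening at 10, departing at t=2 adds 3 under the travel-time objective but 8 under the makespan objective.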

Since defining a merging operator required by ddo is a non-trivial task, we do not evaluate ddo for problem classes other than TSPTW-M and talent scheduling. When two states are merged into one state, any solution for either original state must also be a solution for the merged state with the same or better cost. In the DP model for TSPTW-M, three state variables are used: the set of unvisited customers UU, the current location ii, and the current time tt. When two states with the sets of unvisited customers UU and UU^{\prime} are merged, the set of customers that must be visited should be UUU\cap U^{\prime}, but the set of customers that may be visited should be UUU\cup U^{\prime}. Thus, in the model for ddo, UU is replaced with two state variables, one representing the set of customers that must be visited and another representing the set of customers that may be visited, and these two variables have the same values in a non-merged state. Similarly, the current location is represented by a set of locations that can be considered as the current location. As this example shows, defining a merging operator requires a significant change in the DP model.
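A hedged sketch of the merged-state representation described above (the names are ours; we take the merged time as the minimum, which is one valid relaxation for makespan minimization, though the exact rule used by the ddo model may differ):

```python
def merge_tsptw_states(s1, s2):
    """Merge two relaxed TSPTW-M states.

    Each state is (must_visit, may_visit, locations, time), where the set
    variables are frozensets. In a non-merged state, must_visit == may_visit
    and locations is a singleton. Intersecting must_visit and unioning
    may_visit and locations ensures any solution for either original state
    remains a solution for the merged state; taking the minimum time only
    relaxes the merged state further.
    """
    must1, may1, loc1, t1 = s1
    must2, may2, loc2, t2 = s2
    return (must1 & must2, may1 | may2, loc1 | loc2, min(t1, t2))
```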

Table 5: Coverage and the average optimality gap of ddo and CABS in TSPTW-M and talent scheduling. For TSPTW-M, the optimality gap is not presented since ddo runs out of 8 GB memory in all unsolved instances and does not report intermediate solutions. For talent scheduling, the average optimality gap is computed from 976 instances where ddo does not reach the memory limit. A value of ddo or CABS is in bold if it is better than MIP and CP, and the better one of MIP and CP is in bold otherwise. The best value is underlined.
Coverage Optimality Gap
MIP CP ddo CABS MIP CP ddo CABS
TSPTW-M (340) 114 331 213 260 - -
Talent Scheduling (1000) 0 0 210 239 0.8871 0.9513 0.1424 0.1730

We evaluate ddo 2.0 using Rust 1.70.0 and present the result in Table 5. We use the ddo models for TSPTW-M and talent scheduling obtained from the published repository of ddo (https://github.com/xgillard/ddo/tree/b2e68bfc085af7cc09ece38cc9c81acb0da6e965/ddo/examples). We also adapt the MIP and CP models for TSPTW to TSPTW-M and evaluate them. While ddo returns the best solution and dual bound found within the time limit, it does not return intermediate solutions and bounds during solving. Since we manage the memory limit using an external process, when ddo reaches the memory limit, it is killed without returning the best solution. In TSPTW-M, ddo reaches the memory limit in all unsolved instances. Therefore, we evaluate only coverage in TSPTW-M and present the average optimality gap computed from 976 out of 1000 talent scheduling instances where ddo does not reach the memory limit. CABS is competitive with ddo: it achieves higher coverage in both TSPTW-M and talent scheduling, while ddo has a better average optimality gap in talent scheduling. Note that when we adapt TSPTW to minimize makespan, CP performance increases considerably from the coverage of 47 (see Table 1) to 331 instances. We conjecture that the improvement is a result of the strong back-propagation of the maximum objective function with specialized global constraints for scheduling [165].

8 Conclusion and Future Work

We proposed domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization based on dynamic programming (DP). We introduced Dynamic Programming Description Language (DyPDL), a modeling formalism for DP, and YAML-DyPDL, a modeling language for DyPDL. We developed seven DIDP solvers using heuristic search algorithms and experimentally showed that DIDP outperforms mixed-integer programming and constraint programming in a number of combinatorial optimization problem classes. This result shows that DIDP is promising and complements existing model-based paradigms.

The significance of DIDP is that it is the first model-based paradigm designed for combinatorial optimization based on DP. DIDP is based on two different fields, artificial intelligence (AI) and operations research (OR). In particular, we focused on the state-based representations of problems, which are common in AI planning and DP but not previously exploited in a model-based paradigm for combinatorial optimization. In AI planning, PDDL, the state-based modeling language, is commonly used, and some combinatorial optimization problems such as the minimization of open stacks problem were modeled in PDDL and used in International Planning Competitions. However, PDDL and AI planners are not specifically designed for combinatorial optimization. In OR, DP and state space search methods were used in problem-specific settings, but little work has developed a model-based paradigm based on DP. DIDP bridges these gaps, benefitting from both AI and OR. Since DIDP has a state-based modeling formalism similar to AI planning, we can apply heuristic search algorithms studied in AI to various combinatorial optimization problems. Since DIDP follows the OR approach that allows a user to incorporate redundant information into optimization models, we can develop efficient models for application problems built upon problem-specific DP methods studied in OR.

DIDP opens up new research opportunities. As shown in the experimental result, state-based paradigms such as DIDP and constraint-based paradigms such as mixed-integer programming and constraint programming are suited to different problem classes. Even within state-based paradigms, DIDP and decision diagram-based solvers have different strengths. Analyzing the characteristics of problems that make DIDP superior to others is an interesting direction. DIDP also makes it possible to investigate better DP models, for example, by incorporating redundant information.

There is also a significant opportunity to improve DIDP. One of the most important directions is to develop better heuristic functions for the heuristic search solvers. The current solvers use a dual bound function provided in a DyPDL model for two roles: search guidance and state pruning. While we demonstrated the importance of a dual bound function in Section 7.5, using it for search guidance is not necessarily justified as discussed in Section 5.3; we may improve the anytime behavior by using an inadmissible heuristic function for search guidance. Disentangling search guidance from pruning and developing better functions for each role is one of our future plans. In particular, we are considering developing methods to automatically compute heuristic functions from a DyPDL model as studied in AI planning. In addition, decision diagrams (DDs) can be a source of heuristic functions; in the existing DD-based solver [57, 58], relaxed DDs, where multiple nodes are merged together to make the graph smaller, are used to compute a dual bound. To obtain a relaxed DD, the DD-based solver requires a user to provide a merging operator. Thus, for DIDP, there are two directions: developing a domain-independent merging operator for DyPDL and extending DyPDL so that a user can declaratively incorporate a merging operator into a model.

In contrast to applying AI techniques to DIDP, using DIDP for AI planning is also possible. In proving the undecidability of DyPDL, we showed that numeric planning tasks can be modeled in DyPDL. As this result implies that we can automatically transform a numeric planning task into a DyPDL model, investigating more efficient DyPDL models for each planning domain is an interesting direction for future work.

Appendix A Proof of Lemma 3

Proof.

Once γ¯γ^\overline{\gamma}\leq\hat{\gamma} holds, γ¯\overline{\gamma} never increases, so the lemma continues to hold. We only consider γ¯>γ^\overline{\gamma}>\hat{\gamma} in the current iteration and examine if the lemma will hold in the next iteration. Now, we further specify our assumption: when γ¯>γ^\overline{\gamma}>\hat{\gamma}, then OO contains a state S^\hat{S} such that an S^\hat{S}-solution σ^\hat{\sigma} exists, σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is a solution for the model with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}, and |σ^||σ||\hat{\sigma}|\leq|\sigma^{\prime}| for each SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ)γ^\mathsf{solution\_cost}(\langle\sigma(S^{\prime});\sigma^{\prime}\rangle)\leq\hat{\gamma} for each SGS^{\prime}\in G. Here, we denote the number of transitions in a solution σ\sigma as |σ||\sigma|. At the beginning, S^=S0O\hat{S}=S^{0}\in O, any solution is an extension of σ(S0)\sigma(S^{0}), and G={S0}G=\{S^{0}\}, so the assumption holds.

In line 7, SS is removed from OO. If SS is a base state, γ¯\overline{\gamma} and σ¯\overline{\sigma} can be updated. If γ¯\overline{\gamma} becomes less than or equal to γ^\hat{\gamma}, the assumption holds in the next iteration. Otherwise, γ¯>γ^\overline{\gamma}>\hat{\gamma}. Since there exists a solution extending σ(S^)\sigma(\hat{S}) with the cost at most γ^\hat{\gamma} by the assumption, SS^S\neq\hat{S}. By Lemma 2, g(S^)×η(S^)=wσ(S^)(S0)×η(S^)g(\hat{S})\times\eta(\hat{S})=w_{\sigma(\hat{S})}(S^{0})\times\eta(\hat{S}). By Definition 16, η(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\eta(\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}). Since AA is isotone, g(S^)×η(S^)wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)γ^<γ¯g(\hat{S})\times\eta(\hat{S})\leq w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\leq\hat{\gamma}<\overline{\gamma}. Thus, S^\hat{S} is not removed from OO in line 12.

If SS is not a base state, its successor states are generated in lines 1421. Since SS was included in OO, SGS\in G by lines 19 and 21. If there exists an SS-solution σ1\sigma^{1} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma}, then |σ1||σ^||\sigma^{1}|\geq|\hat{\sigma}| by the assumption. We consider the following cases.

  1. 1.

    There does not exist an SS-solution σ1\sigma^{1} satisfying both 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma} and |σ1||σ^||\sigma^{1}|\leq|\hat{\sigma}|.

  2. 2.

    There exists an SS-solution σ1\sigma^{1} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma} and |σ1|=|σ^||\sigma^{1}|=|\hat{\sigma}|.

In the first case, SS^S\neq\hat{S}. For each successor state S[[τ]]S[\![\tau]\!], if there does not exist an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} holds, adding S[[τ]]S[\![\tau]\!] to OO in line 21 does not affect the assumption as long as S^G\hat{S}\in G. Suppose that there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} holds. Then, since τ;σ2\langle\tau;\sigma^{2}\rangle is an SS-solution, |τ;σ2|>|σ^||\langle\tau;\sigma^{2}\rangle|>|\hat{\sigma}|, so |σ2||σ^||\sigma^{2}|\geq|\hat{\sigma}|. Again, as long as S^G\hat{S}\in G, adding S[[τ]]S[\![\tau]\!] to GG does not affect the assumption. Removing S^\hat{S} from GG in line 16 is possible only if S^aS[[τ]]\hat{S}\preceq_{a}S[\![\tau]\!] and gcurrent=g(S)×wτ(S)g(S^)g_{\text{current}}=g(S)\times w_{\tau}(S)\leq g(\hat{S}) in line 19. In such a case, since S^S[[τ]]\hat{S}\preceq S[\![\tau]\!], there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}) with |σ2||σ^||\sigma^{2}|\leq|\hat{\sigma}|. Since σ(S);τ;σ2\langle\sigma(S);\tau;\sigma^{2}\rangle is a solution for the model, by Lemma 2,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]]).\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]).

Since S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S);τ\langle\sigma(S);\tau\rangle, by Theorem 9,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)&=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\\ &=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.\end{split}

Because |σ2||σ^||\sigma^{2}|\leq|\hat{\sigma}|, by considering S[[τ]]S[\![\tau]\!] as a new S^\hat{S}, the assumption will hold in the next iteration.

In the second case, first, we assume that no applicable forced transitions are identified. In the loop from line 14 to line 21, let S[[τ]]S[\![\tau]\!] be the first successor such that there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} and |τ;σ2|=|σ^||\langle\tau;\sigma^{2}\rangle|=|\hat{\sigma}|, which implies |σ2|=|σ^|1|\sigma^{2}|=|\hat{\sigma}|-1. Such a successor state exists since at least S[[σ11]]S[\![\sigma^{1}_{1}]\!] satisfies the condition, where σ11\sigma^{1}_{1} is the first transition of σ1\sigma^{1}. First, we show that S[[τ]]S[\![\tau]\!] is inserted into OO in line 21. Then, we prove that S[[τ]]S[\![\tau]\!] or another successor state replacing S[[τ]]S[\![\tau]\!] in line 19 can be considered a new S^\hat{S} in the next iteration. For a successor state S[[τ]]S[\![\tau^{\prime}]\!] considered before S[[τ]]S[\![\tau]\!], if there exists an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S[[τ]]);σ3)γ^\mathsf{solution\_cost}(\langle\sigma(S[\![\tau^{\prime}]\!]);\sigma^{3}\rangle)\leq\hat{\gamma}, then |τ;σ3|>|σ^||\langle\tau^{\prime};\sigma^{3}\rangle|>|\hat{\sigma}|, so |σ3||σ^|>|σ2||\sigma^{3}|\geq|\hat{\sigma}|>|{\sigma^{2}}|. Therefore, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption. Suppose that S[[τ]]S[\![\tau]\!] is not added to OO due to line 16. Then, there exists a state SGS^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrent=g(S)×wτ(S)g(S)g_{\text{current}}=g(S)\times w_{\tau}(S)\geq g(S^{\prime}). Since S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime}, there exists an SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\mathsf{solution\_cost}(\sigma^{\prime},S^{\prime})\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and |σ||σ2||\sigma^{\prime}|\leq|\sigma^{2}|. 
However, by the assumption, |σ||σ^|>|σ2||\sigma^{\prime}|\geq|\hat{\sigma}|>|\sigma^{2}|, which is a contradiction. Therefore, there does not exist such SS^{\prime}, and the condition in line 16 is true. Next, we examine the condition in line 17. By Lemma 2,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^.\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}.

Since η(S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\eta(S[\![\tau]\!])\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and AA is isotone,

gcurrent×η(S[[τ]])=g(S)×wτ(S)×η(S[[τ]])g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^<γ¯.g_{\text{current}}\times\eta(S[\![\tau]\!])=g(S)\times w_{\tau}(S)\times\eta(S[\![\tau]\!])\leq g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}<\overline{\gamma}.

Therefore, the condition in line 17 is true, and S[[τ]]S[\![\tau]\!] is inserted into OO. For a successor state S[[τ]]S[\![\tau^{\prime}]\!] generated after S[[τ]]S[\![\tau]\!], suppose that there does not exist an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ3)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau^{\prime};\sigma^{3}\rangle)\leq\hat{\gamma}. Adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau]\!]\in G. If there exists such σ3\sigma^{3}, then |σ3|=|σ^|1|\sigma^{3}|=|\hat{\sigma}|-1 or |σ3||σ^||\sigma^{3}|\geq|\hat{\sigma}| by the assumption. In the former case, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau^{\prime}]\!]\in G since we can consider S[[τ]]S[\![\tau^{\prime}]\!] as a new S^\hat{S} in the next iteration. In the latter case, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau]\!]\in G since |σ3|>|σ2||\sigma^{3}|>|\sigma^{2}|. The remaining problem is the possibility that S[[τ]]S[\![\tau]\!] is removed in line 19. If the condition in line 16 is true, then g(S)×wτ(S)g(S)×wτ(S)g(S)\times w_{\tau^{\prime}}(S)\leq g(S)\times w_{\tau}(S), and there exists an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ3,S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\mathsf{solution\_cost}(\sigma^{3},S[\![\tau^{\prime}]\!])\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and |σ3||σ2||\sigma^{3}|\leq|\sigma^{2}|. By Theorem 9,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ3)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ3,S[[τ]])g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(S);\tau^{\prime};\sigma^{3}\rangle)&=g(S)\times w_{\tau^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{3},S[\![\tau^{\prime}]\!])\\ &\leq g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}.\end{split}

Therefore, if we consider S[[τ]]S[\![\tau^{\prime}]\!] as a new S^\hat{S} in the next iteration instead of S[[τ]]S[\![\tau]\!], the situation does not change. Similarly, if S[[τ]]S[\![\tau^{\prime}]\!] is replaced with another successor state, by considering it as a new S^\hat{S}, the situation does not change, and the assumption will hold in the next iteration.

When applicable forced transitions are identified, only one successor state S[[τ]]S[\![\tau]\!] is generated, where τ\tau is a forced transition, and there exists an SS-solution τ;σ2\langle\tau;\sigma^{2}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ;σ2,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)\mathsf{solution\_cost}(\langle\tau;\sigma^{2}\rangle,S)\leq\mathsf{solution\_cost}(\sigma^{1},S). Since AA is isotone,

wσ(S)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ;σ2,S)wσ(S)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^.w_{\sigma(S)}(S^{0})\times\mathsf{solution\_cost}(\langle\tau;\sigma^{2}\rangle,S)\leq w_{\sigma(S)}(S^{0})\times\mathsf{solution\_cost}(\sigma^{1},S)=\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma}.

If GG contains a state SS^{\prime} such that there exists an SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ)γ^\mathsf{solution\_cost}(\langle\sigma(S^{\prime});\sigma^{\prime}\rangle)\leq\hat{\gamma} and |σ||σ2||\sigma^{\prime}|\leq|\sigma^{2}|, then, by considering the one minimizing |σ||\sigma^{\prime}| as a new S^\hat{S}, the condition will hold in the next iteration. If GG does not contain such a state, then S[[τ]]S[\![\tau]\!] can be considered a new S^\hat{S} if it is inserted into OO. By a similar argument as the previous paragraph, we can prove that S[[τ]]S[\![\tau]\!] is inserted into OO and GG. ∎

Appendix B Additional Problem and Model Definitions

We present problem definitions and DyPDL models that are not covered in Section 6 and new CP models used in the experimental evaluation.

B.1 Multi-Dimensional Knapsack Problem (MDKP)

The multi-dimensional knapsack problem (MDKP) [116, 117] is a generalization of the 0-1 knapsack problem. In this problem, each item i{0,,n1}i\in\{0,...,n-1\} has an integer profit pi0p_{i}\geq 0 and mm-dimensional nonnegative weights (wi,0,,wi,m1)(w_{i,0},...,w_{i,m-1}), and the knapsack has the mm-dimensional capacities (q0,,qm1)(q_{0},...,q_{m-1}). In each dimension, the total weight of items included in the knapsack must not exceed the capacity. The objective is to maximize the total profit. MDKP is strongly NP-hard [119, 138].

B.1.1 DyPDL Model for MDKP

In our DyPDL model, we decide one by one whether to include each item. An element variable ii represents the index of the currently considered item, and a numeric variable rjr_{j} represents the remaining space in the jj-th dimension. We can use the total profit of the remaining items as a dual bound function. In addition, we consider an upper bound similar to that of OPTW by ignoring dimensions other than jj. Let ekj=pk/wkje_{kj}=p_{k}/w_{kj} be the efficiency of item kk in dimension jj. Then, rjmaxk=i,,n1ekj\lfloor r_{j}\max_{k=i,...,n-1}e_{kj}\rfloor is an upper bound on the cost of an (i,rj)(i,r_{j})-solution. If wkj=0w_{kj}=0, we define ekj=k=i,,n1pke_{kj}=\sum_{k=i,...,n-1}p_{k}, i.e., the maximum additional profit achieved from (i,rj)(i,r_{j}). In such a case, max{rj,1}maxk=i,,n1ekj\lfloor\max\{r_{j},1\}\cdot\max_{k=i,...,n-1}e_{kj}\rfloor is still a valid upper bound.

compute V(0,q0,,qm1)\displaystyle\text{compute }V(0,q_{0},...,q_{m-1}) (41)
V(i,r0,,rm1)={0if i=nmax{pi+V(i+1,r0wi0,,rm1wi,m1)V(i+1,r0,,rm1)else if jM,wijrjV(i+1,r0,,rm1)elseV(i,r_{0},...,r_{m-1})=\begin{cases}0&\text{if }i=n\\ \max\left\{\begin{array}[]{l}p_{i}+V(i+1,r_{0}-w_{i0},...,r_{m-1}-w_{i,m-1})\\ V(i+1,r_{0},...,r_{m-1})\end{array}\right.&\text{else if }\forall j\in M,w_{ij}\leq r_{j}\\ V(i+1,r_{0},...,r_{m-1})&\text{else}\end{cases} (42)
V(i,r0,,rm1)min{k=i,,n1pk,minjMmax{rj,1}maxk=i,,n1ekj}\displaystyle V(i,r_{0},...,r_{m-1})\leq\min\left\{\sum_{k=i,...,n-1}p_{k},\min\limits_{j\in M}\left\lfloor\max\{r_{j},1\}\cdot\max\limits_{k=i,...,n-1}e_{kj}\right\rfloor\right\} (43)

where M={0,,m1}M=\{0,...,m-1\}. Similar to OPTW, this model is monoidal and satisfies isotonicity, but it is not cost-algebraic, so the first solution found by CAASDy may not be optimal (Theorem 15).
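A minimal Python sketch of the recursion in Equation 42, omitting the dual bound function of Equation 43 (`solve_mdkp` is a hypothetical name, and memoization stands in for the solvers' state-space search):

```python
from functools import lru_cache

def solve_mdkp(profits, weights, capacities):
    """Evaluate the MDKP recursion: at stage i, either pack item i
    (if it fits in every dimension) or skip it."""
    n, m = len(profits), len(capacities)

    @lru_cache(maxsize=None)
    def value(i, remaining):
        if i == n:  # base case: no items left
            return 0
        skip = value(i + 1, remaining)
        if all(weights[i][j] <= remaining[j] for j in range(m)):
            packed = tuple(remaining[j] - weights[i][j] for j in range(m))
            return max(profits[i] + value(i + 1, packed), skip)
        return skip

    return value(0, tuple(capacities))
```

For example, with profits (6, 10, 12), weights ((1,1), (2,3), (3,2)), and capacities (5, 4), the optimal choice packs items 0 and 2 for a profit of 18.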

B.1.2 CP Model for MDKP

We use the 𝖯𝖺𝖼𝗄\mathsf{Pack} global constraint [139] and consider packing all items into two bins; one represents the knapsack, and the other represents items not selected. We introduce a binary variable xix_{i} representing the bin where item ii is packed (xi=0x_{i}=0 represents that the item is in the knapsack). We define an integer variable yj,0y_{j,0} representing the total weight in dimension jj of the items in the knapsack and an integer variable yj,1y_{j,1} representing that of the items not selected.

max\displaystyle\max iNpi(1xi)\displaystyle\sum_{i\in N}p_{i}(1-x_{i}) (44)
s.t. 𝖯𝖺𝖼𝗄({yj,0,yj,1},{xiiN},{wijiN})\displaystyle\mathsf{Pack}(\{y_{j,0},y_{j,1}\},\{x_{i}\mid i\in N\},\{w_{ij}\mid i\in N\}) j=0,,m1\displaystyle j=0,...,m-1 (45)
yj,0qj\displaystyle y_{j,0}\leq q_{j} j=0,,m1\displaystyle j=0,...,m-1 (46)
yj,0,yj,10+\displaystyle y_{j,0},y_{j,1}\in\mathbb{Z}_{0}^{+} j=0,,m1\displaystyle j=0,...,m-1 (47)
xi{0,1}\displaystyle x_{i}\in\{0,1\} iN.\displaystyle\forall i\in N. (48)

B.2 Single Machine Total Weighted Tardiness (1||wiTi1||\sum w_{i}T_{i})

In single machine scheduling to minimize the total weighted tardiness (1||wiTi1||\sum w_{i}T_{i}) [147], a set of jobs NN is given, and each job iNi\in N has a processing time pip_{i}, a due date did_{i}, and a weight wiw_{i}, all of which are nonnegative. The objective is to schedule all jobs on a machine while minimizing the total weighted tardiness, iNwimax{0,Cidi}\sum_{i\in N}w_{i}\max\{0,C_{i}-d_{i}\}, where CiC_{i} is the completion time of job ii. This problem is strongly NP-hard [166].

B.2.1 DyPDL Model for $1||\sum w_{i}T_{i}$

We formulate a DyPDL model based on an existing DP model [22, 167], where one job is scheduled at each step. Let $F$ be a set variable representing the set of scheduled jobs. A numeric expression $T(i,F)=\max\{0,\sum_{j\in F}p_{j}+p_{i}-d_{i}\}$ represents the tardiness of job $i$ when it is scheduled after the jobs in $F$. We introduce a set $P_i$, representing the set of jobs that can be scheduled before $i$ without losing optimality. While $P_i$ is redundant information not defined in the problem, it can be extracted in preprocessing using precedence theorems [147].

\text{compute } V(\emptyset)   (49)
V(F)=\begin{cases}0&\text{if }F=N\\ \min\limits_{i\in N\setminus F:P_{i}\setminus F=\emptyset}w_{i}T(i,F)+V(F\cup\{i\})&\text{else}\end{cases}   (50)
V(F)\geq 0.   (51)

This model is cost-algebraic with a cost algebra $\langle\mathbb{Q}_{0}^{+},+,0\rangle$ since $w_{i}T(i,F)\geq 0$. Since the base cost is always zero, the first solution found by CAASDy is optimal.
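As an illustration, the recursion (49)–(51) can be sketched as a memoized function. The instance data below and the omission of the precedence sets $P_i$ are our own simplifications for illustration, not part of the model above.

```python
from functools import lru_cache

# Hypothetical instance data: processing times, deadlines, and weights.
p = [3, 2, 4]
d = [4, 6, 5]
w = [2, 1, 3]
n = len(p)

@lru_cache(maxsize=None)
def V(F):
    # F is a frozenset of already scheduled jobs; a base case when all
    # jobs are scheduled, otherwise minimize over the next job to add.
    if len(F) == n:
        return 0
    t = sum(p[j] for j in F)  # completion time of the jobs in F
    return min(
        w[i] * max(0, t + p[i] - d[i]) + V(F | {i})
        for i in range(n) if i not in F
    )

print(V(frozenset()))  # optimal total weighted tardiness
```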

B.2.2 CP Model for $1||\sum w_{i}T_{i}$

We use an interval variable $x_i$ with duration $p_i$ and a time window of $[0,\sum_{j\in N}p_{j}]$, representing the interval of time during which job $i$ is processed.

\min\ \sum_{i\in N}w_{i}\max\{\mathsf{EndOf}(x_{i})-d_{i},0\}   (52)
\text{s.t.}\ \mathsf{NoOverlap}(\pi)   (53)
\mathsf{Before}(\pi,x_{i},x_{j})   \forall j\in N,\forall i\in P_{j}   (54)
x_{i}:\mathsf{intervalVar}\left(p_{i},\left[0,\sum_{j\in N}p_{j}\right]\right)   \forall i\in N   (55)
\pi:\mathsf{sequenceVar}(\{x_{0},...,x_{n-1}\}).   (56)

B.3 Talent Scheduling

In talent scheduling, we are given a set of scenes $N=\{0,...,n-1\}$, a set of actors $A=\{0,...,m-1\}$, the set of actors $A_{s}\subseteq A$ playing in scene $s$, and the set of scenes $N_{a}\subseteq N$ in which actor $a$ plays. In addition, $d_s$ is the duration of scene $s$, and $c_a$ is the cost of actor $a$ per day.

B.3.1 DyPDL Model for Talent Scheduling

We use a set variable $Q$ representing the set of scenes that are not yet shot and a set expression $L(Q)=\bigcup_{s\in Q}A_{s}\cap\bigcup_{s\in N\setminus Q}A_{s}$ to represent the set of actors on location after shooting $N\setminus Q$. If $A_{s}=L(Q)$, then $s$ should be shot immediately because all of its actors are already on location: a forced transition. For the dual bound function, we underestimate the cost to shoot $s$ by $b_{s}=d_{s}\sum_{a\in A_{s}}c_{a}$.

If there exist two scenes $s_1$ and $s_2$ in $Q$ such that $A_{s_{1}}\subseteq A_{s_{2}}$ and $A_{s_{2}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$, it is known that scheduling $s_2$ before $s_1$ is always better, denoted by $s_{2}\preceq s_{1}$. Since two scenes with the same set of actors are merged into a single scene in preprocessing without losing optimality, we can assume that all $A_s$ are different. With this assumption, the relationship is a partial order: it is reflexive because $A_{s_{1}}\subseteq A_{s_{1}}$ and $A_{s_{1}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$; it is antisymmetric because if $s_{1}\preceq s_{2}$ and $s_{2}\preceq s_{1}$, then $A_{s_{1}}\subseteq A_{s_{2}}$ and $A_{s_{2}}\subseteq A_{s_{1}}$, which imply $s_{1}=s_{2}$; it is transitive because if $s_{2}\preceq s_{1}$ and $s_{3}\preceq s_{2}$, then $A_{s_{1}}\subseteq A_{s_{2}}\subseteq A_{s_{3}}$ and $A_{s_{3}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{2}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$, which imply $s_{3}\preceq s_{1}$. Therefore, the set of candidate scenes to shoot next, $R(Q)=\{s_{1}\in Q\mid\not\exists s_{2}\in Q\setminus\{s_{1}\},s_{2}\preceq s_{1}\}$, is not empty.

V(Q)=\begin{cases}0&\text{if }Q=\emptyset\\ b_{s}+V(Q\setminus\{s\})&\text{else if }\exists s\in Q,A_{s}=L(Q)\\ \min\limits_{s\in R(Q)}d_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}+V(Q\setminus\{s\})&\text{else if }Q\neq\emptyset\end{cases}   (57)
V(Q)\geq\sum_{s\in Q}b_{s}.   (58)
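As an illustration, recursion (57) can be sketched in a memoized form. The instance data below is hypothetical, and we omit the forced-transition case and the restriction to $R(Q)$; both only prune choices, so the unrestricted minimum yields the same optimal value.

```python
from functools import lru_cache

# Hypothetical instance: scene durations, actor daily costs, actor sets.
d = [2, 1, 3]           # duration of each scene
c = [4, 5]              # cost of each actor per day
A = [{0}, {0, 1}, {1}]  # actors playing in each scene
N = frozenset(range(len(d)))

def on_location(Q):
    # Actors appearing both in an unshot scene and in a shot scene: L(Q).
    shot = N - Q
    if not shot:
        return set()
    return set().union(*(A[s] for s in Q)) & set().union(*(A[s] for s in shot))

@lru_cache(maxsize=None)
def V(Q):
    # Q is a frozenset of scenes not yet shot.
    if not Q:
        return 0
    L = on_location(Q)
    return min(
        d[s] * sum(c[a] for a in A[s] | L) + V(Q - {s})
        for s in Q
    )

print(V(N))  # minimum total actor cost
```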

B.3.2 CP Model for Talent Scheduling

We extend the model used by Chu and Stuckey [150], which was originally implemented in MiniZinc [168], with the $\mathsf{AllDifferent}$ global constraint. While $\mathsf{AllDifferent}$ is redundant in the model, it slightly improved performance in our preliminary experiments. Let $x_i$ be a variable representing the $i$-th scene in the schedule, $b_{si}$ be a variable representing whether scene $s$ is shot before the $i$-th scene, $o_{ai}$ be a variable representing whether any scene in $N_a$ is shot by the $i$-th scene, and $f_{ai}$ be a variable representing whether all scenes in $N_a$ finish before the $i$-th scene. The CP model is

\min\ \sum_{i\in N}d_{x_{i}}\sum_{a\in A}c_{a}o_{ai}(1-f_{ai})   (59)
\text{s.t.}\ \mathsf{AllDifferent}(\{x_{i}\mid i\in N\})   (60)
b_{s0}=0   \forall s\in N   (61)
b_{si}=b_{s,i-1}+\mathbbm{1}(x_{i-1}=s)   \forall i\in N\setminus\{0\},\forall s\in N   (62)
b_{si}=1\rightarrow x_{i}\neq s   \forall i\in N\setminus\{0\},\forall s\in N   (63)
o_{a0}=\mathbbm{1}\left(\bigvee_{s\in N_{a}}x_{0}=s\right)   \forall a\in A   (64)
o_{ai}=\mathbbm{1}\left(o_{a,i-1}=1\lor\bigvee_{s\in N_{a}}x_{i}=s\right)   \forall i\in N\setminus\{0\},\forall a\in A   (65)
f_{ai}=\prod_{s\in N_{a}}b_{si}   \forall i\in N,\forall a\in A   (66)
x_{i}\in N   \forall i\in N   (67)
b_{si},f_{ai}\in\{0,1\}   \forall s,i\in N,\forall a\in A.   (68)

B.4 DyPDL Model for Minimization of Open Stacks Problem (MOSP)

In the minimization of open stacks problem (MOSP) [152], a set of customers $C$ and a set of products $P$ are given, and each customer $c$ orders products $P_{c}\subseteq P$. A solution is a sequence in which the products are produced. When producing product $i$, a stack for each customer $c$ with $i\in P_{c}$ is opened, and it is closed once all of $P_c$ are produced. The objective is to minimize the maximum number of stacks open at any one time. MOSP is strongly NP-hard [169].

For MOSP, customer search is a state-of-the-art exact method [155]. It searches for an order of customers in which to close stacks, from which the order of products is determined; for each customer $c$, all products ordered by $c$ and not yet produced are produced consecutively in an arbitrary order. We formulate customer search as a DyPDL model. A set variable $R$ represents customers whose stacks are not yet closed, and $O$ represents customers whose stacks have been opened. Let $N_{c}=\{c^{\prime}\in C\mid P_{c}\cap P_{c^{\prime}}\neq\emptyset\}$ be the set of customers that order at least one of the same products as customer $c$. When producing the items for customer $c$, we need to open stacks for the customers in $N_{c}\setminus O$, and the stacks for the customers in $O\cap R$ remain open.

\text{compute } V(C,\emptyset)   (69)
V(R,O)=\begin{cases}0&\text{if }R=\emptyset\\ \min\limits_{c\in R}\max\left\{|(O\cap R)\cup(N_{c}\setminus O)|,V(R\setminus\{c\},O\cup N_{c})\right\}&\text{else}\end{cases}   (70)
V(R,O)\geq 0.   (71)

Similar to the DyPDL model for graph-clear in Section 6.6, this model is cost-algebraic, and the base cost is zero, so the first solution found by CAASDy is optimal.
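The customer-search recursion (70) can be sketched as a memoized function; the customer-product incidence below is hypothetical example data.

```python
from functools import lru_cache

# Hypothetical instance: products ordered by each customer.
P = [{0, 1}, {1, 2}, {2, 3}]
C = frozenset(range(len(P)))
# N_c: customers sharing at least one product with customer c.
Nb = [{c2 for c2 in range(len(P)) if P[c] & P[c2]} for c in range(len(P))]

@lru_cache(maxsize=None)
def V(R, O):
    # R: customers whose stacks are not yet closed; O: opened customers.
    if not R:
        return 0
    return min(
        max(len((O & R) | (Nb[c] - O)), V(R - {c}, O | Nb[c]))
        for c in R
    )

print(V(C, frozenset()))  # minimum of the maximum number of open stacks
```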

B.5 CP Model for Orienteering Problem with Time Windows (OPTW)

In the orienteering problem with time windows (OPTW), we are given a set of customers $N=\{0,...,n-1\}$, the travel time $c_{ij}$ from customer $i$ to $j$, and the profit $p_i$ of customer $i$. We define an optional interval variable $x_i$ that represents visiting customer $i$ within $[a_{i},b_{i}]$. We also introduce an interval variable $x_n$ that represents returning to the depot (customer 0) and define $c_{in}=c_{i0}$ and $c_{ni}=c_{0i}$ for each $i\in N$. We use a sequence variable $\pi$ to sequence the interval variables.

\max\ \sum_{i\in N\setminus\{0\}}p_{i}\mathsf{Pres}(x_{i})   (72)
\text{s.t.}\ \mathsf{NoOverlap}(\pi,\{c_{ij}\mid(i,j)\in(N\cup\{n\})\times(N\cup\{n\})\})   (73)
\mathsf{First}(\pi,x_{0})   (74)
\mathsf{Last}(\pi,x_{n})   (75)
x_{i}:\mathsf{optIntervalVar}(0,[a_{i},b_{i}])   \forall i\in N\setminus\{0\}   (76)
x_{0}:\mathsf{intervalVar}(0,[0,0])   (77)
x_{n}:\mathsf{intervalVar}(0,[0,b_{0}])   (78)
\pi:\mathsf{sequenceVar}(\{x_{0},...,x_{n}\}).   (79)

Objective (72) maximizes the total profit, where $\mathsf{Pres}(x_{i})=1$ if the optional interval variable $x_i$ is present. Constraint (73) ensures that if $x_j$ is present in $\pi$ after $x_i$, the distance between them is at least $c_{ij}$. Constraints (74) and (75) ensure that the tour starts from and returns to the depot.

B.6 CP Model for Bin Packing

In the bin packing problem, we are given a set of items $N=\{0,...,n-1\}$, the weight $w_i$ of each item $i\in N$, and the capacity $q$ of a bin. We compute an upper bound $\bar{m}$ on the number of bins using the first-fit decreasing heuristic and use $M=\{0,...,\bar{m}-1\}$. We use $\mathsf{Pack}$ and ensure that item $i$ is packed in the $i$-th or an earlier bin.

\min\ \max_{i\in N}x_{i}+1   (80)
\text{s.t.}\ \mathsf{Pack}(\{y_{j}\mid j\in M\},\{x_{i}\mid i\in N\},\{w_{i}\mid i\in N\})   (81)
0\leq y_{j}\leq q   \forall j\in M   (82)
0\leq x_{i}\leq i   \forall i\in N   (83)
y_{j}\in\mathbb{Z}   \forall j\in M   (84)
x_{i}\in\mathbb{Z}   \forall i\in N.   (85)
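The first-fit decreasing heuristic used to compute the upper bound on the number of bins can be sketched as follows; the weights and capacity are hypothetical example data.

```python
def first_fit_decreasing(weights, capacity):
    # Sort items by decreasing weight; place each in the first bin
    # with enough remaining capacity, opening a new bin if none fits.
    bins = []  # remaining capacity of each opened bin
    for w in sorted(weights, reverse=True):
        for j, r in enumerate(bins):
            if w <= r:
                bins[j] = r - w
                break
        else:
            bins.append(capacity - w)  # open a new bin
    return len(bins)

print(first_fit_decreasing([4, 8, 1, 4, 2, 1], 10))  # upper bound on bins
```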

B.7 CP Model for Simple Assembly Line Balancing Problem (SALBP-1)

In SALBP-1, in addition to the tasks and the capacity of a station, we are given $P_i$, the set of direct predecessors of task $i$. We implement the CP model proposed by Bukchin and Raviv [16] with the addition of $\mathsf{Pack}$. For an upper bound on the number of stations, instead of computing it with a heuristic, we use $\bar{m}=\min\{n,2\lceil\sum_{i\in N}w_{i}/q\rceil\}$ following the MIP model [15]. Let $m$ be the number of stations, $x_i$ be the index of the station of task $i$, and $y_j$ be the sum of the weights of the tasks scheduled in station $j$. The set of all direct and indirect predecessors of task $i$ is $\tilde{P}_{i}=\{j\in N\mid j\in P_{i}\lor\exists k\in\tilde{P}_{i},j\in\tilde{P}_{k}\}$. The set of all direct and indirect successors of task $i$ is $\tilde{S}_{i}=\{j\in N\mid i\in P_{j}\lor\exists k\in\tilde{S}_{i},j\in\tilde{S}_{k}\}$. Thus, $e_{i}=\left\lceil\frac{w_{i}+\sum_{k\in\tilde{P}_{i}}w_{k}}{q}\right\rceil$ is a lower bound on the number of stations required to schedule task $i$, $l_{i}=\left\lfloor\frac{w_{i}-1+\sum_{k\in\tilde{S}_{i}}w_{k}}{q}\right\rfloor$ is a lower bound on the number of stations between the station of task $i$ and the last station, and $d_{ij}=\left\lfloor\frac{w_{i}+w_{j}-1+\sum_{k\in\tilde{S}_{i}\cap\tilde{P}_{j}}w_{k}}{q}\right\rfloor$ is a lower bound on the number of stations between the stations of tasks $i$ and $j$.

\min\ m   (86)
\text{s.t.}\ \mathsf{Pack}(\{y_{j}\mid j\in M\},\{x_{i}\mid i\in N\},\{w_{i}\mid i\in N\})   (87)
0\leq y_{j}\leq q   \forall j\in M   (88)
e_{i}-1\leq x_{i}\leq m-1-l_{i}   \forall i\in N   (89)
x_{i}+d_{ij}\leq x_{j}   \forall j\in N,\forall i\in\tilde{P}_{j},\not\exists k\in\tilde{S}_{i}\cap\tilde{P}_{j}:d_{ij}\leq d_{ik}+d_{kj}   (90)
m\in\mathbb{Z}   (91)
y_{j}\in\mathbb{Z}   \forall j\in M   (92)
x_{i}\in\mathbb{Z}   \forall i\in N.   (93)

Constraint (89) states the lower and upper bounds on the index of the station of task $i$. Constraint (90) is an enhanced version of the precedence constraint using $d_{ij}$.
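The bounds above depend on the transitive predecessor sets $\tilde{P}_i$. A minimal sketch of computing $\tilde{P}_i$ and the bound $e_i$ follows; the station capacity, task weights, and direct predecessor sets are hypothetical example data.

```python
import math

# Hypothetical SALBP-1 instance data.
q = 10                          # station capacity
w = [4, 3, 5, 2]                # task weights
P = [set(), {0}, {0}, {1, 2}]   # direct predecessors of each task

def transitive_predecessors(P):
    # Fixed-point computation of the transitive closure of P.
    Pt = [set(p) for p in P]
    changed = True
    while changed:
        changed = False
        for i in range(len(P)):
            for j in list(Pt[i]):
                if not Pt[j] <= Pt[i]:
                    Pt[i] |= Pt[j]
                    changed = True
    return Pt

Pt = transitive_predecessors(P)
# e_i: minimum number of stations needed to schedule task i.
e = [math.ceil((w[i] + sum(w[k] for k in Pt[i])) / q) for i in range(len(w))]
print(e)
```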

Appendix C Modeling in AI Planning

In Section 7.6, we compare DIDP with AI planning approaches: numeric planning using the Planning Domain Definition Language (PDDL) [86, 64] and Picat, a logic programming language that has an AI planning module. We explain how we formulate models for these approaches.

C.1 PDDL

We model TSPTW, CVRP, m-PDTSP, bin packing, and SALBP-1 as linear numeric planning tasks [170], where preconditions and effects of actions are represented by linear formulas of numeric state variables. In these models, the objective is to minimize the sum of nonnegative action costs, which is a standard in optimal numeric planning [171, 172, 173, 174, 175, 176, 177, 178, 179]. We use PDDL 2.1 [64] to formulate the models and NLM-CutPlan Orbit [159], the winner of the optimal numeric track of the International Planning Competition (IPC) 2023 (https://ipc2023-numeric.github.io/), to solve the models. The PDDL models are adaptations of the DyPDL models presented in Section 6, where a numeric or element variable in DyPDL becomes a numeric variable in PDDL (except for the element variable representing the current location in the routing problems as explained in the next paragraph), a set variable is represented by a predicate and a set of objects, the target state becomes the initial state, base cases become the goal conditions, and a transition becomes an action. However, we are unable to model dominance between states and dual bound functions in PDDL. In addition, we cannot differentiate forced transitions and other transitions. We explain other differences between the PDDL models and the DyPDL models in the following paragraphs.

In the DyPDL models of the routing problems (TSPTW, CVRP, and m-PDTSP), each transition visits one customer and increases the cost by the travel time from the current location to the customer. Since a goal state in PDDL is not associated with a cost unlike a base case in DyPDL, we also define an action to return to the depot, which increases the cost by the travel time to the depot. While the travel time depends on the current location, for NLM-CutPlan Orbit, the cost of an action must be a nonnegative constant independent of a state, which is a standard in admissible heuristic functions for numeric planning [171, 173, 174, 176, 177, 179]. Thus, in the PDDL models, we define one action with two parameters, the current location (?from) and the destination (?to), so that the cost of each grounded action becomes a state-independent constant (c ?from ?to) corresponding to the travel time. We define a predicate (visited ?customer) representing if a customer is visited and (location ?customer) representing the current location. In each action, we use (location ?from) and (not (visited ?to)) as preconditions and (not (location ?from)), (location ?to), and (visited ?to) as effects.

In the DyPDL model of TSPTW, each transition updates the current time $t$ to $\max\{t+c_{ij},a_{j}\}$, where $c_{ij}$ is the travel time and $a_j$ is the beginning of the time window at the destination. While this effect can be written as (assign (t) (max (+ (t) (c ?from ?to)) (a ?to))) in PDDL, to represent effects as linear formulas, we introduce two actions: one with a precondition (>= (+ (t) (c ?from ?to)) (a ?to)) and an effect (increase (t) (c ?from ?to)), corresponding to $t\leftarrow t+c_{ij}$ if $t+c_{ij}\geq a_{j}$, and another with a precondition (< (+ (t) (c ?from ?to)) (a ?to)) and an effect (assign (t) (a ?to)), corresponding to $t\leftarrow a_{j}$ if $t+c_{ij}<a_{j}$.

In the DyPDL models of TSPTW and CVRP, we have redundant state constraints. While a state constraint could be modeled by introducing it as a precondition of each action, we do not use the state constraints in the PDDL models of TSPTW and CVRP because efficiently modeling them is non-trivial: straightforward approaches result in an exponential number of actions. For TSPTW, the state constraint checks if all unvisited customers can be visited by their deadlines, represented as $\forall j\in U,t+c^{*}_{ij}\leq b_{j}$, where $U$ is the set of unvisited customers, $c^{*}_{ij}$ is the shortest travel time from the current location to customer $j$, and $b_j$ is the deadline. One possible way to model this constraint is to define a disjunctive precondition (or (visited ?j) (<= (+ (t) (cstar ?from ?j)) (b ?j))) for each customer ?j, where (t) is a numeric variable corresponding to $t$, (cstar ?from ?j) is a numeric constant corresponding to $c^{*}_{ij}$, and (b ?j) is a numeric constant corresponding to $b_j$. However, the heuristic function used by NLM-CutPlan Orbit does not support disjunctive preconditions, and NLM-CutPlan Orbit compiles an action with disjunctive preconditions into a set of actions with different combinations of the preconditions, an approach inherited from Fast Downward [36], the standard classical planning framework on which NLM-CutPlan Orbit is based. In our case, each resulting action has one of the two preconditions, (visited ?j) or (<= (+ (t) (cstar ?from ?j)) (b ?j)), for each customer ?j, resulting in $2^{n}$ actions in total, where $n$ is the number of customers. In CVRP, the state constraint takes the sum of demands over all unvisited customers. To model this computation independently of a state, we would need to define an action for each possible set of unvisited customers, resulting in $2^{n}$ actions in total.

In the DyPDL models of bin packing and SALBP-1, each transition packs an item in the current bin (schedules a task in the current station for SALBP-1) or opens a new bin. When opening a new bin, the transition checks as a precondition that no item can be packed in the current bin, which is unnecessary but useful to exclude suboptimal solutions. However, for reasons similar to the state constraint in TSPTW, we do not model this precondition in the PDDL models. We could model this condition by defining (or (packed ?j) (> (w ?j) (r))) for each item ?j, where (packed ?j) represents if ?j is already packed, (w ?j) represents the weight of ?j, and (r) is the remaining capacity. However, as discussed above, NLM-CutPlan Orbit would generate an exponential number of actions with this condition.

In addition to the above problem classes, we also use MOSP: it was used as a benchmark domain in the classical planning tracks of the International Planning Competitions from 2006 to 2014. This PDDL formulation is different from our DyPDL model. To solve the model, we use Ragnarok [160], the winner of the optimal classical track of IPC 2023 (https://ipc2023-classical.github.io/).

We do not use other problem classes since their DyPDL models do not minimize the sum of state-independent and nonnegative action costs. In $1||\sum w_{i}T_{i}$ and talent scheduling, since the cost of each transition depends on a set variable, we would need an exponential number of actions to make it state-independent. In OPTW and MDKP, the objective is to maximize the sum of nonnegative profits. In graph-clear, the objective is to minimize the maximum value of state-dependent weights associated with transitions.

C.2 Picat

Picat is a logic programming language, in which DP can be used with tabling, a feature to store and reuse the evaluation results of predicates, without implementing a DP algorithm. Picat provides an AI planning module based on tabling, where a state, goal conditions, and actions can be programmatically described by expressions in Picat. While the cost of a plan is still restricted to the sum of nonnegative action costs, each action cost can be state-dependent. In addition, an admissible heuristic function can be defined and used by a solving algorithm. Thus, we can define a dual bound function as an admissible heuristic function in the AI planning module. However, we cannot model dominance between states. Using the AI planning module, we formulate models for TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, $1||\sum w_{i}T_{i}$, and talent scheduling, which are the same as the DyPDL models except that they do not define dominance. To solve the formulated models, we use the best_plan_bb predicate, which performs a branch-and-bound algorithm using the heuristic function. For OPTW, MDKP, MOSP, and graph-clear, we do not use the AI planning module due to the objective structure of their DyPDL models. We define DP models for these problem classes, which are the same as the DyPDL models except that they do not define dominance and dual bound functions, using tabling without the AI planning module.

Acknowledgement

This work was supported by the Natural Sciences and Engineering Research Council of Canada. This research was enabled in part by support provided by Compute Ontario and the Digital Research Alliance of Canada (alliancecan.ca).

References

  • Toth and Vigo [2014] P. Toth, D. Vigo, Vehicle Routing: Problems, Methods, and Applications, second ed., Society for Industrial and Applied Mathematics, 2014. doi:10.1137/1.9781611973594.
  • Cheng et al. [1993] T. C. E. Cheng, J. E. Diamond, B. M. T. Lin, Optimal scheduling in film production to minimize talent hold cost, J. Optim. Theory Appl. 79 (1993) 479–492. doi:10.1007/BF00940554.
  • Pinedo [2009] M. L. Pinedo, Planning and Scheduling in Manufacturing and Services, second ed., Springer, New York, NY, 2009. doi:10.1007/978-1-4419-0910-7.
  • Boysen et al. [2021] N. Boysen, P. Schulze, A. Scholl, Assembly line balancing: What happened in the last fifteen years?, Eur. J. Oper. Res. 301 (2021) 797–814. doi:10.1016/j.ejor.2021.11.043.
  • Russell and Norvig [2020] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, fourth ed., Pearson, 2020.
  • Beck and Fox [1998] J. C. Beck, M. S. Fox, A generic framework for constraint-directed search and scheduling, AI Mag. 19 (1998) 103. doi:10.1609/aimag.v19i4.1426.
  • Bengio et al. [2021] Y. Bengio, A. Lodi, A. Prouvost, Machine learning for combinatorial optimization: A methodological tour d’horizon, Eur. J. Oper. Res. 290 (2021) 405–421. doi:10.1016/j.ejor.2020.07.063.
  • Korte and Vygen [2018] B. Korte, J. Vygen, Combinatorial Optimization: Theory and Algorithms, sixth ed., Springer, Berlin, Heidelberg, 2018. doi:10.1007/978-3-662-56039-6.
  • Freuder [1997] E. Freuder, In pursuit of the holy grail, Constraints 2 (1997) 57–61. doi:10.1023/A:1009749006768.
  • Hungerländer and Truden [2018] P. Hungerländer, C. Truden, Efficient and easy-to-implement mixed-integer linear programs for the traveling salesperson problem with time windows, Transp. Res. Proc. 30 (2018) 157–166. doi:10.1016/j.trpro.2018.09.018.
  • Booth et al. [2016] K. E. Booth, T. T. Tran, G. Nejat, J. C. Beck, Mixed-integer and constraint programming techniques for mobile robot task planning, IEEE Robot. Autom. Lett. 1 (2016) 500–507. doi:10.1109/LRA.2016.2522096.
  • Gadegaard and Lysgaard [2021] S. L. Gadegaard, J. Lysgaard, A symmetry-free polynomial formulation of the capacitated vehicle routing problem, Discrete Appl. Math. 296 (2021) 179–192. doi:10.1016/j.dam.2020.02.012.
  • Rabbouch et al. [2019] B. Rabbouch, F. Saâdaoui, R. Mraihi, Constraint programming based algorithm for solving large-scale vehicle routing problems, in: Hybrid Artificial Intelligent Systems, Springer International Publishing, Cham, 2019, pp. 526–539. doi:10.1007/978-3-030-29859-3_45.
  • Letchford and Salazar-González [2016] A. N. Letchford, J.-J. Salazar-González, Stronger multi-commodity flow formulations of the (capacitated) sequential ordering problem, Eur. J. Oper. Res. 251 (2016) 74–84. doi:10.1016/j.ejor.2015.11.001.
  • Ritt and Costa [2018] M. Ritt, A. M. Costa, Improved integer programming models for simple assembly line balancing and related problems, Int. Trans. Oper. Res. 25 (2018) 1345–1359. doi:10.1111/itor.12206.
  • Bukchin and Raviv [2018] Y. Bukchin, T. Raviv, Constraint programming for solving various assembly line balancing problems, Omega 78 (2018) 57–68. doi:10.1016/j.omega.2017.06.008.
  • Keha et al. [2009] A. B. Keha, K. Khowala, J. W. Fowler, Mixed integer programming formulations for single machine scheduling problems, Comput. Ind. Eng. 56 (2009) 357–367. doi:10.1016/j.cie.2008.06.008.
  • Martin et al. [2021] M. Martin, H. H. Yanasse, M. J. Pinto, Mathematical models for the minimization of open stacks problem, Int. Trans. Oper. Res. 29 (2021) 2944–2967. doi:10.1111/itor.13053.
  • Morin et al. [2018] M. Morin, M. P. Castro, K. E. Booth, T. T. Tran, C. Liu, J. C. Beck, Intruder alert! Optimization models for solving the mobile robot graph-clear problem, Constraints 23 (2018) 335–354. doi:10.1007/s10601-018-9288-3.
  • Laborie et al. [2018] P. Laborie, J. Rogerie, P. Shaw, P. Vilím, IBM ILOG CP optimizer for scheduling, Constraints 23 (2018) 210–250. doi:10.1007/s10601-018-9281-x.
  • Bellman [1957] R. Bellman, Dynamic Programming, Princeton University Press, 1957.
  • Held and Karp [1962] M. Held, R. M. Karp, A dynamic programming approach to sequencing problems, J. Soc. Ind. Appl. Math. 10 (1962) 196–210. doi:10.1137/0110015.
  • Dumas et al. [1995] Y. Dumas, J. Desrosiers, E. Gelinas, M. M. Solomon, An optimal algorithm for the traveling salesman problem with time windows, Oper. Res. 43 (1995) 367–371. doi:10.1287/opre.43.2.367.
  • Gromicho et al. [2012] J. Gromicho, J. J. V. Hoorn, A. L. Kok, J. M. Schutten, Restricted dynamic programming: A flexible framework for solving realistic VRPs, Comput. Oper. Res. 39 (2012) 902–909. doi:10.1016/j.cor.2011.07.002.
  • Righini and Salani [2008] G. Righini, M. Salani, New dynamic programming algorithms for the resource constrained elementary shortest path problem, Networks 51 (2008) 155–170. doi:10.1002/net.20212.
  • Righini and Salani [2009] G. Righini, M. Salani, Decremental state space relaxation strategies and initialization heuristics for solving the orienteering problem with time windows with dynamic programming, Comput. Oper. Res. 36 (2009) 1191–1203. doi:10.1016/j.cor.2008.01.003.
  • Lawler [1964] E. L. Lawler, On scheduling problems with deferral costs, Manag. Sci. 11 (1964) 280–288. doi:10.1287/mnsc.11.2.280.
  • Garcia de la Banda and Stuckey [2007] M. Garcia de la Banda, P. J. Stuckey, Dynamic programming to minimize the maximum number of open stacks, INFORMS J. Comput. 19 (2007) 607–617. doi:10.1287/ijoc.1060.0205.
  • Garcia de la Banda et al. [2011] M. Garcia de la Banda, P. J. Stuckey, G. Chu, Solving talent scheduling with dynamic programming, INFORMS J. Comput. 23 (2011) 120–137. doi:10.1287/ijoc.1090.0378.
  • Ghallab et al. [2004] M. Ghallab, D. Nau, P. Traverso, Automated Planning, The Morgan Kaufmann Series in Artificial Intelligence, Morgan Kaufmann, Burlington, 2004. doi:10.1016/B978-1-55860-856-6.X5000-5.
  • Russell and Norvig [2020] S. Russell, P. Norvig, Solving problems by searching, in: Artificial Intelligence: A Modern Approach, fourth ed., Pearson, 2020, pp. 63–109.
  • Pearl and Kim [1982] J. Pearl, J. H. Kim, Studies in semi-admissible heuristics, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4 (1982) 392–399. doi:10.1109/TPAMI.1982.4767270.
  • Edelkamp and Schrödl [2012] S. Edelkamp, S. Schrödl, Heuristic Search: Theory and Applications, Morgan Kaufmann, San Francisco, 2012. doi:10.1016/C2009-0-16511-X.
  • Bonet and Geffner [2001] B. Bonet, H. Geffner, Planning as heuristic search, Artificial Intelligence 129 (2001) 5–33. doi:10.1016/S0004-3702(01)00108-4.
  • Hoffmann and Nebel [2001] J. Hoffmann, B. Nebel, The FF planning system: Fast plan generation through heuristic search, Journal of Artificial Intelligence Research 14 (2001) 253–302. doi:10.1613/jair.855.
  • Helmert [2006] M. Helmert, The Fast Downward planning system, J. Artif. Intell. Res. 26 (2006) 191–246. doi:10.1613/jair.1705.
  • Karp and Held [1967] R. M. Karp, M. Held, Finite-state processes and dynamic programming, SIAM J. Appl. Math. 15 (1967) 693–718. doi:10.1137/0115060.
  • Ibaraki [1972] T. Ibaraki, Representation theorems for equivalent optimization problems, Inf. Control 21 (1972) 397–435. doi:10.1016/S0019-9958(72)90125-8.
  • Ibaraki [1973a] T. Ibaraki, Finite state representations of discrete optimization problems, SIAM J. Comput. 2 (1973a) 193–210. doi:10.1137/0202016.
  • Ibaraki [1973b] T. Ibaraki, Solvable classes of discrete dynamic programming, J. Math. Anal. Appl. 43 (1973b) 642–693. doi:10.1016/0022-247X(73)90283-7.
  • Ibaraki [1974] T. Ibaraki, Classes of discrete optimization problems and their decision problems, J. Comput. Syst. Sci. 8 (1974) 84–116. doi:10.1016/S0022-0000(74)80024-3.
  • Martelli and Montanari [1975] A. Martelli, U. Montanari, On the foundations of dynamic programming, in: S. Rinaldi (Ed.), Topics in Combinatorial Optimization, Springer, Vienna, 1975, pp. 145–163. doi:10.1007/978-3-7091-3291-3_9.
  • Helman [1982] P. Helman, A New Theory of Dynamic Programming, Ph.D. thesis, University of Michigan, Ann Arbor, 1982.
  • Kumar and Kanal [1988] V. Kumar, L. N. Kanal, The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound, in: L. Kanal, V. Kumar (Eds.), Search in Artificial Intelligence, Springer, New York, NY, 1988, pp. 1–27. doi:10.1007/978-1-4613-8788-6_1.
  • Michie [1968] D. Michie, “memo” functions and machine learning, Nature 218 (1968) 19–22. doi:10.1038/218019a0.
  • Bird [1980] R. S. Bird, Tabulation techniques for recursive programs, ACM Comput. Surveys 12 (1980) 403–417. doi:10.1145/356827.356831.
  • Tamaki and Sato [1986] H. Tamaki, T. Sato, OLD resolution with tabulation, in: Third International Conference on Logic Programming (ICLP), Springer, Berlin, Heidelberg, 1986, pp. 84–98.
  • Puchinger and Stuckey [2008] J. Puchinger, P. J. Stuckey, Automating branch-and-bound for dynamic programs, in: Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM ’08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 81–89. doi:10.1145/1328408.1328421.
  • Zhou et al. [2015] N.-F. Zhou, H. Kjellerstrand, J. Fruhman, Constraint Solving and Planning with Picat, Springer, Cham, 2015. doi:10.1007/978-3-319-25883-6.
  • Giegerich and Meyer [2002] R. Giegerich, C. Meyer, Algebraic dynamic programming, in: Proceedings of the Ninth Algebraic Methodology and Software Technology, Springer, Berlin, Heidelberg, 2002, pp. 349–364. doi:10.1007/3-540-45719-4_24.
  • zu Siederdissen et al. [2015] C. H. zu Siederdissen, S. J. Prohaska, P. F. Stadler, Algebraic dynamic programming over general data structures, BMC Bioinformatics 16 (2015). doi:10.1186/1471-2105-16-S19-S2.
  • Eisner et al. [2005] J. Eisner, E. Goldlust, N. A. Smith, Compiling comp ling: Weighted dynamic programming and the Dyna language, in: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), Association for Computational Linguistics, USA, 2005, pp. 281–290. doi:10.3115/1220575.1220611.
  • Vieira et al. [2017] T. Vieira, M. Francis-Landau, N. W. Filardo, F. Khorasani, J. Eisner, Dyna: Toward a self-optimizing declarative language for machine learning applications, in: Proceedings of the First ACM SIGPLAN Workshop on Machine Learning and Programming Languages (MAPL), ACM, 2017, pp. 8–17. doi:10.1145/3088525.3088562.
  • Sundstrom and Guzzella [2009] O. Sundstrom, L. Guzzella, A generic dynamic programming Matlab function, in: 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC), 2009, pp. 1625–1630. doi:10.1109/CCA.2009.5281131.
  • Miretti et al. [2021] F. Miretti, D. Misul, E. Spessa, DynaProg: Deterministic dynamic programming solver for finite horizon multi-stage decision problems, SoftwareX 14 (2021) 100690. doi:10.1016/j.softx.2021.100690.
  • Hooker [2013] J. N. Hooker, Decision diagrams and dynamic programming, in: Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems – 10th International Conference, CPAIOR 2013, Springer, Berlin, Heidelberg, 2013, pp. 94–110. doi:10.1007/978-3-642-38171-3_7.
  • Bergman et al. [2016] D. Bergman, A. A. Cire, W.-J. van Hoeve, J. N. Hooker, Discrete optimization with decision diagrams, INFORMS J. Comput. 28 (2016) 47–66. doi:10.1287/ijoc.2015.0648.
  • Gillard et al. [2020] X. Gillard, P. Schaus, V. Coppé, Ddo, a generic and efficient framework for MDD-based optimization, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization, 2020, pp. 5243–5245. doi:10.24963/ijcai.2020/757, demos.
  • Michel and van Hoeve [2024] L. Michel, W.-J. van Hoeve, CODD: A decision diagram-based solver for combinatorial optimization, in: ECAI 2024 – 27th European Conference on Artificial Intelligence, volume 392 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2024, pp. 4240–4247. doi:10.3233/FAIA240997.
  • Lew and Mauch [2006] A. Lew, H. Mauch, Dynamic Programming: A Computational Tool, Springer, Berlin, Heidelberg, 2006. doi:10.1007/978-3-540-37014-7.
  • Chalumeau et al. [2021] F. Chalumeau, I. Coulon, Q. Cappart, L.-M. Rousseau, Seapearl: A constraint programming solver guided by reinforcement learning, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 18th International Conference, CPAIOR 2021, Springer International Publishing, Cham, 2021, pp. 392–409. doi:10.1007/978-3-030-78230-6_25.
  • Cappart et al. [2021] Q. Cappart, T. Moisan, L.-M. Rousseau, I. Prémont-Schwarz, A. A. Cire, Combining reinforcement learning and constraint programming for combinatorial optimization, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2021, pp. 3677–3687. doi:10.1609/aaai.v35i5.16484.
  • Ghallab et al. [1998] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, D. Wilkins, PDDL - The Planning Domain Definition Language, Technical Report, Yale Center for Computational Vision and Control, 1998. CVC TR-98-003/DCS TR-1165.
  • Fox and Long [2003] M. Fox, D. Long, PDDL2.1: An extension to PDDL for expressing temporal planning domains, J. Artif. Intell. Res. 20 (2003) 61–124. doi:10.1613/jair.1129.
  • Fox and Long [2006] M. Fox, D. Long, Modelling mixed discrete-continuous domains for planning, J. Artif. Intell. Res. 27 (2006) 235–297. doi:10.1613/jair.2044.
  • Sanner [2010] S. Sanner, Relational dynamic influence diagram language (RDDL): Language description, 2010. URL: http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf, accessed on 2024-05-31.
  • Hernádvölgyi et al. [2000] I. T. Hernádvölgyi, R. C. Holte, T. Walsh, Experiments with automatically created memory-based heuristics, in: Abstraction, Reformulation, and Approximation. SARA 2000, Springer, Berlin, Heidelberg, 2000, pp. 281–290. doi:10.1007/3-540-44914-0_18.
  • Gentzel et al. [2020] R. Gentzel, L. Michel, W.-J. van Hoeve, Haddock: A language and architecture for decision diagram compilation, in: Principles and Practice of Constraint Programming – CP 2020, Springer International Publishing, Cham, 2020, pp. 531–547. doi:10.1007/978-3-030-58475-7_31.
  • Martelli and Montanari [1975] A. Martelli, U. Montanari, From dynamic programming to search algorithms with functional costs, in: Proceedings of the Fourth International Joint Conference on Artificial Intelligence, IJCAI-75, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1975, pp. 345–350.
  • Catalano et al. [1979] A. Catalano, S. Gnesi, U. Montanari, Shortest path problems and tree grammars: An algebraic framework, in: Graph-Grammars and Their Application to Computer Science and Biology, Springer, Berlin, Heidelberg, 1979, pp. 167–179. doi:10.1007/BFb0025719.
  • Gnesi et al. [1981] S. Gnesi, U. Montanari, A. Martelli, Dynamic programming as graph searching: An algebraic approach, J. ACM 28 (1981) 737–751. doi:10.1145/322276.322285.
  • Ibaraki [1978] T. Ibaraki, Branch-and-bound procedure and state-space representation of combinatorial optimization problems, Inf. Control 36 (1978) 1–27. doi:10.1016/S0019-9958(78)90197-3.
  • Holte and Fan [2015] R. Holte, G. Fan, State space abstraction in artificial intelligence and operations research, in: Planning, Search, and Optimization: Papers from the 2015 AAAI Workshop, 2015, pp. 55–60.
  • Kuroiwa and Beck [2023a] R. Kuroiwa, J. C. Beck, Domain-independent dynamic programming: Generic state space search for combinatorial optimization, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023a, pp. 236–244. doi:10.1609/icaps.v33i1.27200.
  • Kuroiwa and Beck [2023b] R. Kuroiwa, J. C. Beck, Solving domain-independent dynamic programming problems with anytime heuristic search, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023b, pp. 245–253. doi:10.1609/icaps.v33i1.27201.
  • Kuroiwa and Beck [2024] R. Kuroiwa, J. C. Beck, Parallel beam search algorithms for domain-independent dynamic programming, in: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Washington, DC, USA, 2024, pp. 20743–20750. doi:10.1609/aaai.v38i18.30062.
  • Howard [1960] R. A. Howard, Dynamic Programming and Markov Processes, John Wiley & Sons, Inc., New York, 1960.
  • Sutton and Barto [2018] R. S. Sutton, A. G. Barto, Dynamic programming, in: Reinforcement Learning: An Introduction, second ed., A Bradford Book, Cambridge, MA, USA, 2018.
  • Savelsbergh [1985] M. W. P. Savelsbergh, Local search in routing problems with time windows, Ann. Oper. Res. 4 (1985) 285–305. doi:10.1007/BF02022044.
  • Fikes and Nilsson [1971] R. E. Fikes, N. J. Nilsson, STRIPS: A new approach to the application of theorem proving to problem solving, Artif. Intell. 2 (1971) 189–208. doi:10.1016/0004-3702(71)90010-5.
  • Bäckström and Nebel [1995] C. Bäckström, B. Nebel, Complexity results for SAS+ planning, Comput. Intell. 11 (1995) 625–656. doi:10.1111/j.1467-8640.1995.tb00052.x.
  • Helmert [2002] M. Helmert, Decidability and undecidability results for planning with numerical state variables, in: Proceedings of the Sixth International Conference on Artificial Intelligence Planning Systems (AIPS), AAAI Press, 2002, pp. 44–53.
  • Gnad et al. [2023] D. Gnad, M. Helmert, P. Jonsson, A. Shleyfman, Planning over integers: Compilations and undecidability, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023, pp. 148–152. doi:10.1609/icaps.v33i1.27189.
  • Shleyfman et al. [2023] A. Shleyfman, D. Gnad, P. Jonsson, Structurally restricted fragments of numeric planning – a complexity analysis, in: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Washington, DC, USA, 2023, pp. 12112–12119. doi:10.1609/aaai.v37i10.26428.
  • Gigante and Scala [2023] N. Gigante, E. Scala, On the compilability of bounded numeric planning, in: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI-23, International Joint Conferences on Artificial Intelligence Organization, 2023, pp. 5341–5349. doi:10.24963/ijcai.2023/593, main track.
  • McDermott [2000] D. M. McDermott, The 1998 AI planning systems competition, AI Mag. 21 (2000) 35–55. doi:10.1609/aimag.v21i2.1506.
  • Torralba and Hoffmann [2015] Á. Torralba, J. Hoffmann, Simulation-based admissible dominance pruning, in: Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI-15, AAAI Press/International Joint Conferences on Artificial Intelligence Organization, Palo Alto, California USA, 2015, pp. 1689–1695.
  • Libralesso et al. [2020] L. Libralesso, A.-M. Bouhassoun, H. Cambazard, V. Jost, Tree search for the sequential ordering problem, in: ECAI 2020 – 24th European Conference on Artificial Intelligence, volume 325 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2020, pp. 459–465. doi:10.3233/FAIA200126.
  • Pearl [1984] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley Longman Publishing Co., Inc., USA, 1984. doi:10.5555/525.
  • Dijkstra [1959] E. W. Dijkstra, A note on two problems in connexion with graphs, Numer. Math. 1 (1959) 269–271. doi:10.1007/BF01386390.
  • Hart et al. [1968] P. E. Hart, N. J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybern. 4 (1968) 100–107. doi:10.1109/TSSC.1968.300136.
  • Edelkamp et al. [2005] S. Edelkamp, S. Jabbar, A. L. Lafuente, Cost-algebraic heuristic search, in: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), AAAI Press, 2005, pp. 1362–1367.
  • Nau et al. [1984] D. S. Nau, V. Kumar, L. Kanal, General branch and bound, and its relation to A* and AO*, Artif. Intell. 23 (1984) 29–58. doi:10.1016/0004-3702(84)90004-3.
  • Chakrabarti et al. [1989] P. Chakrabarti, S. Ghose, A. Pandey, S. De Sarkar, Increasing search efficiency using multiple heuristics, Inf. Process. Lett. 30 (1989) 33–36. doi:10.1016/0020-0190(89)90171-3.
  • Baier et al. [2009] J. A. Baier, F. Bacchus, S. A. McIlraith, A heuristic search approach to planning with temporally extended preferences, Artif. Intell. 173 (2009) 593–618. doi:10.1016/j.artint.2008.11.011, advances in Automated Plan Generation.
  • Thayer and Ruml [2011] J. T. Thayer, W. Ruml, Bounded suboptimal search: A direct approach using inadmissible estimates, in: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI-11, AAAI Press/International Joint Conferences on Artificial Intelligence Organization, Menlo Park, California, 2011, pp. 674–679. doi:10.5591/978-1-57735-516-8/IJCAI11-119.
  • Aine et al. [2016] S. Aine, S. Swaminathan, V. Narayanan, V. Hwang, M. Likhachev, Multi-heuristic A*, Int. J. Robot. Res. 35 (2016) 224–243. doi:10.1177/0278364915594029.
  • Fickert et al. [2022] M. Fickert, T. Gu, W. Ruml, New results in bounded-suboptimal search, in: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2022, pp. 10166–10173. doi:10.1609/aaai.v36i9.21256.
  • Vadlamudi et al. [2012] S. G. Vadlamudi, P. Gaurav, S. Aine, P. P. Chakrabarti, Anytime column search, in: AI 2012: Advances in Artificial Intelligence, Springer, Berlin, Heidelberg, 2012, pp. 254–265. doi:10.1007/978-3-642-35101-3_22.
  • Vadlamudi et al. [2016] S. G. Vadlamudi, S. Aine, P. P. Chakrabarti, Anytime pack search, Nat. Comput. 15 (2016) 395–414. doi:10.1007/978-3-642-45062-4_88.
  • Kao et al. [2009] G. K. Kao, E. C. Sewell, S. H. Jacobson, A branch, bound, and remember algorithm for the $1|r_i|\sum t_i$ scheduling problem, J. Sched. 12 (2009) 163–175. doi:10.1007/s10951-008-0087-3.
  • Sewell and Jacobson [2012] E. C. Sewell, S. H. Jacobson, A branch, bound, and remember algorithm for the simple assembly line balancing problem, INFORMS J. Comput. 24 (2012) 433–442. doi:10.1287/ijoc.1110.0462.
  • Morrison et al. [2014] D. R. Morrison, E. C. Sewell, S. H. Jacobson, An application of the branch, bound, and remember algorithm to a new simple assembly line balancing dataset, Eur. J. Oper. Res. 236 (2014) 403–409. doi:10.1016/j.ejor.2013.11.033.
  • Harvey and Ginsberg [1995] W. D. Harvey, M. L. Ginsberg, Limited discrepancy search, in: Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 607–613.
  • Beck and Perron [2000] J. C. Beck, L. Perron, Discrepancy-bounded depth first search, in: Second International Workshop on Integration of AI and OR Technologies for Combinatorial Optimization Problems, CPAIOR 2000, 2000.
  • Zhou and Hansen [2006] R. Zhou, E. A. Hansen, Breadth-first heuristic search, Artif. Intell. 170 (2006) 385–408. doi:10.1016/j.artint.2005.12.002.
  • Zhang [1998] W. Zhang, Complete anytime beam search, in: Proceedings of the 15th National Conference on Artificial Intelligence (AAAI), AAAI Press, 1998, pp. 425–430.
  • Libralesso et al. [2022] L. Libralesso, P. A. Focke, A. Secardin, V. Jost, Iterative beam search algorithms for the permutation flowshop, Eur. J. Oper. Res. 301 (2022) 217–234. doi:10.1016/j.ejor.2021.10.015.
  • Dantzig and Ramser [1959] G. B. Dantzig, J. H. Ramser, The truck dispatching problem, Manag. Sci. 6 (1959) 80–91. doi:10.1287/mnsc.6.1.80.
  • Toth and Vigo [2002] P. Toth, D. Vigo, Models, relaxations and exact approaches for the capacitated vehicle routing problem, Discrete Appl. Math. 123 (2002) 487–512. doi:10.1016/S0166-218X(01)00351-1.
  • Hernández-Pérez and Salazar-González [2009] H. Hernández-Pérez, J. J. Salazar-González, The multi-commodity one-to-one pickup-and-delivery traveling salesman problem, Eur. J. Oper. Res. 196 (2009) 987–995. doi:10.1016/j.ejor.2008.05.009.
  • Gouveia and Ruthmair [2015] L. Gouveia, M. Ruthmair, Load-dependent and precedence-based models for pickup and delivery problems, Comput. Oper. Res. 63 (2015) 56–71. doi:10.1016/j.cor.2015.04.008.
  • Castro et al. [2020] M. P. Castro, A. A. Cire, J. C. Beck, An MDD-based lagrangian approach to the multicommodity pickup-and-delivery TSP, INFORMS J. Comput. 32 (2020) 263–278. doi:10.1287/ijoc.2018.0881.
  • Kantor and Rosenwein [1992] M. G. Kantor, M. B. Rosenwein, The orienteering problem with time windows, J. Oper. Res. Soc. 43 (1992) 629–635. doi:10.1057/jors.1992.88.
  • Golden et al. [1987] B. L. Golden, L. Levy, R. Vohra, The orienteering problem, Nav. Res. Logist. 34 (1987) 307–318. doi:10.1002/1520-6750(198706)34:3<307::AID-NAV3220340302>3.0.CO;2-D.
  • Martello and Toth [1990] S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations, John Wiley & Sons, Inc., New York, NY, USA, 1990.
  • Kellerer et al. [2004] H. Kellerer, U. Pferschy, D. Pisinger, Knapsack Problems, Springer, Berlin, Heidelberg, 2004. doi:10.1007/978-3-540-24777-7.
  • Dantzig [1957] G. B. Dantzig, Discrete-variable extremum problems, Oper. Res. 5 (1957) 266–277. doi:10.1287/opre.5.2.266.
  • Garey and Johnson [1979] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, 1979.
  • Johnson [1988] R. V. Johnson, Optimally balancing large assembly lines with ‘Fable’, Manag. Sci. 34 (1988) 240–253. doi:10.1287/mnsc.34.2.240.
  • Salveson [1955] M. E. Salveson, The assembly-line balancing problem, J. Ind. Eng. 6 (1955) 18–25. doi:10.1115/1.4014559.
  • Baybars [1986] İ. Baybars, A survey of exact algorithms for the simple assembly line balancing problem, Manag. Sci. 32 (1986) 909–932. doi:10.1287/mnsc.32.8.909.
  • Álvarez Miranda and Pereira [2019] E. Álvarez Miranda, J. Pereira, On the complexity of assembly line balancing problems, Comput. Oper. Res. 108 (2019) 182–186. doi:10.1016/j.cor.2019.04.005.
  • Jackson [1956] J. R. Jackson, A computing procedure for a line balancing problem, Manag. Sci. 2 (1956) 261–271. doi:10.1287/mnsc.2.3.261.
  • Scholl and Klein [1997] A. Scholl, R. Klein, SALOME: A bidirectional branch-and-bound procedure for assembly line balancing, INFORMS J. Comput. 9 (1997) 319–335. doi:10.1287/ijoc.9.4.319.
  • Kolling and Carpin [2007] A. Kolling, S. Carpin, The graph-clear problem: Definition, theoretical properties and its connections to multirobot aided surveillance, in: Proceedings of IEEE International Conference on Intelligent Robots and Systems (IROS), 2007, pp. 1003–1008. doi:10.1109/IROS.2007.4399368.
  • Gurobi Optimization [2023] Gurobi Optimization, LLC, Gurobi optimizer reference manual, 2023. URL: https://www.gurobi.com, accessed on 2024-05-31.
  • Gendreau et al. [1998] M. Gendreau, A. Hertz, G. Laporte, M. Stan, A generalized insertion heuristic for the traveling salesman problem with time windows, Oper. Res. 46 (1998) 330–346. doi:10.1287/opre.46.3.330.
  • Ohlmann and Thomas [2007] J. W. Ohlmann, B. W. Thomas, A compressed-annealing heuristic for the traveling salesman problem with time windows, INFORMS J. Comput. 19 (2007) 80–90. doi:10.1287/ijoc.1050.0145.
  • Ascheuer [1995] N. Ascheuer, Hamiltonian Path Problems in the On-Line Optimization of Flexible Manufacturing Systems, Ph.D. thesis, Technische Universität Berlin, 1995.
  • Gavish and Graves [1978] B. Gavish, S. C. Graves, The Travelling Salesman Problem and Related Problems, Technical Report, Operations Research Center, Massachusetts Institute of Technology, 1978. Working Paper OR 078-78.
  • Uchoa et al. [2017] E. Uchoa, D. Pecin, A. Pessoa, M. Poggi, T. Vidal, A. Subramanian, New benchmark instances for the capacitated vehicle routing problem, Eur. J. Oper. Res. 257 (2017) 845–858. doi:10.1016/j.ejor.2016.08.012.
  • Righini and Salani [2006] G. Righini, M. Salani, Dynamic Programming for the Orienteering Problem with Time Windows, Technical Report 91, Dipartimento di Tecnologie dell’Informazione, Università degli Studi di Milano, Crema, Italy, 2006.
  • Montemanni and Gambardella [2009] R. Montemanni, L. M. Gambardella, Ant colony system for team orienteering problems with time windows, Found. Comput. Decis. Sci. 34 (2009) 287–306.
  • Vansteenwegen et al. [2009] P. Vansteenwegen, W. Souffriau, G. Vanden Berghe, D. Van Oudheusden, Iterated local search for the team orienteering problem with time windows, Comput. Oper. Res. 36 (2009) 3281–3290. doi:10.1016/j.cor.2009.03.008, new developments on hub location.
  • Vansteenwegen et al. [2011] P. Vansteenwegen, W. Souffriau, D. V. Oudheusden, The orienteering problem: A survey, Eur. J. Oper. Res. 209 (2011) 1–10. doi:10.1016/j.ejor.2010.03.045.
  • Beasley [1990] J. E. Beasley, OR-Library: Distributing test problems by electronic mail, J. Oper. Res. Soc. 41 (1990) 1069–1072. doi:10.2307/2582903.
  • Cacchiani et al. [2022] V. Cacchiani, M. Iori, A. Locatelli, S. Martello, Knapsack problems — an overview of recent advances. Part II: Multiple, multidimensional, and quadratic knapsack problems, Comput. Oper. Res. 143 (2022) 105693. doi:10.1016/j.cor.2021.105693.
  • Shaw [2004] P. Shaw, A constraint for bin packing, in: Principles and Practice of Constraint Programming – CP 2004, Springer, Berlin, Heidelberg, 2004, pp. 648–662. doi:10.1007/978-3-540-30201-8_47.
  • Delorme et al. [2018] M. Delorme, M. Iori, S. Martello, BPPLIB: A library for bin packing and cutting stock problems, Optim. Lett. 12 (2018) 235–250. doi:10.1007/s11590-017-1192-z.
  • Falkenauer [1996] E. Falkenauer, A hybrid grouping genetic algorithm for bin packing, J. Heuristics 2 (1996) 5–30. doi:10.1007/BF00226291.
  • Scholl et al. [1997] A. Scholl, R. Klein, C. Jürgens, Bison: A fast hybrid procedure for exactly solving the one-dimensional bin packing problem, Comput. Oper. Res. 24 (1997) 627–645. doi:10.1016/S0305-0548(96)00082-2.
  • Wäscher and Gau [1996] G. Wäscher, T. Gau, Heuristics for the integer one-dimensional cutting stock problem: A computational study, Oper.-Res.-Spektrum 18 (1996) 131–144. doi:10.1007/BF01539705.
  • Schwerin and Wäscher [1997] P. Schwerin, G. Wäscher, The bin-packing problem: A problem generator and some numerical experiments with FFD packing and MTP, Int. Trans. Oper. Res. 4 (1997) 377–389. doi:10.1016/S0969-6016(97)00025-7.
  • Schoenfield [2002] J. E. Schoenfield, Fast, Exact Solution of Open Bin Packing Problems Without Linear Programming, Technical Report, US Army Space and Missile Defense Command, Huntsville, Alabama, USA, 2002.
  • Delorme et al. [2016] M. Delorme, M. Iori, S. Martello, Bin packing and cutting stock problems: Mathematical models and exact algorithms, Eur. J. Oper. Res. 255 (2016) 1–20. doi:10.1016/j.ejor.2016.04.030.
  • Emmons [1969] H. Emmons, One-machine sequencing to minimize certain functions of job tardiness, Oper. Res. 17 (1969) 701–715. doi:10.1287/opre.17.4.701.
  • Kanet [2007] J. J. Kanet, New precedence theorems for one-machine weighted tardiness, Math. Oper. Res. 32 (2007) 579–588. doi:10.1287/moor.1070.0255.
  • Qin et al. [2016] H. Qin, Z. Zhang, A. Lim, X. Liang, An enhanced branch-and-bound algorithm for the talent scheduling problem, Eur. J. Oper. Res. 250 (2016) 412–426. doi:10.1016/j.ejor.2015.10.002.
  • Chu and Stuckey [2015] G. Chu, P. J. Stuckey, Learning value heuristics for constraint programming, in: Integration of AI and OR Techniques in Constraint Programming – 12th International Conference, CPAIOR 2015, Springer International Publishing, Cham, 2015, pp. 108–123. doi:10.1007/978-3-319-18008-3_8.
  • Laurière [1978] J.-L. Laurière, A language and a program for stating and solving combinatorial problems, Artif. Intell. 10 (1978) 29–127. doi:10.1016/0004-3702(78)90029-2.
  • Yuen and Richardson [1995] B. J. Yuen, K. V. Richardson, Establishing the optimality of sequencing heuristics for cutting stock problems, Eur. J. Oper. Res. 84 (1995) 590–598. doi:10.1016/0377-2217(95)00025-L.
  • Smith and Gent [2005] B. Smith, I. Gent, Constraint modelling challenge report 2005, https://ipg.host.cs.st-andrews.ac.uk/challenge/, 2005.
  • Faggioli and Bentivoglio [1998] E. Faggioli, C. A. Bentivoglio, Heuristic and exact methods for the cutting sequencing problem, Eur. J. Oper. Res. 110 (1998) 564–575. doi:10.1016/S0377-2217(97)00268-3.
  • Chu and Stuckey [2009] G. Chu, P. J. Stuckey, Minimizing the maximum number of open stacks by customer search, in: Principles and Practice of Constraint Programming – CP 2009, Springer, Berlin, Heidelberg, 2009, pp. 242–257. doi:10.1007/978-3-642-04244-7_21.
  • Fusy [2009] E. Fusy, Uniform random sampling of planar graphs in linear time, Random Struct. Algor. 35 (2009) 464–522. doi:10.1002/rsa.20275.
  • Tange [2011] O. Tange, GNU parallel - the command-line power tool, ;login: USENIX Mag. 36 (2011) 42–47.
  • Berthold [2013] T. Berthold, Measuring the impact of primal heuristics, Oper. Res. Lett. 41 (2013) 611–614. doi:10.1016/j.orl.2013.08.007.
  • Kuroiwa et al. [2023] R. Kuroiwa, A. Shleyfman, J. C. Beck, NLM-CutPlan, 2023. URL: https://ipc2023-numeric.github.io/abstracts/NLM_CutPlan_Abstract.pdf, accessed on 2024-05-31.
  • Drexler et al. [2023] D. Drexler, D. Gnad, P. Höft, J. Seipp, D. Speck, S. Ståhlberg, Ragnarok, 2023. URL: https://ipc2023-classical.github.io/abstracts/planner17_ragnarok.pdf, accessed on 2024-05-31.
  • Gillard et al. [2021] X. Gillard, V. Coppé, P. Schaus, A. A. Cire, Improving the filtering of branch-and-bound MDD solver, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 18th International Conference, CPAIOR 2021, Springer International Publishing, Cham, 2021, pp. 231–247. doi:10.1007/978-3-030-78230-6_15.
  • Coppé et al. [2023] V. Coppé, X. Gillard, P. Schaus, Boosting decision diagram-based branch-and-bound by pre-solving with aggregate dynamic programming, in: 29th International Conference on Principles and Practice of Constraint Programming (CP 2023), Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2023, pp. 13:1–13:17. doi:10.4230/LIPIcs.CP.2023.13.
  • Coppé et al. [2024a] V. Coppé, X. Gillard, P. Schaus, Decision diagram-based branch-and-bound with caching for dominance and suboptimality detection, INFORMS J. Comput. (2024a). doi:10.1287/ijoc.2022.0340.
  • Coppé et al. [2024b] V. Coppé, X. Gillard, P. Schaus, Modeling and exploiting dominance rules for discrete optimization with decision diagrams, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 21st International Conference, CPAIOR 2024, 2024b.
  • Baptiste et al. [2001] P. Baptiste, C. Le Pape, W. Nuijten, Constraint-Based Scheduling: Applying Constraint Programming to Scheduling Problems, International Series in Operations Research & Management Science, Springer, New York, NY, 2001. doi:10.1007/978-1-4615-1479-4.
  • Lenstra et al. [1977] J. Lenstra, A. Rinnooy Kan, P. Brucker, Complexity of machine scheduling problems, Ann. Discrete Math. 1 (1977) 343–362. doi:10.1016/S0167-5060(08)70743-X.
  • Abdul-Razaq et al. [1990] T. S. Abdul-Razaq, C. N. Potts, L. N. V. Wassenhove, A survey of algorithms for the single machine total weighted tardiness scheduling problem, Discrete Appl. Math. 26 (1990) 235–253. doi:10.1016/0166-218X(90)90103-J.
  • Nethercote et al. [2007] N. Nethercote, P. J. Stuckey, R. Becket, S. Brand, G. J. Duck, G. Tack, MiniZinc: Towards a standard CP modelling language, in: Principles and Practice of Constraint Programming – CP 2007, Springer, Berlin, Heidelberg, 2007, pp. 529–543.
  • Linhares and Yanasse [2002] A. Linhares, H. H. Yanasse, Connections between cutting-pattern sequencing, VLSI design, and flexible machines, Comput. Oper. Res. 29 (2002) 1759–1772. doi:10.1016/S0305-0548(01)00054-5.
  • Hoffmann [2003] J. Hoffmann, The Metric-FF planning system: Translating ”ignoring delete lists” to numeric state variables, J. Artif. Intell. Res. 20 (2003) 291–341. doi:10.1613/jair.1144.
  • Scala et al. [2017] E. Scala, P. Haslum, D. Magazzeni, S. Thiébaux, Landmarks for numeric planning problems, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI-17, International Joint Conferences on Artificial Intelligence Organization, 2017, pp. 4384–4390. doi:10.24963/ijcai.2017/612, main track.
  • Piacentini et al. [2018a] C. Piacentini, M. Castro, A. Cire, J. C. Beck, Compiling optimal numeric planning to mixed integer linear programming, in: Proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, 2018a, pp. 383–387. doi:10.1609/icaps.v28i1.13919.
  • Piacentini et al. [2018b] C. Piacentini, M. P. Castro, A. A. Cire, J. C. Beck, Linear and integer programming-based heuristics for cost-optimal numeric planning, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2018b, pp. 6254–6261. doi:10.1609/aaai.v32i1.12082.
  • Scala et al. [2020] E. Scala, P. Haslum, S. Thiébaux, M. Ramírez, Subgoaling techniques for satisficing and optimal numeric planning, J. Artif. Intell. Res. 68 (2020) 691–752. doi:10.1613/jair.1.11875.
  • Leofante et al. [2020] F. Leofante, E. Giunchiglia, E. Ábráham, A. Tacchella, Optimal planning modulo theories, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization, 2020, pp. 4128–4134. doi:10.24963/ijcai.2020/571, main track.
  • Kuroiwa et al. [2022a] R. Kuroiwa, A. Shleyfman, C. Piacentini, M. P. Castro, J. C. Beck, The LM-cut heuristic family for optimal numeric planning with simple conditions, J. Artif. Intell. Res. 75 (2022a) 1477–1548. doi:10.1613/jair.1.14034.
  • Kuroiwa et al. [2022b] R. Kuroiwa, A. Shleyfman, J. C. Beck, LM-cut heuristics for optimal linear numeric planning, in: Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2022b, pp. 203–212. doi:10.1609/icaps.v32i1.19803.
  • Shleyfman et al. [2023] A. Shleyfman, R. Kuroiwa, J. C. Beck, Symmetry detection and breaking in linear cost-optimal numeric planning, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023, pp. 393–401. doi:10.1609/icaps.v33i1.27218.
  • Kuroiwa et al. [2023] R. Kuroiwa, A. Shleyfman, J. Beck, Extracting and exploiting bounds of numeric variables for optimal linear numeric planning, in: ECAI 2023 – 26th European Conference on Artificial Intelligence, volume 372 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2023, pp. 1332–1339. doi:10.3233/FAIA230409.