This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Dynamic programming by polymorphic semiring algebraic shortcut fusion

Max A. Little⋆,†    Xi He    Ugur Kayas#
Abstract

Dynamic programming (DP) is a broadly applicable algorithmic design paradigm for the efficient, exact solution of otherwise intractable, combinatorial problems. However, the design of such algorithms is often presented informally in an ad-hoc manner. It is sometimes difficult to justify the correctness of these DP algorithms. To address this issue, this paper presents a rigorous algebraic formalism for systematically deriving DP algorithms, based on semiring polymorphism. We start with a specification, construct a (brute-force) algorithm to compute the required solution which is self-evidently correct because it exhaustively generates and evaluates all possible solutions meeting the specification. We then derive, primarily through the use of shortcut fusion, an implementation of this algorithm which is both efficient and correct. We also demonstrate how, with the use of semiring lifting, the specification can be augmented with combinatorial constraints and through semiring lifting, show how these constraints can also be fused with the derived algorithm. This paper furthermore demonstrates how existing DP algorithms for a given combinatorial problem can be abstracted from their original context and re-purposed to solve other combinatorial problems.

This approach can be applied to the full scope of combinatorial problems expressible in terms of semirings. This includes, for example: optimization, optimal probability and Viterbi decoding, probabilistic marginalization, logical inference, fuzzy sets, differentiable softmax, and relational and provenance queries. The approach, building on many ideas from the existing literature on constructive algorithmics, exploits generic properties of (semiring) polymorphic functions, tupling and formal sums (lifting), and algebraic simplifications arising from constraint algebras. We demonstrate the effectiveness of this formalism for some example applications arising in signal processing, bioinformatics and reliability engineering. Python software implementing these algorithms can be downloaded from: http://www.maxlittle.net/software/dppolyalg.zip.

School of Computer Science, University of Birmingham, UK

MIT, Cambridge, MA, USA

#Abdullah Gul University, Turkey

First author contact: [email protected]111This work partially funded by NIH grant UR-Udall Center, award number P50 NS108676.

1 Introduction

Dynamic programming (DP) (Bellman, 1957; Sniedovich, 2011) is one of the most effective and widely used computational tools for finding exact solutions to a large range of otherwise intractable combinatorial problems (Kleinberg and Tardos, 2005). Typically, the exhaustive (brute-force) solution to problems for which DP is amenable are of exponential or even factorial, time complexity. Where DP is applicable, it is often possible to reduce the worst case computational effort required to solve the problem, to something tractable such as low-order (quasi)-polynomial.

Nonetheless, devising correct and efficient DP algorithms typically relies on special intuition and insight (de Moor, 1999). It is also often difficult to prove that an algorithm is correct with respect to a formal problem specification and gain understanding of the function of these algorithms from their, sometimes inscrutable, implementations. To address these shortcomings, a more systematic approach is to start with a specification of the combinatorial problem and then compute an efficient implementation of the same, through provably correct derivation steps. In this way, the resulting algorithm is both efficient and guaranteed correct. This approach is exemplified in constructive algorithmics frameworks described in e.g. Bird and de Moor (1996), de Moor (1991) and Jeuring (1993).

This paper reports on a novel and relatively simple approach to the systematic design of such DP algorithms which exploits semiring polymorphism and shortcut fusion. Our approach starts from a specification given in terms of a semiring objective over the set of all combinatorial configurations in the DP problem. With a semiring polymorphic generator of this set of combinatorial configurations (often given by a Bellman recursion) we show how to derive an efficient and correct algorithm for solving the semiring objective by applying semiring shortcut fusion. We furthermore show how this approach can be extended to complex DP problems which solve semiring objectives over multiple combinatorial constraints.

Our approach brings together several concepts that have traditionally existed independently across various fields, including machine learning, computational linguistics, and automata theory. Semirings (Golan, 1999) are widely used in special DP applications (Huang, 2008; Goodman, 1999; Mensch and Blondel, 2018; Li and Eisner, 2009), but their general usage in DP lacks rigorous justifications in terms of correctness with respect to a specification, which we provide here. We furthermore show how semiring lifting (Emoto et al., 2012) can be used to solve more complex constrained DP problems where the constraint is expressed in terms of an algebra homomorphism. When this constraint algebra is a group we show that it further simplifies the algorithm implementation. We also show how semirings can often be combined in parallel (tupled) to significant advantage such as providing a general implementation of backtracing applicable to any such semiring DP algorithm.

Our approach is applicable to a very wide range of unconstrained or constrained combinatorial problems which can be expressed in terms of semirings. We demonstrate its effectiveness on some practical, novel extensions of classical problems in signal processing, machine learning, computational statistics and engineering.

The layout of the paper is as follows. In Section 2, we detail the main theoretical developments of this paper, and in Section 3 we develop DP algorithms for applications from several disciplines. Section 4 puts the work into the context of existing research on DP algorithms in general. We end in Section 5 with a summary and discussion of the importance, general scope and possible extensions of the work. The appendices contain detailed proofs of the main results in the paper, list some widely-used semirings and simplified constraint algebras.

2 Theory

In this paper, sets are indicated by the upper case double-strike letters 𝕊,𝕋\mathbb{S},\mathbb{T} with their corresponding cardinalities, S=|𝕊|S=\left|\mathbb{S}\right| and T=|𝕋|T=\left|\mathbb{T}\right|, or the standard sets ,\mathbb{R},\mathbb{N} etc. The Boolean set is given by 𝔹={T,F}\mathbb{B}=\left\{T,F\right\} (for true, false respectively). To indicate the type of sets with elements of type 𝕊\mathbb{S}, we write {𝕊}\left\{\mathbb{S}\right\}. The type of lists (that is, ordered sequences) with elements in the arbitrary type 𝕊\mathbb{S} are denoted with [𝕊]\left[\mathbb{S}\right]. Binary algebraic operators are written as circled symbols, ,,\oplus,\otimes,\odot and their corresponding identities are i,i,ii_{\oplus},i_{\otimes,}i_{\odot}. Algebras and objects such as monoids, groups and semirings using binary operators are given as tuples with upper-case calligraphic letters for names, e.g. 𝒮=(𝕊,,,i,i)\mathcal{S}=\left(\mathbb{S},\oplus,\otimes,i_{\oplus},i_{\otimes}\right) is a semiring over values of type 𝕊\mathbb{S} with binary operators ,:(𝕊𝕊)𝕊\oplus,\otimes:\mathbb{\left(S\to\mathbb{S}\right)\to\mathbb{S}} with identities i,i𝕊i_{\oplus},i_{\otimes}\in\mathbb{S} and =(𝕄,,i)\mathcal{M}=\left(\mathbb{M},\odot,i_{\odot}\right) a monoid with values of type 𝕄\mathbb{M} and binary operator \odot. Integer and natural number indices are given by lower case letters n,in,i etc. We use the superscript notation fu:𝕐f^{u}:\mathbb{Y} as shorthand for (higher-order) function application f(u)f\left(u\right) which does not have any input (it is not a function). Thus, for uu having type 𝕌𝕍\mathbb{U}\to\mathbb{V} it follows the function has type f:(𝕌𝕍)𝕐f:\left(\mathbb{U}\to\mathbb{V}\right)\to\mathbb{Y}. If ff has an additional recurrence index, nn\in\mathbb{N}, then it has type f:(𝕌𝕍)𝕐f:\left(\mathbb{U}\to\mathbb{V}\right)\to\mathbb{N}\to\mathbb{Y} so that fu:𝕐f^{u}:\mathbb{N}\to\mathbb{Y} is shorthand for partial application f(u)f\left(u\right), full application uses subscript notation fnu:𝕐f_{n}^{u}:\mathbb{Y}. Where an algebra appears in the superscript ff^{\mathcal{M}}, this is shorthand for f(,i)f\left(\odot,i_{\odot}\right) which in this case means full application to the operators and identities of the algebra \mathcal{M}. Where there is no ambiguity we suppress superscript parameters purely for presentational clarity. Binary operators are subscripted to indicate lifting, e.g. \oplus_{\mathcal{M}} is the \oplus operator lifted over the monoid algebra \mathcal{M} and the lifted semiring value xmx_{m} denotes a function of type x:𝕄𝕊x:\mathbb{M}\text{$\to\mathbb{S}$} which has an input of type 𝕄\mathbb{M} of values in the monoid algebra \mathcal{M}. Regarding types, lifting means that, whereas the plain semiring operator is a function taking two semiring values to a single semiring value e.g. of type :(𝕊𝕊)𝕊\oplus:\left(\mathbb{S\to}\mathbb{S}\right)\to\mathbb{S}, its lifted version takes two lifted values 𝕄𝕊\mathbb{M}\text{$\to\mathbb{S}$}, returning another lifted value so it has type :((𝕄𝕊)(𝕄𝕊))(𝕄𝕊)\oplus_{\mathcal{M}}:\left(\left(\mathbb{M}\text{$\to\mathbb{S}$}\right)\to\left(\mathbb{M}\text{$\to\mathbb{S}$}\right)\right)\to\left(\mathbb{M}\text{$\to\mathbb{S}$}\right).

2.1 Correct and efficient dynamic programming by semiring fusion

A very wide class of DP problems can be specified as a combinatorial problem over a semiring 𝒮=(𝕊,,,i,i)\mathcal{S}=\left(\mathbb{S},\bigoplus,\bigotimes,i_{\oplus},i_{\otimes}\right),

s=l𝕃xlw(x),\text{$s^{*}$=$\bigoplus_{l\in\mathbb{L}}$$\bigotimes_{x\in l}w\left(x\right)$}, (1)

where 𝕃\mathbb{L} is the set of all combinatorial configurations and w:𝕏𝕊w:\mathbb{X}\to\mathbb{S} embeds values in the DP problem into values in the semiring 𝒮\mathcal{S}. As an example, consider the rod-cutting problem which maximizes the sum of values of segments of a list partition. A partition of a list is a decomposition of a list of values, into a list of non-empty contiguous segments. For instance, for the list [1,2,3]\left[1,2,3\right], the set of configurations is

𝕃={[[1],[2],[3]],[[1,2],[3]],[[1],[2,3]],[[1,2,3]]}.\mathbb{L}=\left\{\left[\left[1\right],\left[2\right],\left[3\right]\right],\left[\left[1,2\right],\left[3\right]\right],\left[\left[1\right],\left[2,3\right]\right],\left[\left[1,2,3\right]\right]\right\}. (2)

In this example, the combinatorial configurations are list partitions of segments, so variable ll represents a partition such as [[1],[2,3]]\left[\left[1\right],\left[2,3\right]\right] and variable xx a segment of that partition such as [2,3]\left[2,3\right], with w(x)w\left(x\right) being the value of that segment. We can specify the rod-cutting problem as

srodcut=maxl𝕃xlw(x).s_{\text{rodcut}}^{*}=\max_{l\in\mathbb{L}}\underset{x\in l}{\sum}w\left(x\right). (3)

which is an instance of (1) with the max-plus semiring (,max,+,,0)\left(\mathbb{R},\max,+,-\infty,0\right).

We will next demonstrate how to solve (1) through brute-force (exhaustive) generate-evaluate computation. There is a special semiring which can be used to exhaustively generate the set of all possible combinatorial configurations 𝕃\mathbb{L}. We call this special semiring the generator semiring 𝒢=({[𝕏]},,,,{[]})\mathcal{G}=\left(\left\{\left[\mathbb{X}\right]\right\},\cup,\circ,\emptyset,\left\{\left[\,\right]\right\}\right). This well-known semiring (and variants) arise in several contexts; for example, to computational linguists it is called the formal language semiring over sets of lists with elements of type 𝕏\mathbb{X}, which we denote by {[𝕏]}\left\{\left[\mathbb{X}\right]\right\}. The operator \cup is set union, and l1l2l_{1}\circ l_{2} is the cross-join of two sets of lists l1,l2:{[𝕏]}l_{1},l_{2}:\left\{\left[\mathbb{X}\right]\right\} obtained by concatenating each element in configuration l1l_{1} with each element in l2l_{2}. To illustrate, for elements set 𝕏={a,b,c,d,e}\mathbb{X}=\left\{a,b,c,d,e\right\},

{[a,b],[c]}{[d],[e]}={[a,b,d],[a,b,e],[c,d],[c,e]}.\left\{\left[a,b\right],\left[c\right]\right\}\circ\left\{\left[d\right],\left[e\right]\right\}=\left\{\left[a,b,d\right],\left[a,b,e\right],\left[c,d\right],\left[c,e\right]\right\}. (4)

Define a semiring generator f𝒮,w:𝕊f^{\mathcal{S},w}:\mathbb{S} which is a function ff fully applied to the arbitrary semiring 𝒮\mathcal{S}’s binary operators and operator identities and value mapping function w:𝕏𝕊w:\mathbb{X}\to\mathbb{S}, which maps some data into semiring values. Apply this function ff to the generator semiring 𝒢\mathcal{G} and value mapping function w(x)=w(x)={[x]}w\left(x\right)=w^{\prime}\left(x\right)=\left\{\left[x\right]\right\} which embeds an element x𝕏x\in\mathbb{X} into a singleton configuration, {[x]}\left\{\left[x\right]\right\}. In this way, we will generate (enumerate) all combinatorial configurations by computing 𝕃=f𝒢,w\mathbb{L}=f^{\mathcal{G},w^{\prime}}. Also, assume we have an evaluation function g𝒮,w:{[𝕏]}𝕊g^{\mathcal{S},w}:\left\{\left[\mathbb{X}\right]\right\}\to\mathbb{S} which is a semiring homomorphism preserving the semiring structure for all l1,l2:{[𝕏]}l_{1},l_{2}:\left\{\left[\mathbb{X}\right]\right\},

g(l1l2)\displaystyle g\left(l_{1}\cup l_{2}\right) =g(l1)g(l2)\displaystyle=g\left(l_{1}\right)\oplus g\left(l_{2}\right) (5)
g(l1l2)\displaystyle g\left(l_{1}\circ l_{2}\right) =g(l1)g(l2)\displaystyle=g\left(l_{1}\right)\otimes g\left(l_{2}\right)
g()\displaystyle g\left(\emptyset\right) =i\displaystyle=i_{\oplus}
g({[]})\displaystyle g\left(\left\{\left[\,\right]\right\}\right) =i\displaystyle=i_{\otimes}

along with the requirement that g({[x]})=w(x)g\left(\left\{\left[x\right]\right\}\right)=w\left(x\right) for all x𝕏x\in\mathbb{X}. Here, we suppress the subscripted parameters from gg for clarity. Now, using this generator ff and evaluator gg, problems defined by the semiring specification (1) can be solved exactly using the following exhaustive generate-evaluate algorithm,

s=g𝒮,w(f𝒢,w).s^{*}=g^{\mathcal{S},w}\left(f^{\mathcal{G},w^{\prime}}\right). (6)

Self-evidently, this solves (1) because f𝒢,wf^{\mathcal{G},w^{\prime}} generates the set 𝕃\mathbb{L} and then g𝒮,wg^{\mathcal{S},w} evaluates each one of the configurations l𝕃l\in\mathbb{L} over \otimes and aggregates them over the semiring \oplus.

While problem (1) is indeed correctly solved by this generate-evaluate algorithm, this computation is usually intractable because the configurations set 𝕃\mathbb{L} is typically exponential (or larger). However, it turns out there is no need to generate the set 𝕃\mathbb{L} in the first place. We have assumed that the generator ff is polymorphic in an arbitrary semiring 𝒮\mathcal{S} i.e. it is a function of type

f:(𝕊𝕊𝕊)(𝕊𝕊𝕊)𝕊𝕊(𝕏𝕊)𝕊.f:\left(\mathbb{S}\to\mathbb{S}\to\mathbb{S}\right)\to\left(\mathbb{S}\to\mathbb{S}\to\mathbb{S}\right)\to\mathbb{S}\to\mathbb{S}\to\left(\mathbb{X}\to\mathbb{S}\right)\to\mathbb{S}. (7)

In words, this function takes as parameters two semiring operators of type 𝕊𝕊𝕊\mathbb{S}\to\mathbb{S}\to\mathbb{S}, two semiring operator identities of type 𝕊\mathbb{S} and a mapping function w:𝕏𝕊w:\mathbb{X\to\mathbb{S}}, returning a value of type 𝕊\mathbb{S}. Then, we can show the following equivalence holds,

s=g𝒮,w(f𝒢,w)=f𝒮,w.s^{*}=g^{\mathcal{S},w}\left(f^{\mathcal{G},w^{\prime}}\right)=f^{\mathcal{S},w}. (8)

provided only in addition that g𝒮,wg^{\mathcal{S},w} is a semiring homomorphism mapping 𝒢𝒮\mathcal{G}\to\mathcal{S}. We call this theorem semiring fusion. If the semiring 𝒮\mathcal{S} has operators with O(1)O\left(1\right) computational complexity, then computing s=f𝒮,ws^{*}=f^{\mathcal{S},w} will be much more computationally efficient than using (6). The proof of this theorem, given rigorously in Appendix A: Proof of DP semiring fusion is a straightforward application of Wadler’s free theorem (Wadler, 1989). This theorem is very widely applicable in the context of DP because semiring polymorphic generators f𝒮,wf^{\mathcal{S},w} are extremely common, for instance, all Bellman recursions are of this form (Bellman, 1957; Sniedovich, 2011).

2.2 Constraint lifting

Combinatorial problems in applications of DP, are often more complex than those which can be specified by the simple semiring formulation (1). For instance, let us suppose we want to constrain the semiring problem to restrict it to apply only to configurations which satisfy some constraint, c:[𝕏]𝔹c:\left[\mathbb{X}\right]\to\mathbb{B}. Such constrained combinatorial problems are specified as,

scons=l𝕃:c(l)xlw(x).s_{\mathrm{cons}}^{*}=\bigoplus_{l\in\mathbb{L}:c\left(l\right)}\underset{x\in l}{\bigotimes}w\left(x\right). (9)

As with exhaustive generate-evaluate above, a self-evidently correct (but not at all “smart”) way to solve (9) is to apply the following algorithm. Firstly, compute all configurations 𝕃\mathbb{L} using a semiring generator f𝒢,wf^{\mathcal{G},w^{\prime}}. Then, remove (filter away) those configurations l𝕃l\in\mathbb{L} for which c(l)=Fc\left(l\right)=F using a filtering function ϕc:{[𝕏]}{[𝕏]}\phi^{c}:\left\{\left[\mathbb{X}\right]\right\}\to\left\{\left[\mathbb{X}\right]\right\} partially applied to the constraint function, cc. Finally, evaluate the remaining configurations using a evaluation function g𝒮,wg^{\mathcal{S},w}. This generate-filter-evaluate algorithm computes,

scons=g𝒮,w(ϕc(f𝒢,w)).s_{\mathrm{cons}}^{*}=g^{\mathcal{S},w}\left(\phi^{c}\left(f^{\mathcal{G},w^{\prime}}\right)\right). (10)

The difficulty with this approach is the same as faced above: the intractable size of the intermediate configurations 𝕃\mathbb{L}. Above, this was solved by invoking semiring fusion (93), but unfortunately here the filtering ϕc\phi^{c} prevents this theorem from being applicable. We will present a practical solution to this which makes use of semiring lifting (Jeuring, 1993; Emoto et al., 2012). This is based on the following idea: if we can find a new semiring which implicitly evaluates the constraints cc, then we can fuse the constraint with the semiring homomorphism (5) so that semiring fusion (93) becomes applicable once more. This will effectively eliminate the need to compute ϕc(f𝒢,w)\phi^{c}\left(f^{\mathcal{G},w^{\prime}}\right), which is responsible for the intractability of the algorithm (10). The resulting theorem will be a generalization of theorem (8).

To apply this idea, we will need constraints cc expressed in a separable form, which we explain in detail next. Although not entirely general, many kinds of constraints typically encountered in combinatorial problems are in this form (Sniedovich, 2011). Such separable constraints are formalized using a constraint algebra which we denote by =(𝕄,,i)\mathcal{M}=\left(\mathbb{M},\odot,i_{\odot}\right). The only restrictions on the constraint set 𝕄\mathbb{M} is that it is countable and (for practical implementation purposes) of finite cardinality. The binary operator \odot is usually accompanied by an identity, ii_{\odot} (but this is not essential, we will illustrate this below). Then, a typical separable constraint can be defined in terms of a recurrence function h,v:𝕄h^{\mathcal{M},v}:\mathbb{N}\to\mathbb{M} partially applied to monoid \mathcal{M}’s operator \odot and identity ii_{\odot} over a set of configurations of length LL,

h0,v\displaystyle h_{0}^{\mathcal{M},v} =i\displaystyle=i_{\odot} (11)
hl,v\displaystyle h_{l}^{\mathcal{M},v} =hl1,vv(xl)l{1,2,,L},\displaystyle=h_{l-1}^{\mathcal{M},v}\odot v\left(x_{l}\right)\quad\forall l\in\left\{1,2,\ldots,L\right\},

where the constraint map v:𝕏𝕄v:\mathbb{X}\to\mathbb{M} maps an element xx in a configuration l𝕃l\in\mathbb{L} into a constraint value 𝕄\mathbb{M}. Widely encountered algebras include arbitrary finite monoids (\odot is associative) and finite groups. To complete this separable specification of the constraint, we define a Boolean acceptance condition, a:𝕄𝔹a:\mathbb{M}\to\mathbb{B}, whereby a configuration is retained if a(hL,v)a\left(h_{L}^{\mathcal{M},v}\right) evaluates to true. Thus, for a configuration l:{[𝕏]}l:\left\{\left[\mathbb{X}\right]\right\} the constraint in (9) is computed using c(l)=a(hL,v)c\left(l\right)=a\left(h_{L}^{\mathcal{M},v}\right).

In this separable formulation, problem (10) is restated by

scons=g𝒮,w(ϕ,v,a(f𝒢,w)),s_{\mathrm{cons}}^{*}=g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right), (12)

with the modified filtering function ϕ,v,a:{[𝕏]}{[𝕏]}\phi^{\mathcal{M},v,a}:\left\{\left[\mathbb{X}\right]\right\}\to\left\{\left[\mathbb{X}\right]\right\} which is partially applied to the algebra \mathcal{M}, constraint mapping vv and acceptance criteria aa, implementing cc. To illustrate, a specific, recursive implementation of ϕ\phi can be given as (Bird and de Moor, 1996),

ϕ()\displaystyle\phi\left(\emptyset\right) =\displaystyle=\emptyset (13)
ϕ({l})\displaystyle\phi\left(\left\{l\right\}\right) ={{l}a(hl)=Totherwise\displaystyle=\begin{cases}\left\{l\right\}&a\left(h_{l}\right)=T\\ \emptyset&\textrm{otherwise}\end{cases}
ϕ(l1l2)\displaystyle\phi\left(l_{1}\cup l_{2}\right) =ϕ(l1)ϕ(l2)\displaystyle=\phi\left(l_{1}\right)\cup\phi\left(l_{2}\right)

where l1l_{1}, l2l_{2} are configurations in the configuration set 𝕃\mathbb{L} and we suppress the superscripted parameters of ϕ\phi and hh only for clarity.

To give a concrete, practical example of this separable constraint formalism, with the additive constraint group =(,+,0)\mathcal{M}=\left(\mathbb{N},+,0\right), the constraint with the mapping v(x)=1v\left(x\right)=1 computes lengths of configurations l𝕃l\in\mathbb{L} . Indeed, this algebra is just the list length homomorphism defined by the recursion h0=0h_{0}=0, hl=hl1+1h_{l}=h_{l-1}+1 (Bird and de Moor, 1996). Thus, the recurrence (26) coupled with this constraint group and the acceptance criteria,

a(m)={Tm=MFotherwisea\left(m\right)=\begin{cases}T&m=M\\ F&\textrm{otherwise}\end{cases} (14)

restricts configurations to only those of length MM\in\mathbb{N}.

With this separable formulation, we can construct a new semiring by lifting 𝒮\mathcal{S} over the algebra \mathcal{M} (Jeuring, 1993; Emoto et al., 2012). This is the desired semiring with which we can fuse the constraint cc with the semiring homomorphism (5). Lifting creates vectors (which we can also consider as functions) of semiring values of type 𝕄𝕊\mathbb{M}\to\mathbb{S}. The new, composite semiring 𝒮=(𝕄𝕊,,,i,i)\mathcal{S}\mathcal{M}=\left(\mathbb{M}\text{$\to\mathbb{S}$},\oplus_{\mathcal{M}},\otimes_{\mathcal{M}},i_{\oplus_{\mathcal{M}}},i_{\otimes_{\mathcal{M}}}\right) has binary operators over values x,y:𝕄𝕊x,y:\mathbb{M}\text{$\to\mathbb{S}$}given explicitly by,

(xy)m\displaystyle\left(x\oplus_{\mathcal{M}}y\right)_{m} =xmym\displaystyle=x_{m}\oplus y_{m} (15)
(xy)m\displaystyle\left(x\otimes_{\mathcal{M}}y\right)_{m} =mm′′=mm,m′′𝕄(xmym′′),\displaystyle=\bigoplus_{\begin{subarray}{c}m^{\prime}\odot m^{\prime\prime}=m\\ \forall m^{\prime},m^{\prime\prime}\in\mathbb{M}\end{subarray}}\left(x_{m^{\prime}}\otimes y_{m^{\prime\prime}}\right),

where xmx_{m} refers to indexing with values in 𝕄\mathbb{M}. The associated operator identities are,

(i)m\displaystyle\left(i_{\oplus_{\mathcal{M}}}\right)_{m} =im𝕄\displaystyle=i_{\oplus}\quad\forall m\in\mathbb{M} (16)
(i)m\displaystyle\left(i_{\otimes_{\mathcal{M}}}\right)_{m} ={im=iiotherwise.\displaystyle=\begin{cases}i_{\otimes}&m=i_{\odot}\\ i_{\oplus}&\textrm{otherwise}.\end{cases}

The operator \otimes_{\mathcal{M}} is an interesting and far-reaching generalization of discrete convolution operators found in contexts such as digital signal processing, machine learning and applied mathematics. We also need the lifted value mapping, w:𝕏(𝕄𝕊)w_{\mathcal{M}}:\mathbb{X}\to\left(\mathbb{M}\to\mathbb{S}\right) given by,

(w(x))m={w(x)v(x)=miotherwise,\left(w_{\mathcal{M}}\left(x\right)\right)_{m}=\begin{cases}w\left(x\right)&v\left(x\right)=m\\ i_{\oplus}&\textrm{otherwise},\end{cases} (17)

where the original semiring map is w:𝕏𝕊w:\mathbb{X}\to\mathbb{S} and the constraint map is v:𝕏𝕄v:\mathbb{X}\to\mathbb{M}.

Finally, to obtain the solution to sconss_{\mathrm{cons}}^{*} to (8), we need to project the vector lifted over 𝕄\mathbb{M}, onto 𝔹\mathbb{B} using the projection function π𝒮,a:(𝕄𝕊)𝕊\pi^{\mathcal{S},a}:\left(\mathbb{M}\to\mathbb{S}\right)\to\mathbb{S} partially applied to semiring 𝒮\mathcal{S} operators and identities and acceptance criteria aa,

π𝒮,a(x)=m𝕄:a(m)xm.\pi^{\mathcal{S},a}\left(x\right)=\bigoplus_{m^{\prime}\in\mathbb{M}:a\left(m^{\prime}\right)}x_{m^{\prime}}. (18)

This yields all the ingredients to define a theorem which we call semiring constrained fusion,

scons=g𝒮,w(ϕ,v,a(f𝒢,w))=π𝒮,a(f𝒮,w).s_{\mathrm{cons}}^{*}=g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right)=\pi^{\mathcal{S},a}\left(f^{\mathcal{S}\mathcal{M},w_{\mathcal{M}}}\right). (19)

To summarize (19), on the left hand side, f𝒢,wf^{\mathcal{G},w^{\prime}} exhaustively computes configurations in 𝕃\mathbb{L}; ϕ,v,a\phi^{\mathcal{M},v,a} filters away any configurations which do not satisfy the constraint cc implemented by vv and aa over the constraint algebra \mathcal{M} and g𝒮,wg^{\mathcal{S},w} evaluates the remaining configurations in 𝕃\mathbb{L} with the semiring 𝒮\mathcal{S} using the mapping ww. On the right hand side, fS,w:(𝕄𝕊)f^{S\mathcal{M},w_{\mathcal{M}}}:\left(\mathbb{M}\to\mathbb{S}\right) (fully) applied to lifted semiring’s 𝒮\mathcal{SM}’s operators and identities and lifted mapping function ww_{\mathcal{M}}, first maps the element 𝕏\mathbb{X} into lifted semiring values using ww and lifts them over the constraint set using vv. It then evaluates all configurations in 𝕃\mathbb{L}, for every value of the constraint set 𝕄\mathbb{M}, using the lifted semiring 𝒮\mathcal{S}\mathcal{M}. Finally, π𝒮,a:(𝕄𝕊)𝕊\pi^{\mathcal{S},a}:\left(\mathbb{M}\to\mathbb{S}\right)\to\mathbb{S} projects this lifted computation back down onto the plain semiring 𝒮\mathcal{S}.

The proof of (19) result is given in Appendix B: Constraint lifting proofs. To give some informal intuition into the steps in this proof, we show that (15) is a semiring and then using a change of variables argument, show how the projection is obtained. We then demonstrate that the composition of the filtering with the semiring homomorphism, can be fused into a new homomorphism over the constraint-lifted semiring. This allows us to use our already established result (19) to show that the composition of the exhaustive configuration generator followed by the filtering, is equivalent to evaluation of these configurations in the lifted semiring, followed by a projection.

We raise a couple of important comments about this theorem here:

  1. 1.

    The effect on the computational and memory complexity of the corresponding unconstrained algorithm (8), is quite predictable. For most practical computational semirings (e.g. max-plus or sum-product) the operators ,\oplus,\otimes will be O(1)O\left(1\right). In this setting, for each value of m𝕄m\in\mathbb{M}, the binary operator \oplus_{\mathcal{M}} is O(1)O\left(1\right), and the operator \otimes_{\mathcal{M}} is O(M2)O\left(M^{2}\right). Thus, in general, applying a constraint increases the worst-case computational complexity of an existing polymorphic semiring generator function f𝒮,wf^{\mathcal{S},w} by a multiplicative factor O(M3)O\left(M^{3}\right). In terms of memory, lifting requires storing MM values per configuration, therefore the memory complexity increases multiplicatively by a factor O(M)O\left(M\right).

  2. 2.

    Lifting is intimately related to the design of DP algorithms found in textbooks, in the following way. A widely stated, but intuitive observation, is that designing practical DP algorithms boils down to identifying a structural decomposition which makes frequent re-use of sub-problems (Kleinberg and Tardos, 2005). This design principle is easy to state, but often quite tricky to apply in practice, as it can depend upon a serendipitous discovery of the right way to parameterize the problem. However, implicit to the definition of the constraint operator \odot for constraints c(l)c\left(l\right) which can be implemented in separable algebraic form, is the relationship that solutions for different values of the constraint algebra have with each other. The lifted product in (15) combines all solutions at every value of the constraint. However, for each m𝕄m\in\mathbb{M}, the condition mm′′=mm^{\prime}\odot m^{\prime\prime}=m in the product partitions the solutions in a way which determines how the DP sub-problems should be combined. In other words, this partitioning, coupled with the pairwise summation, determines the dependency structure of configurations l𝕃l\in\mathbb{L}. For example, in the case of the simple natural number addition, m+m′′=mm^{\prime}+m^{\prime\prime}=m, to find the sub-problem for the case m=5m=5 requires us to combine all sub-problems (m,m′′)=(1,4)\left(m^{\prime},m^{\prime\prime}\right)=\left(1,4\right), (2,3)\left(2,3\right), (3,2)\left(3,2\right) and (4,1)\left(4,1\right), together. Interestingly, this also demonstrates that DP decompositions can be performed in ways that are much more general than the fairly limited descriptions of combining “smaller”, self-similar problems. Indeed, it is useful to think of DP decomposition as arising from a partitioning of the space of the constraint value under the constraint operator \odot, into subsets of pairs of sub-problems for a given value of m𝕄m\in\mathbb{M}.

2.3 Simplifying the constraint algebra

The main problem with the construction (19) is that the direct computation of xyx\otimes_{\mathcal{M}}y is quadratic in the size of 𝕄\mathbb{M}. This is not a problem for small lifting sets, but for many practical problems we want to apply constraints which can take on a potentially large set of values, which makes the naive application of constraint lifting computationally inefficient. We also know that it is often possible to come up with hand-crafted DP algorithms which are more efficient than this. We can, however, substantially improve on this quadratic dependence by noting that for many semiring polymorphic configuration generators f𝒮,w:𝕊f^{\mathcal{S},w}:\mathbb{N}\to\mathbb{S} given as DP (Bellman) recursions, we need to compute terms of the form uw(x)u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right) for some general u:𝕄𝕊u:\mathbb{M}\text{$\to\mathbb{S}$}. Since the lifted mapping function w(x)miw_{\mathcal{M}}\left(x\right)_{m}\neq i_{\oplus} only for one value, m′′=v(x)m^{\prime\prime}=v\left(x\right), we can simplify the double summation to a single one,

(uw(x))m\displaystyle\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right)\right)_{m} =mm′′=mm,m′′𝕄(umw(x)m′′)\displaystyle=\bigoplus_{\begin{subarray}{c}m^{\prime}\odot m^{\prime\prime}=m\\ \forall m^{\prime},m^{\prime\prime}\in\mathbb{M}\end{subarray}}\left(u_{m^{\prime}}\otimes w_{\mathcal{M}}\left(x\right)_{m^{\prime\prime}}\right) (20)
=(m𝕄:mv(x)=mum)w(x).\displaystyle=\left(\bigoplus_{\begin{subarray}{c}m^{\prime}\in\mathbb{M}:m^{\prime}\odot v\left(x\right)=m\end{subarray}}u_{m^{\prime}}\right)\otimes w\left(x\right).

Because the operator \odot does not necessarily have inverses, solutions m𝕄m^{\prime}\in\mathbb{M} to the equation mv(x)=mm^{\prime}\odot v\left(x\right)=m are not necessarily unique. However, we can flip this around and instead explicitly compute m=mv(x)m=m^{\prime}\odot v\left(x\right) for each m𝕄m^{\prime}\in\mathbb{M}. This leads to an obvious iterative algorithm,

z\displaystyle z i\displaystyle\mapsfrom i_{\oplus_{\mathcal{M}}} (21)
zmv(x)\displaystyle z_{m\odot v\left(x\right)} zmv(x)(umw(x))m𝕄,\displaystyle\mapsfrom z_{m\odot v\left(x\right)}\oplus\left(u_{m}\otimes w\left(x\right)\right)\quad\forall m\in\mathbb{M},

to obtain uw(x)=zu\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right)=z at the end of the iteration; here the arrow \mapsfrom denotes setting the variable on the left, to the value on the right, possibly overwriting the previous value of the variable. Thus the product (20) is an inherently O(M)O\left(M\right) operation. As a result, semiring polymorphic generators modified to produce constrained configurations will have worst-case multiplicative increase in time complexity of O(M2)O\left(M^{2}\right), if the semiring product \otimes appears in the generator only in terms of the form uw(x)u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right). This is true, for instance of all algebraic path problems (Huang, 2008) and we will encounter several other examples below.

If, additionally, the algebra \mathcal{M} has inverses (for example, if the algebra is a group), on fixing mm and mm^{\prime}, there is a unique (and often analytical) solution to mm′′=mm^{\prime}\odot m^{\prime\prime}=m which we can write as m′′=(m)1mm^{\prime\prime}=\left(m^{\prime}\right)^{-1}\odot m. This also allows us to simplify the lifted semiring product to the O(M)O\left(M\right) computation,

(xy)m=m𝕄(xmy(m)1m).\left(x\otimes_{\mathcal{M}}y\right)_{m}=\bigoplus_{m^{\prime}\in\mathbb{M}}\left(x_{m^{\prime}}\otimes y_{\left(m^{\prime}\right)^{-1}\odot m}\right). (22)

Note that we often have finite groups where we are not interested in defining inverses for all elements, for example where we need y(m)1my_{\left(m^{\prime}\right)^{-1}\odot m} but (m)1m𝕄\left(m^{\prime}\right)^{-1}\odot m\notin\mathbb{M}. In that case, setting y(m)1m=iy_{\left(m^{\prime}\right)^{-1}\odot m}=i_{\oplus} suffices to appropriately truncate the above product.

For such group lifting algebras, terms of the form uw(x)u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right) simplify even further. We can solve mv(x)=mm^{\prime}\odot v\left(x\right)=m uniquely to find m=v(x)1mm^{\prime}=v\left(x\right)^{-1}\odot m, so that the product (20) can, in this situation, now be computed as,

(uw(x))m\displaystyle\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right)\right)_{m} ={imv(x)1𝕄umv(x)1w(x)otherwise,\displaystyle=\begin{cases}i_{\oplus}&m\odot v\left(x\right)^{-1}\notin\mathbb{M}\\ u_{m\odot v\left(x\right)^{-1}}\otimes w\left(x\right)&\textrm{otherwise},\end{cases} (23)

which is an O(1)O\left(1\right) time operation. Thus, semiring polymorphic configuration generators can be modified to produce constrained configurations with multiplicative time complexity increase of only O(M)O\left(M\right), if constraints are expressible in terms of a group algebra. Some examples of useful, simplified constraint algebras are listed in Appendix D: Some useful constraint algebras.

2.4 Taking stock: worked example

Let us pause to examine how to apply the preceding theory. Consider the problem of finding the value of the minimum sum subsequence of a list (a subsequence being a sublist of generally non-consecutive elements of a list). We can specify this simple, unconstrained combinatorial optimization problem as,

ssubs=minl𝕃xlx,s_{\text{subs}}^{*}=\min_{l\in\mathbb{L}}\underset{x\in l}{\sum}x, (24)

where the configuration set 𝕃\mathbb{L} contains all possible subsequences of a list, of which there are 2N2^{N} for lists of length NN. For instance, the subsequences of the list [2,1,8]\left[-2,1,8\right] are

𝕃={[],[2],[1],[8],[2,1],[2,8],[1,8],[2,1,8]}.\mathbb{L}=\left\{\left[\,\right],\left[-2\right],\left[1\right],\left[8\right],\left[-2,1\right],\left[-2,8\right],\left[1,8\right],\left[-2,1,8\right]\right\}. (25)

This is a specification (1) over the min-plus semiring =(,min,+,,0)\mathcal{R}=\left(\mathbb{R},\min,+,\infty,0\right). For a given a length NN list l:𝕏l:\mathbb{N}\to\mathbb{X} indexed as lnl_{n}, a simple, semiring polymorphic generator is given by the following DP Bellman recursion,

f0𝒮,w\displaystyle f_{0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (26)
fn𝒮,w\displaystyle f_{n}^{\mathcal{S},w} =fn1𝒮,w(iw(ln))n{1,2,,N},\displaystyle=f_{n-1}^{\mathcal{S},w}\otimes\left(i_{\otimes}\oplus w\left(l_{n}\right)\right)\quad\forall n\in\left\{1,2,\ldots,N\right\},

where w:𝕏𝕊w:\mathbb{X}\to\mathbb{S} embeds elements of some given list ll to 𝒮\mathcal{S}-semiring values. Semiring fusion (8) tells us that ssubs=fN,ids_{\text{subs}}^{*}=f_{N}^{\mathcal{R},id} given by

f0,id\displaystyle f_{0}^{\mathcal{R},id} =0\displaystyle=0 (27)
fn,id\displaystyle f_{n}^{\mathcal{R},id} =fn1,id+min(0,ln)n{1,2,,N},\displaystyle=f_{n-1}^{\mathcal{R},id}+\min\left(0,l_{n}\right)\quad\forall n\in\left\{1,2,\ldots,N\right\},

is a correct algorithm for solving problem (24) exactly in O(N)O\left(N\right) time complexity, even though the exhaustive solution would require generating and evaluating 2N2^{N} subsequences requiring exponential time.

Now, assume that, rather than subsequences, we are interested in combinations of elements of a list (that is, fixed-size subsequences). We can define a constrained subsequence problem as

scombs=minl𝕃:#(l)=Mxlx,s_{\text{combs}}^{*}=\min_{l\in\mathbb{L}:\#\left(l\right)=M}\underset{x\in l}{\sum}x, (28)

where the constraint condition c(l)=(#(l)=M)c\left(l\right)=\left(\#\left(l\right)=M\right) receives a subsequence, and evaluates to true if that subsequence has the given length MM. For instance, if M=2M=2, the configurations 𝕃\mathbb{L} constrained by sublist length for list [2,1,8]\left[-2,1,8\right] is the set

𝕃={l𝕃:#(l)=2}={[2,1],[2,8],[1,8]}.\mathbb{L}^{\prime}=\left\{l\in\mathbb{L}:\#\left(l\right)=2\right\}=\left\{\left[-2,1\right],\left[-2,8\right],\left[1,8\right]\right\}. (29)

This sequence length constraint can be formulated using the lifting algebra =({1,,M},+,0)\mathcal{M}=\left(\left\{1,\ldots,M\right\},+,0\right), constraint mapping function v(n)=1v\left(n\right)=1 and acceptance criteria a(m)=Ta\left(m\right)=T if m=Mm=M and FF otherwise. Inserting the semiring 𝒮\mathcal{SM}’s operators and lifted mapping ww_{\mathcal{M}} into the semiring polymorphic generator recursion (26), we obtain,

f0,m\displaystyle f_{0,m} =(i)m\displaystyle=\left(i_{\otimes_{\mathcal{M}}}\right)_{m} (30)
fn,m\displaystyle f_{n,m} =(fn1(iw(n)))m,\displaystyle=\left(f_{n-1}\otimes_{\mathcal{M}}\left(i_{\otimes_{\mathcal{M}}}\oplus_{\mathcal{M}}w_{\mathcal{M}}\left(n\right)\right)\right)_{m},

where we suppress fn,mf_{n,m} ’s superscripts for clarity. The first line above simplifies to,

f0,m={im=0iotherwise,f_{0,m}=\begin{cases}i_{\otimes}&m=0\\ i_{\oplus}&\textrm{otherwise,}\end{cases} (31)

and the second line can be simplified as follows,

fn,m\displaystyle f_{n,m} =(fn1fn1w(n))m\displaystyle=\left(f_{n-1}\oplus_{\mathcal{M}}f_{n-1}\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(n\right)\right)_{m} (32)
=fn1,m(fn1w(n))m\displaystyle=f_{n-1,m}\oplus\left(f_{n-1}\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(n\right)\right)_{m}
=fn1,m{im1𝕄fn1,m1w(n)otherwise\displaystyle=f_{n-1,m}\oplus\begin{cases}i_{\oplus}&m-1\notin\mathbb{M}\\ f_{n-1,m-1}\otimes w\left(n\right)&\textrm{otherwise}\end{cases}
={fn1,0m=0fn1,m(fn1,m1w(n))otherwise,\displaystyle=\begin{cases}f_{n-1,0}&m=0\\ f_{n-1,m}\oplus\left(f_{n-1,m-1}\otimes w\left(n\right)\right)&\textrm{otherwise},\end{cases}

for all n{1,2,,N}n\in\left\{1,2,\ldots,N\right\} and m{1,2,,M}m\in\left\{1,2,\ldots,M\right\}. Explicitly, by invoking constrained semiring fusion (19), in the min-plus semiring \mathcal{R} with w=idw=id this is,

f0,m,id\displaystyle f_{0,m}^{\mathcal{R},id} ={0m=0otherwise\displaystyle=\begin{cases}0&m=0\\ \infty&\textrm{otherwise}\end{cases} (33)
fn,m,id\displaystyle f_{n,m}^{\mathcal{R},id} ={fn1,0,idm=0min(fn1,m,id,fn1,m1,id+ln)otherwise.\displaystyle=\begin{cases}f_{n-1,0}^{\mathcal{R},id}&m=0\\ \min\left(f_{n-1,m}^{\mathcal{R},id},f_{n-1,m-1}^{\mathcal{R},id}+l_{n}\right)&\textrm{otherwise}.\end{cases}

which is a simple O(NM)O\left(N\,M\right), provably correct algorithm for solving 28 through computing scombs=π,a(fNR,id)=fN,M,ids_{\text{combs}}^{*}=\pi^{\mathcal{R},a}\left(f_{N}^{R,id}\right)=f_{N,M}^{\mathcal{R},id} using the recursion (33) above.

It is instructive to compare this systematically derived algorithm to the textbook presentation of similar DP algorithms such as the quasi-polynomial knapsack problem (Kleinberg and Tardos, 2005; Emoto et al., 2012). We have obtained this DP algorithm by starting from a specification and by provably correct derivation steps, arrived at the new, computationally efficient recurrence above which solves the constrained problem. Often, the solutions obtained this way resemble hand-coded DP algorithms which involve ad-hoc and specific reasoning where we have to resort to special case analysis to demonstrate correctness and computational complexity, after the algorithm is coded.

2.5 Tupling semirings to avoid backtracing

The above cases have demonstrated the use of arbitrary semirings where some scalar-valued, numerical solution is required. It is often the case for optimization problems (involving the use of selection semirings such min-plus, \mathcal{R}) that we also want to know which solutions lead to the optimal (semiring) value. The usual solution to this (given in most DP literature) is backtracing, which retains a list of decisions at each stage and a series of “back pointers” to the previous decision, and then recovers the unknown decisions by following the sequence of pointers backwards.

In fact, there is an alternative and conceptually simpler solution made possible with the use of appropriate semirings. In particular we will focus on the generator semiring 𝒢\mathcal{G}. We can always exploit what is known as the tupling trick to compute two different semirings simultaneously (Bird and de Moor, 1996). If we map the semiring values used during the DP computations inside a pair (𝕊,{[𝕊]})\left(\mathbb{S},\left\{\left[\mathbb{S}\right]\right\}\right), then we can simultaneously update a semiring total while retaining the values selected in that stage. For example, the arg-max-plus selection, also known as the Viterbi, semiring (Goodman, 1999; Emoto et al., 2012),

𝒮×𝒢=(𝕊×{[𝕊]},,,(,),(0,{[]})),\mathcal{S}\times\mathcal{G}=\left(\mathbb{S}\times\left\{\left[\mathbb{S}\right]\right\},\oplus,\otimes,\left(-\infty,\emptyset\right),\left(0,\left\{\left[\,\right]\right\}\right)\right), (34)

has operators given explicitly by,

(u,x)(v,y)\displaystyle\left(u,x\right)\oplus\left(v,y\right) ={(u,x)u>v(v,y)u<v(u,xy)otherwise\displaystyle=\begin{cases}\left(u,x\right)&u>v\\ \left(v,y\right)&u<v\\ \left(u,x\cup y\right)&\textrm{otherwise}\end{cases} (35)
(u,x)(v,y)\displaystyle\left(u,x\right)\otimes\left(v,y\right) =(u+v,xy),\displaystyle=\left(u+v,x\circ y\right),

with identities i=(,)i_{\oplus}=\left(-\infty,\emptyset\right) and i=(0,{[]})i_{\otimes}=\left(0,\left\{\left[\,\right]\right\}\right). Furthermore, it is straightforward to construct a semiring which extends the Viterbi semiring by maintaining a ranked list of optima, i.e. computing the top kk optimal solutions, not merely the single highest scoring one (Goodman, 1999).

In most practical situations, for instance, where the mapping function in the DP problem is real-valued w:𝕏w:\mathbb{X}\to\mathbb{R} and thus have effectively zero probability of not being unique, there will only be a single, rather than potentially multiple, optimal solutions. In that case, we can remove the ambiguities in the selection with a simpler semiring,

𝒮×𝒢=(𝕊×[𝕊],,,(,),(0,[])),\mathcal{S}\times\mathcal{G}=\left(\mathbb{S}\times\left[\mathbb{S}\right],\oplus,\otimes,\left(-\infty,\emptyset\right),\left(0,\left[\,\right]\right)\right), (36)

with operators,

(u,x)(v,y)\displaystyle\left(u,x\right)\oplus\left(v,y\right) ={(u,x)uv(u,y)u<v\displaystyle=\begin{cases}\left(u,x\right)&u\geq v\\ \left(u,y\right)&u<v\end{cases} (37)
(u,x)(v,y)\displaystyle\left(u,x\right)\otimes\left(v,y\right) =(u+v,xy).\displaystyle=\left(u+v,x\cup y\right).

Clearly, the semiring 𝒮×𝒢\mathcal{S}\times\mathcal{G} is the tupling of max-plus with 𝒢\mathcal{G} in such a way as to compute both the value of the optimal solution alongside the values used to compute it.

Backtracing and the simple (Viterbi) tupled semiring will usually be similar in terms of computational complexity. Any particular, practical implementation may well require a more detailed investigation of the specifics of the particular data structures in the DP recurrence and the software and/or hardware platforms involved. For instance, while semiring tupling does require list concatenation and array structures could certainly pose a complexity issue, when implemented instead as linked lists this concatenation takes O(1)O\left(1\right) time, and indeed in practice, DP algorithm recurrences derived using the methods described here reduce to a sequence of list append operations, O(1)O\left(1\right) each for linked lists.

However, the specifics of lower-level implementation complexity are incidental: the overall argument is one of high-level conceptual clarity and systematic derivation, since the tupling construction requires no modification to the algebraic derivations given here. For instance, backtracing requires a way to traverse the DP decisions correctly in the reverse order, which will be special to each DP problem. With tupled semirings and semiring polymorphic generator functions, all that is required is to change the semiring of the DP generator algorithm function as described above, we do not need to know anything about how the generator algorithm works. Additionally, tupling semirings is trivially compatible with any set of multiple lifting constraints where hand-coding of special implementations of backtracking through the same set of multiple interacting constraints, would be error-prone.

2.6 Constructing new DP algorithms from old

The above sections have explained how to specify a combinatorial problem in an arbitrary semiring and how to derive an efficient algorithm to solve it. It has demonstrated how the use of semiring lifting allows us to modify an existing DP algorithm specification with a constraint, which can then be solved efficiently using constrained semiring fusion (19). Here, we show how this gives us a way of creating new, semiring polymorphic DP generators from existing ones such as f𝒮,wf^{\mathcal{S},w}. To see this, note that the semiring homomorphism g𝒢,wg^{\mathcal{G},w^{\prime}} with w(x)={[x]}w^{\prime}\left(x\right)=\left\{\left[x\right]\right\} is the identity homomorphism id𝒢:{[𝕏]}{[𝕏]}id^{\mathcal{G}}:\left\{\left[\mathbb{X}\right]\right\}\to\left\{\left[\mathbb{X}\right]\right\} for values in the semiring 𝒢\mathcal{G}, i.e. it maps sets of lists, into the same sets of lists unchanged. Thus we obtain,

ϕ,v,a(f𝒢,w)\displaystyle\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right) =id𝒢(ϕ,v,a(f𝒢,w))\displaystyle=id^{\mathcal{G}}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right) (38)
=g𝒢,w(ϕ,v,a(f𝒢,w))\displaystyle=g^{\mathcal{G},w^{\prime}}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right)
=π𝒢,a(f𝒢,w)\displaystyle=\pi^{\mathcal{G},a}\left(f^{\mathcal{G}\mathcal{M},w_{\mathcal{M}}^{\prime}}\right)

where the last step invokes (19). We have shown that ϕ,v,a(f𝒢,w)=π𝒢,a(f𝒢,w)\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)=\pi^{\mathcal{G},a}\left(f^{\mathcal{G}\mathcal{M},w_{\mathcal{M}}^{\prime}}\right), which we denote by f𝒢,w=ϕ,v,a(f𝒢,w)f^{\prime\mathcal{G},w^{\prime}}=\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right), this is the generator which computes a constrained configuration set. Now, fixing \mathcal{M}, acceptance criteria aa and constraint map vv, we notice that f𝒢,wf^{\prime\mathcal{G},w^{\prime}} depends only upon the semiring 𝒢\mathcal{G} and map ww^{\prime}. Thus it, too, is semiring polymorphic since we can replace the 𝒢\mathcal{G} and ww^{\prime} in ff^{\prime} with any arbitrary semiring 𝒮\mathcal{S} and mapping ww to obtain f𝒮,wf^{\prime\mathcal{S},w}. It follows that we can consider f𝒮,wf^{\prime\mathcal{S},w} as a new semiring polymorphic generator function derived from the existing generator, f𝒮,wf^{\mathcal{S},w}. This implies, in particular:

  1. 1.

    If f𝒮,wf^{\mathcal{S},w} is the generator of 𝕃\mathbb{L} for the problem s=l𝕃xlw(x)s^{*}=\bigoplus_{l\in\mathbb{L}}\bigotimes_{x\in l}w\left(x\right) in (1), then the new, derived polymorphic generator f𝒮,wf^{\prime\mathcal{S},w} is the generator for the constraint augmented problem s=l𝕃xlw(x)s^{\prime*}=\bigoplus_{l\in\mathbb{L}^{\prime}}\bigotimes_{x\in l}w\left(x\right) where 𝕃={l𝕃:c(l)}\mathbb{L}^{\prime}=\left\{l\in\mathbb{L}:c\left(l\right)\right\} and cc is implemented by \mathcal{M}, vv and aa which are fixed in (implicit to) f𝒮,wf^{\prime\mathcal{S},w}. This, of course, just a way of expressing an algorithm for efficiently computing sconss_{\mathrm{cons}}^{*} in terms of an algorithm for efficiently computing ss^{*} which is implicit to the constrained semiring fusion theorem (19).

  2. 2.

    Being semiring polymorphic, this new generator function f𝒮,wf^{\prime\mathcal{S},w} satisfies all the conditions of semiring fusion (8), i.e. with an arbitrary semiring homomorphism g𝒮,wg^{\mathcal{S},w}, we have g𝒮,w(f𝒢,w)=f𝒮,wg^{\mathcal{S},w}\left(f^{\prime\mathcal{G},w^{\prime}}\right)=f^{\prime\mathcal{S},w}.

  3. 3.

    We can repeat this process of augmenting an existing specification solved by use of a semiring polymorphic generator to derive novel, polymorphic DP algorithm with multiple, simultaneous constraints. This is possible, essentially, because lifting can always be “nested”, i.e. lifted semirings can themselves be lifted.

These implications have practical consequences in applications, which we now illustrate. Above, we demonstrated how to derive an algorithm for the min-plus problem over subsequence combinations (33) from the min-plus problem over subsequences, by semiring lifting applied to the polymorphic generator for subsequences, (26) which we repeat here for convenience,

f0𝒮,w\displaystyle f_{0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (39)
fn𝒮,w\displaystyle f_{n}^{\mathcal{S},w} =fn1𝒮,w(iw(n))n{1,2,,N},\displaystyle=f_{n-1}^{\mathcal{S},w}\otimes\left(i_{\otimes}\oplus w\left(n\right)\right)\quad\forall n\in\left\{1,2,\ldots,N\right\},

which is known as an efficient DP algorithm for solving the specification ssubspoly=l𝕃xlw(x)s_{\text{subspoly}}^{*}=\bigoplus_{l\in\mathbb{L}}\bigotimes_{x\in l}w\left(x\right) over an arbitrary semiring 𝒮\mathcal{S} and mapping ww by computing ssubspoly=fN𝒮,ws_{\text{subspoly}}^{*}=f_{N}^{\mathcal{S},w}. The resulting polymorphic generator of combinations, (30), is a straightforward DP Bellman recursion for solving specifications over subsequence combinations of length MM. For semirings wherein the operators can be evaluated in constant time, this has O(NM)O\left(N\,M\right) time complexity. It is interesting and useful in its own right so we provide a pseudocode implementation here, Algorithm 1.

  • function polycombs(,,i,i,w,N,M\oplus,\otimes,i_{\oplus},i_{\otimes},w,N,M)

    f[0,0]=if\left[0,0\right]=i_{\otimes}

    f[0,1M]=if\left[0,1\ldots M\right]=i_{\oplus}

    for n=1Nn=1\ldots N

    • for m=0Mm=0\ldots M

      if m=0m=0

      • f[n,m]=f[n1,0]f\left[n,m\right]=f\left[n-1,0\right]

      else

      • f[n,m]=f[n1,m]f[n1,m1]w(n)f\left[n,m\right]=f\left[n-1,m\right]\oplus f\left[n-1,m-1\right]\otimes w\left(n\right)

    return f[N,M]f\left[N,M\right]

Algorithm 1 Procedural pseudocode implementation of a semiring polymorphic DP Bellman recursion for subsequence combinations, derived systematically from a polymorphic subsequence recurrence using constraint lifting and algebraic simplifications described in the text.

As a somewhat more complex example, for some applications, there is a need to perform computations over non-empty subsequences, that is subsequences without the empty sub-sequence {}\left\{\emptyset\right\}. Analogous to the subsequence problem, we can define the non-empty subsequence problem in an arbitrary semiring as,

ssubsnepoly=l𝕃:lxlw(x),s_{\text{subsnepoly}}^{*}=\bigoplus_{l\in\mathbb{L}:l\neq\emptyset}\underset{x\in l}{\bigotimes}w\left(x\right), (40)

where the constraint function c(l)=(l)c\left(l\right)=\left(l\neq\emptyset\right) is true if the configuration is non-empty. For instance, for the sequence [1,2,3]\left[1,2,3\right], the set of non-empty subsequences consists of the set

𝕃={l𝕃:l}={[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]}\mathbb{L}^{\prime}=\left\{l\in\mathbb{L}:l\neq\emptyset\right\}=\left\{\left[1\right],\left[2\right],\left[3\right],\left[1,2\right],\left[1,3\right],\left[2,3\right],\left[1,2,3\right]\right\} (41)

which has size 2N12^{N}-1. To use semiring lifting we need a constraint algebra for c(l)c\left(l\right), for which we define an existence constraint algebra =(𝔹,,F)\mathcal{M}=\left(\mathbb{B},\vee,F\right) (see Appendix B: Constraint lifting proofs) with the constant constraint map v(n)=Tv\left(n\right)=T which partitions the set of subsequences generated by the recurrence (26), into empty m=Fm=F and non-empty m=Tm=T subsequences,

f0,m\displaystyle f_{0,m} ={im=Fim=T\displaystyle=\begin{cases}i_{\otimes}&m=F\\ i_{\oplus}&m=T\end{cases} (42)
fn,m\displaystyle f_{n,m} =(fn1fn1w(n))m\displaystyle=\left(f_{n-1}\oplus_{\mathcal{M}}f_{n-1}\otimes w_{\mathcal{M}}\left(n\right)\right)_{m}
=fn1{w(n)(fn1,Ffn1,T)m=Tim=F.\displaystyle=f_{n-1}\oplus\begin{cases}w\left(n\right)\otimes\left(f_{n-1,F}\oplus f_{n-1,T}\right)&m=T\\ i_{\oplus}&m=F.\end{cases}

suppressing the superscript dependence of ff here on 𝒮,w\mathcal{S},w for clarity. The last line can be rewritten,

fn,m\displaystyle f_{n,m} ={fn1,Tw(n)(fn1,Ffn1,T)m=Tfn1,Fm=F\displaystyle=\begin{cases}f_{n-1,T}\oplus w\left(n\right)\otimes\left(f_{n-1,F}\oplus f_{n-1,T}\right)&m=T\\ f_{n-1,F}&m=F\end{cases} (43)
={fn1,T(w(n)fn1,F)(w(n)fn1,T)m=Tim=F,\displaystyle=\begin{cases}f_{n-1,T}\oplus\left(w\left(n\right)\otimes f_{n-1,F}\right)\oplus\left(w\left(n\right)\otimes f_{n-1,T}\right)&m=T\\ i_{\otimes}&m=F,\end{cases}

so that fN,F=if_{N,F}=i_{\otimes} as expected in the empty subsequence case. Focusing on the case we want, fN,Tf_{N,T}, we have,

fn,T=fn1,T(w(n)fn1,T)w(n),\begin{aligned} f_{n,T}&=f_{n-1,T}\oplus\left(w\left(n\right)\otimes f_{n-1,T}\right)\oplus w\left(n\right)\end{aligned}, (44)

which, being expressed entirely in terms of the case m=Tm=T, allows us to ignore the lifting altogether to obtain the following semiring polymorphic generator for non-empty subsequences,

f0\displaystyle f_{0} =i\displaystyle=i_{\oplus} (45)
fn\displaystyle f_{n} =fn1(fn1w(n))w(n)n{1,2,,N}.\displaystyle=f_{n-1}\oplus\left(f_{n-1}\otimes w\left(n\right)\right)\oplus w\left(n\right)\quad\forall n\in\left\{1,2,\ldots,N\right\}.

which solves ssubsnepoly=fNs_{\text{subsnepoly}}^{*}=f_{N} and requires only O(N)O\left(N\right) steps.

We now build on this result further by augmenting this recurrence with additional constraints, to provide a novel class of algorithms for special kinds of non-empty subsequences. Algorithms derived in this subsection include solutions to the longest increasing subsequence problem, which occurs frequently in applications such as computational genomics (Zhang, 2003). This problem can be specified as,

slis=maxl𝕃:(l)inc(l)#(l),s_{\text{lis}}^{*}=\max_{l\in\mathbb{L}:\left(l\neq\emptyset\right)\land\textrm{inc}\left(l\right)}\#\left(l\right), (46)

where the constraint constraint inc(l)\textrm{inc}\left(l\right) returns true if ll is an increasing subsequence. This problem is in the form of (1) with semiring 𝒮=(,max,+,0,0)\mathcal{S}=\left(\mathbb{N},\max,+,0,0\right) with map w(x)=1w\left(x\right)=1 which counts the length of the list, ll. For example, for the sequence[1,5,2]\left[1,5,2\right], the constrained configuration set is

𝕃′′={l𝕃:(l)inc(l)}={[1],[2],[5],[1,2],[1,5]}\mathbb{L}^{\prime\prime}=\left\{l\in\mathbb{L}:\left(l\neq\emptyset\right)\land\textrm{inc}\left(l\right)\right\}=\left\{\left[1\right],\left[2\right],\left[5\right],\left[1,2\right],\left[1,5\right]\right\} (47)

and slis=2s_{\text{lis}}^{*}=2.

Starting from the non-empty subsequence recurrence (45) derived above, we can augment this with a constraint that the subsequence elements must be in an ordered chain according to some binary relation which we write xRyx\mathrm{R}y. For example the ordering x<yx<y holds that xx must be less than yy. Here, we require a somewhat more complex relation in which both sequence and the value must be ordered, so that we can define a lifting algebra using what we call a sequential-value ordering operator,

(i,x)(j,y)={(j,y)(i<j)(x<y)(,)otherwise,\left(i,x\right)\preceq\left(j,y\right)=\begin{cases}\left(j,y\right)&\left(i<j\right)\wedge\left(x<y\right)\\ \left(\infty,\infty\right)&\textrm{otherwise},\end{cases} (48)

over tuples 𝕄=(,)\mathbb{M}=\left(\mathbb{N},\mathbb{R}\right), where (,)=z\left(\infty,\infty\right)=z_{\preceq} is a special tuple which act like an annihilator or zero element. Operator \preceq is left but not right, associative and it does not have an identity, so, a lifting algebra =(𝕄,,z)\mathcal{M}=\left(\mathbb{M},\preceq,z_{\preceq}\right) using this operator, is not a “standard” algebra (such as a monoid, group or semigroup). The lack of identity means that it cannot be applied to empty sequences. Nonetheless, the acceptance criteria a(m)=Ta\left(m\right)=T if mzm\neq z_{\preceq} and TT otherwise, allows us to filter away non-empty subsequences which are not in sequentially increasing order, provided the operator is scanned across the subsequence in left-right order.

To apply this constraint, we can simplify the lifting algebra using this ordering operator

(uw(n))m={(m:𝕄:mmum)w(n)m=v(n)iotherwise.\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(n\right)\right)_{m}=\begin{cases}\left(\oplus_{m^{\prime}:\mathbb{M}:m^{\prime}\preceq m}u_{m^{\prime}}\right)\otimes w\left(n\right)&m=v\left(n\right)\\ i_{\oplus}&\textrm{otherwise}.\end{cases} (49)

which, when substituted into (45), gives us

f0,m\displaystyle f_{0,m} =(i)m\displaystyle=\left(i_{\oplus_{\mathcal{M}}}\right)_{m} (50)
fn,m\displaystyle f_{n,m} =(fn1({(m:𝕄:mmfn1,m)w(n)m=v(n)iotherwise)w(n))m,\displaystyle=\left(f_{n-1}\oplus_{\mathcal{M}}\left(\begin{cases}\left(\oplus_{m^{\prime}:\mathbb{M}:m^{\prime}\preceq m}f_{n-1,m^{\prime}}\right)\otimes w\left(n\right)&m=v\left(n\right)\\ i_{\oplus}&\textrm{otherwise}\end{cases}\right)\oplus w_{\mathcal{M}}\left(n\right)\right)_{m},

for all n,m{1,2,,N}n,m\in\left\{1,2,\ldots,N\right\}. The first line simplifies to f0,m=if_{0,m}=i_{\oplus}, and the second line can be manipulated to obtain

f0,m\displaystyle f_{0,m} =i\displaystyle=i_{\oplus} (51)
fn,m\displaystyle f_{n,m} ={fn1,m(m:𝕄:mmfn1,m)w(n)w(n)m=v(n)fn1,motherwise.\displaystyle=\begin{cases}f_{n-1,m}\oplus\left(\oplus_{m^{\prime}:\mathbb{M}:m^{\prime}\preceq m}f_{n-1,m^{\prime}}\right)\otimes w\left(n\right)\oplus w\left(n\right)&m=v\left(n\right)\\ f_{n-1,m}&\textrm{otherwise}.\end{cases}

To implement this DP recurrence, we next need to choose the lifting set 𝕄\mathbb{M}. In this setting, we will typically have a unique (finite) list, e.g. one unique value unu_{n}\in\mathbb{R} per n{1,2,,N}n\in\left\{1,2,\ldots,N\right\}. Thus, the lifting set consists of the values from this set, e.g. 𝕄={(n,un),n{1,2,,N}}\mathbb{M}=\left\{\left(n,u_{n}\right),n\in\left\{1,2,\ldots,N\right\}\right\}, and the lift mapping functions merely index this set, e.g. v(n)=(n,un)v\left(n\right)=\left(n,u_{n}\right). Note that with this particular lifting set, there is a one-one mapping between nn and any m:𝕄m:\mathbb{M}, thus, we can reduce the lifting set to 𝕄={1,2,,N}\mathbb{M}=\left\{1,2,\ldots,N\right\} and lift mapping to v(n)=nv\left(n\right)=n, so that the ordering operator becomes,

ij={j(i<j)(ui<uj)otherwise.i\preceq j=\begin{cases}j&\left(i<j\right)\wedge\left(u_{i}<u_{j}\right)\\ \infty&\textrm{otherwise}.\end{cases} (52)

These reductions allow us to simplify the above recurrence 51 to

f0,m\displaystyle f_{0,m} =i\displaystyle=i_{\oplus} (53)
fn,n\displaystyle f_{n,n} =fn1,n(m{1,2,,n1}:(um<un)fn1,m)w(n)w(n)\displaystyle=f_{n-1,n}\oplus\left(\oplus_{m^{\prime}\in\left\{1,2,\ldots,n-1\right\}:\left(u_{m^{\prime}}<u_{n}\right)}f_{n-1,m^{\prime}}\right)\otimes w\left(n\right)\oplus w\left(n\right)
fn,m\displaystyle f_{n,m} =fn1,m.\displaystyle=f_{n-1,m}.

Finally, note that, the second line adds a constant term w(n)w\left(n\right) to each fn,nf_{n,n}, \oplus is associative, and the value of the first line is independent of mm, we can move this term from the second line to the first, leading to the following polymorphic DP recursion for increasing sequential subsequences,

f0,m\displaystyle f_{0,m} =w(m)\displaystyle=w\left(m\right) (54)
fn,n\displaystyle f_{n,n} =fn1,n(m{1,2,,n1}:(um<un)fn1,m)w(n)\displaystyle=f_{n-1,n}\oplus\left(\oplus_{m^{\prime}\in\left\{1,2,\ldots,n-1\right\}:\left(u_{m^{\prime}}<u_{n}\right)}f_{n-1,m^{\prime}}\right)\otimes w\left(n\right)
fn,m\displaystyle f_{n,m} =fn1,m,\displaystyle=f_{n-1,m},

with the projection π𝒮,a(fN)=m{1,2,,N}fN,m\pi^{\mathcal{S},a}\left(f_{N}\right)=\bigoplus_{m\in\left\{1,2,\ldots,N\right\}}f_{N,m}. In terms of computational complexity, the recurrence must be computed for all n,m{1,2,,N}n,m\in\left\{1,2,\ldots,N\right\} and the second requires O(N)O\left(N\right) steps. Note that, the third line does not change the value of fn,mf_{n,m} for mnm\neq n obtained at the previous iteration, so that, iterating over mm, only the term fn,nf_{n,n} needs updating in the second line. Thus, the algorithm requires O(N2)O\left(N^{2}\right) steps.

The longest increasing sub-sequences DP algorithm which computes sliss_{\mathrm{lis}}^{*} is obtained as a special case of (54) with the semiring 𝒮=(,max,+,0,0)\mathcal{S}=\left(\mathbb{N},\max,+,0,0\right) and the lift mapping w(n)=1w\left(n\right)=1,

f0,m\displaystyle f_{0,m} =1\displaystyle=1 (55)
fn,n\displaystyle f_{n,n} =max(fn1,n,maxm{1,2,,n1}:(um<un)fn1,m)+1\displaystyle=\max\left(f_{n-1,n},\max_{m^{\prime}\in\left\{1,2,\ldots,n-1\right\}:\left(u_{m^{\prime}}<u_{n}\right)}f_{n-1,m^{\prime}}\right)+1
fn,m\displaystyle f_{n,m} =fn1,m,\displaystyle=f_{n-1,m},

so that slis=π𝒮,a(fN)=maxm{1,2,,N}fN,ms_{\mathrm{lis}}^{*}=\pi^{\mathcal{S},a}\left(f_{N}\right)=\max_{m\in\left\{1,2,\ldots,N\right\}}f_{N,m}. Compared to existing, classical implementations of this algorithm in the literature (Zhang, 2003), we note that, the algebraic simplifications afforded by our approach makes it transparent that there is no need to perform NN semiring products \otimes in the second line, which may lead to computational savings in practice.

Whilst, for the longest increasing subsequences problem, there are somewhat more efficient algorithms which exploit the special structure of the problem, the generalized ordered subsequences DP algorithm derived here, (51), being polymorphic, can be applied to any arbitrary binary relation R\mathrm{R}:

xy={yxRyzotherwise.x\odot y=\begin{cases}y&x\mathrm{R}y\\ z_{\odot}&\textrm{otherwise}.\end{cases} (56)

For example, we immediately obtain an algorithm for semiring computations over all non-decreasing subsequences (ordering xyx\leq y), or, for subsequences consisting of sets, all subsequences ordered by inclusion, xyx\subseteq y.

3 Applications

In this section we will investigate some practical applications of the algebraic theory developed above.

3.1 Segmentation

A problem of perennial importance in statistics and signal processing is that of segmentation, or dividing up a sequence of data items or a time series yny_{n} for n{1,2,,N}n\in\left\{1,2,\ldots,N\right\}, into contiguous, non-overlapping intervals (segments) (i,j)\left(i,j\right) for i,j{1,2,,N}i,j\in\left\{1,2,\ldots,N\right\} with iji\leq j. Thus, (i,i)\left(i,i\right) is a segment of length one. An example is the problem of (1D) piecewise regression, which involves fitting a functional curve f(n,ci,j)f\left(n,c_{i,j}\right) to segments, and minimizing the sum of model fit errors E(l)=(i,j)lei,jE\left(l\right)=\sum_{\left(i,j\right)\in l}e_{i,j}, where ei,j=1pn=ij|ynf(n,ci,j)|pe_{i,j}=\frac{1}{p}\sum_{n=i}^{j}\left|y_{n}-f\left(n,c_{i,j}\right)\right|^{p} for p>0p>0 being the fit error within each interval (i,j)\left(i,j\right). The optimal model parameters ci,jc_{i,j} can be estimated using any statistical model-fitting procedure (Little, 2019). Thus, the input of the problem is the time series yny_{n}, and the output, after solving the DP problem, will be a set of indicators, from which a complete model of the time series can be recovered.

Similar to equation (1), we can specify the segmentation problem as a combinatorial optimization problem,

sseg=minl𝕃E(l),s_{\text{seg}}^{*}=\min_{l\in\mathbb{L}}E\left(l\right), (57)

where 𝕃\mathbb{L} is the all possible segmentation. For N=4N=4, the set of configurations is

𝕃=\displaystyle\mathbb{L}= {[(1,4)],[(1,1),(2,4)],[(1,2),(3,4)],[(1,3),(4,4)],\displaystyle\left\{\left[\left(1,4\right)\right],\left[\left(1,1\right),\left(2,4\right)\right],\left[\left(1,2\right),\left(3,4\right)\right],\left[\left(1,3\right),\left(4,4\right)\right],\right.
[(1,1),(2,3),(4,4)],[(1,1),(2,2),(3,4)],[(1,1),(2,2),(3,3),(4,4)]}.\displaystyle\left.\left[\left(1,1\right),\left(2,3\right),\left(4,4\right)\right],\left[\left(1,1\right),\left(2,2\right),\left(3,4\right)\right],\left[\left(1,1\right),\left(2,2\right),\left(3,3\right),\left(4,4\right)\right]\right\}. (58)

An O(N2)O\left(N^{2}\right) DP algorithm for this problem was devised by Richard Bellman as follows (Kleinberg and Tardos, 2005). The optimal segmentation ending at index jj can be obtained by combining all the “smaller” optimal segmentations (,i1)\left(\ldots,i-1\right) with the following segments (i,j)\left(i,j\right), for all i{1,2,,j}i\in\left\{1,2,\ldots,j\right\}. This gives rise to the following Bellman recursion,

f0\displaystyle f_{0} =0\displaystyle=0 (59)
fj\displaystyle f_{j} =mini{1,2,,j}(fi1+ei,j)j{1,2,,N}.\displaystyle=\min_{i\in\left\{1,2,\ldots,j\right\}}\left(f_{i-1}+e_{i,j}\right)\quad\forall j\in\left\{1,2,\ldots,N\right\}.

This recursion efficiently solves the problem (57), so that sseg=fNs_{\text{seg}}^{*}=f_{N}.

Using the theory in Section 2, by abstracting this recursion over an arbitrary semiring 𝒮\mathcal{S} and semiring map function ww, we obtain the polymorphic segmentation generator algorithm,

f0𝒮,w\displaystyle f_{0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (60)
fj𝒮,w\displaystyle f_{j}^{\mathcal{S},w} =i{1,2,,j}(fi1𝒮,ww(i,j))j{1,2,,N}.\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\left(f_{i-1}^{\mathcal{S},w}\otimes w\left(i,j\right)\right)\quad\forall j\in\left\{1,2,\ldots,N\right\}.

Using this polymorphic recursion, we can, for example, obtain the optimal configuration lsegl_{\mathrm{seg}}^{*} as well sseg=Es_{\text{seg}}^{*}=E^{*}, using the tupled selection semiring, see Section 2.5.

Since the segment fit errors are, generally, all non-negative, ei,j>0e_{i,j}>0 and shorter segments are typically more accurately modelled than larger segments (given the same model structure across segments), the problem as stated above usually has a “degenerate” optimal solution with only the ‘diagonal’ segments (i,i),i{1,2,,N}\left(i,i\right),i\in\left\{1,2,\ldots,N\right\} of length 1, selected. To avoid the collapse onto this degenerate solution, we can regularize the sum (Little, 2019),

ssegreg=minl𝕃(E(l)+λ#(l)),s_{\text{segreg}}^{*}=\min_{l\in\mathbb{L}}\left(E\left(l\right)+\lambda\#\left(l\right)\right), (61)

for the regularization constant λ>0\lambda>0 where #(l)=(i,j)l1\#\left(l\right)=\sum_{\left(i,j\right)\in l}1 counts the number of segments in the configuration ll. It is simple to modify the semiring polymorphic DP recursion (60) to include this regularization term simply by choosing the semiring mapping w(i,j)=ei,j+λw\left(i,j\right)=e_{i,j}+\lambda, so that ssegreg=fN,ws_{\text{segreg}}^{*}=f_{N}^{\mathcal{R},w} using the min-sum semiring \mathcal{R}.

While this regularization approach is simple, it does not offer much control over the segmentation quality, as the appropriate choice of the single parameter λ\lambda can be difficult to obtain. For example, some choices lead to over and under-fitting in different parts of the same signal, see Figure 2(a). Instead, a more effective level of control can be obtained by directly constraining the segmentation to a fixed number of segments, which we can express with the specification,

sseglen=minl𝕃:#(l)=LE(l).\begin{aligned} s_{\text{seglen}}^{*}&=\min_{l\in\mathbb{L}:\#\left(l\right)=L}E\left(l\right)\end{aligned}. (62)

This is a constrained semiring problem of the form (9) where c(l)=Tc\left(l\right)=T if the size of ll is LL, over the min-plus semiring \mathcal{R}. To apply semiring lifting, the constraint algebra needs to count the number of segments up to the fixed number of segments LL, which implies we need the lifting algebra =({1,2,,L},+,0)\mathcal{M}=\left(\left\{1,2,\ldots,L\right\},+,0\right) and constraint mapping function v(i,j)=1v\left(i,j\right)=1, with acceptance condition a(m)=Ta\left(m\right)=T if m=Lm=L. Next, inserting the corresponding lifted semiring into (60) we obtain,

f0,m𝒮,w\displaystyle f_{0,m}^{\mathcal{S},w} =(i)m\displaystyle=\left(i_{\otimes_{\mathcal{M}}}\right)_{m} (63)
fj,m𝒮,w\displaystyle f_{j,m}^{\mathcal{S},w} =(i{1,2,j} (fi1𝒮,ww(i,j)))mj{1,2,,N}.\displaystyle=\left(\underset{i\in\text{$\left\{1,2,\ldots j\right\}$ }}{\oplus_{\mathcal{M}}}\left(f_{i-1}^{\mathcal{S},w}\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)\right)_{m}\quad\forall j\in\left\{1,2,\ldots,N\right\}.

The first line above simplifies to f0,m𝒮,w=if_{0,m}^{\mathcal{S},w}=i_{\otimes} for m=0m=0 and ii_{\oplus} otherwise, and the second line becomes,

fj,m𝒮,w\displaystyle f_{j,m}^{\mathcal{S},w} =i{1,2,,j}(fi1𝒮,ww(i,j))m\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\left(f_{i-1}^{\mathcal{S},w}\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m} (64)
=i{1,2,,j}{im1𝕄fi1,m1𝒮,ww(i,j)otherwise\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\begin{cases}i_{\oplus}&m-1\notin\mathbb{M}\\ f_{i-1,m-1}^{\mathcal{S},w}\otimes w\left(i,j\right)&\textrm{otherwise}\end{cases}
={im=0i{1,2,,j}fi1,m1𝒮,ww(i,j)otherwise,\displaystyle=\begin{cases}i_{\oplus}&m=0\\ \bigoplus_{i\in\left\{1,2,\ldots,j\right\}}f_{i-1,m-1}^{\mathcal{S},w}\otimes w\left(i,j\right)&\textrm{otherwise},\end{cases}

using the group product simplification (23) in the second step. Finally, to solve (62), we apply the acceptance condition in the min-sum semiring \mathcal{R}, to obtain

sseglen=π,a(fN,w)=fN,L,w,s_{\text{seglen}}^{*}=\pi^{\mathcal{R},a}\left(f_{N}^{\mathcal{R},w}\right)=f_{N,L}^{\mathcal{R},w}, (65)

which computes the result in O(N2L)O\left(N^{2}L\right) time with O(NL)O\left(N\,L\right) memory. In practice, this algorithm produces much more predictable results than the basic algorithm, see Figure 2(b). Interestingly, it is well-known in machine learning circles that the ubiquitous KK-means clustering problem (Little, 2019), which is computationally intractable for non-scalar data items and therefore approximated using heuristic algorithms, can be solved exactly using the algorithm derived above for scalar data (Gronlund et al., 2018). However, existing presentations in the literature are not formally proven correct and thus our presentation here is, to our knowledge, the first formally correct derivation from specification.

Furthermore, it is trivial to adapt the acceptance criteria aa above to e.g. solve constraints of the form L#(l)LL^{\prime}\leq\#\left(l\right)\leq L, giving an upper and lower bound on the number of segments, by modifying a(m)=Ta\left(m\right)=T when LmLL^{\prime}\leq m\leq L. The corresponding DP algorithm is derived from the specification as follows,

ssegrange\displaystyle s_{\text{segrange}}^{*} =minl𝕃:L#(l)LE(l)\displaystyle=\min_{l\in\mathbb{L}:L^{\prime}\leq\#\left(l\right)\leq L}E\left(l\right) (66)
=π,a(fN,w)\displaystyle=\pi^{\mathcal{R},a}\left(f_{N}^{\mathcal{R},w}\right)
=minLmLfN,m,w.\displaystyle=\min_{L^{\prime}\leq m\leq L}f_{N,m}^{\mathcal{R},w}.

where here, ff is the semiring polymorphic generator given in (64).

The segment count constraint above is fairly straightforward and has been (re)-invented in an ad-hoc manner before (Terzi and Tsaparas, 2006). We will next show how to derive a segmentation algorithm for segmentation problems with much more elaborate constraints which would be far more difficult to derive without systematic analytical tools such as we describe in this paper. While the segment count constraint is certainly very practical, there are other ways to control the segmentation since we may not know the number of segments in advance. The length, d(i,j)=ji+1d\left(i,j\right)=j-i+1, of each segment is a property of key practical importance. For example, it would be extremely useful in many applications to control the minimum length of each segment,

ssegminlen\displaystyle s_{\text{segminlen}}^{*} =minl𝕃:min(i,j)ld(i,j)=LE(l)\displaystyle=\min_{l\in\mathbb{L}:\min_{\left(i,j\right)\in l}d\left(i,j\right)=L}E\left(l\right) (67)

which is in the form of (9) using the constraint c(l)=Tc\left(l\right)=T if (minxld(x))=L\left(\min_{x\in l}d\left(x\right)\right)=L, i.e. the set of lengths of all the segments in ll is at least LL, over the min-sum semiring \mathcal{R}. Following the procedure above, we have the lifting algebra =({1,2,,N},min,N)\mathcal{M}=\left(\left\{1,2,\ldots,N\right\},\min,N\right) and lift mapping function v(i,j)=ji+1v\left(i,j\right)=j-i+1. For the semiring lifted segmentation recursion (63), the first line becomes,

f0,m={im=Niotherwise.f_{0,m}=\begin{cases}i_{\otimes}&m=N\\ i_{\oplus}&\textrm{otherwise}.\end{cases} (68)

We also need the product (20), which becomes,

(uw(i,j))m=(m{1,2,,N}min(m,d(i,j))=mum)w(i,j).\begin{aligned} \left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m}&=\left(\bigoplus_{\begin{subarray}{c}m^{\prime}\in\left\{1,2,\ldots,N\right\}\\ \min\left(m^{\prime},d\left(i,j\right)\right)=m\end{subarray}}u_{m^{\prime}}\right)\otimes w\left(i,j\right)\end{aligned}. (69)

This lifting algebra is a monoid without analytical (and unique) inverses, so, to make progress, we need to find an explicit expression for the set {min(m,d(i,j))=m}\left\{\min\left(m^{\prime},d\left(i,j\right)\right)=m\right\} for m{1,2,,N}m^{\prime}\in\left\{1,2,\ldots,N\right\}. There are three cases to consider,

{m:min(m,d(i,j))=m}\displaystyle\left\{m^{\prime}:\min\left(m^{\prime},d\left(i,j\right)\right)=m\right\} ={{m}m<d(i,j){m,m+1,,N}m=d(i,j),m>d(i,j)\displaystyle=\begin{cases}\left\{m\right\}&m<d\left(i,j\right)\\ \left\{m,m+1,\ldots,N\right\}&m=d\left(i,j\right),\\ \emptyset&m>d\left(i,j\right)\end{cases} (70)

and upon inserting this into the product above, we arrive at,

(uw(i,j))m\displaystyle\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m} =({umm<d(i,j)m=mNumm=d(i,j)im>d(i,j))w(i,j)\displaystyle=\left(\begin{cases}u_{m}&m<d\left(i,j\right)\\ \oplus_{m^{\prime}=m}^{N}u_{m^{\prime}}&m=d\left(i,j\right)\\ i_{\oplus}&m>d\left(i,j\right)\end{cases}\right)\otimes w\left(i,j\right) (71)

so that the second line of the lifted segmentation recursion (63) can be simplified,

fj,m𝒮,w\displaystyle f_{j,m}^{\mathcal{S},w} =i{1,2,,j}(fi1w(i,j))m\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\left(f_{i-1}\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m} (72)
=i{1,2,,j}({fi1,m𝒮,wm<d(i,j)m=mNfi1,m𝒮,wm=d(i,j)im>d(i,j))w(i,j)\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\left(\begin{cases}f_{i-1,m}^{\mathcal{S},w}&m<d\left(i,j\right)\\ \oplus_{m^{\prime}=m}^{N}f_{i-1,m^{\prime}}^{\mathcal{S},w}&m=d\left(i,j\right)\\ i_{\oplus}&m>d\left(i,j\right)\end{cases}\right)\otimes w\left(i,j\right)
=i{1,2,,j}{fi1,m𝒮,ww(i,j)m<d(i,j)(m{m,m+1,,N}fi1,m𝒮,w)w(i,j)m=d(i,j)im>d(i,j)\displaystyle=\bigoplus_{i\in\left\{1,2,\ldots,j\right\}}\begin{cases}f_{i-1,m}^{\mathcal{S},w}\otimes w\left(i,j\right)&m<d\left(i,j\right)\\ \left(\oplus_{m^{\prime}\in\left\{m,m+1,\ldots,N\right\}}f_{i-1,m^{\prime}}^{\mathcal{S},w}\right)\otimes w\left(i,j\right)&m=d\left(i,j\right)\\ i_{\oplus}&m>d\left(i,j\right)\end{cases}

for all j{1,2,,N}j\in\left\{1,2,\ldots,N\right\}. Using the acceptance condition a(m)=Ta\left(m\right)=T if m=Lm=L we have an O(N3)O\left(N^{3}\right) time DP algorithm to find the required solution, ssegminlen=π,a(fN,w)=fN,L,ws_{\text{segminlen}}^{*}=\pi^{\mathcal{R},a}\left(f_{N}^{\mathcal{R},w}\right)=f_{N,L}^{\mathcal{R},w} using the derived recurrence above and min-plus semiring \mathcal{R}. As previously, a simple modification of the acceptance function a(m)a\left(m\right) allows, for example, computing optimal segmentations across a range Lmin(i,j)ld(i,j)LL^{\prime}\leq\min_{\left(i,j\right)\in l}d\left(i,j\right)\leq L of minimum segment lengths. Applied to the scalar KK-means problem, this modification would be a viable approach to avoiding the problem of degenerate clusters assigned few or no items (Little, 2019).

We find that constrained DP segmented regression, derived using the algebraic methods introduced here, usually produces very interpretable results, even for problems where the segmentation boundaries may be quite difficult to determine using other methods, particularly when the signal-to-noise ratio is low, see Figure 1. For example, methods such as L1 trend filtering (Kim et al., 2009) suffer from the problem that there is often no single, unambiguous segmentation, see for example Figure 1(d) and Figure 2(d). This is because it is better to consider such L1-based methods as smoothing algorithms arising from a convex relaxation of the combinatorial segmentation problem. This clearly shows the advantage of constrained, exact combinatorial optimization in applications such statistical time series analysis, made practical by the algebraic approach described in this paper.

Refer to caption
Figure 1: DP segmentation algorithms derived using our novel algebraic framework for solving constrained, 1D segmented, least-squares linear regression, applied to synthetic, piecewise linear time series with i.i.d. Gaussian noise, standard deviation σ\sigma. Input data yny_{n} (grey dots), underlying piecewise constant signal (grey line), segmentation result (red line). (a) Unconstrained segmentation with regularization λ=15\lambda=15, noise σ=15\sigma=15, (b) with fixed number segments L=3L=3, noise σ=30\sigma=30, (c) with minimum segment length M=70M=70, noise σ=60\sigma=60, and (d) for comparison, L1 trend filtering with regularization λ=103\lambda=10^{3}.
Refer to caption
Figure 2: DP segmentation algorithms derived using our novel algebraic framework for solving constrained, 1D segmented, least-squares linear regression, applied to a sample of logarithmically-transformed S&P500 financial index daily values. Input data yny_{n} (grey lines), segmentation result (red line). (a) Unconstrained segmentation with regularization λ=1.78×105\lambda=1.78\times 10^{-5} , (b) with fixed number segments L=4L=4, (c) with minimum segment length M=50M=50 days, and (d) for comparison, L1 trend filtering with regularization λ=100\lambda=100.

3.2 Sequence alignment

Our next application focus is sequence alignments, a problem of central importance to computational biology, natural language processing and signal processing. For example, in genomic sequence analysis, we are often interested in knowing how closely related two DNA or RNA base pair sequences are, and this can be assessed by computing the most plausible series of mutations (insertions and deletions) needed in order to bring the two sequences uu and vv into alignment. This alignment problem is specified as an optimal matching problem finding the minimum cost transformation from sequence uu into sequence vv,

salign=minl𝕃(i,j)lw(i,j),s_{\text{align}}^{*}=\min_{l\in\mathbb{L}}\underset{\left(i,j\right)\in l}{\sum}w\left(i,j\right), (73)

where configurations set 𝕃\mathbb{L} consists of all possible sequence transformations. These transformations are restricted to insertions, deletions or simple matches (no transformation required) for pair sequences at positions (i,j)\left(i,j\right) in the sequences, uiu_{i} and vjv_{j}. More specifically, an insert operation aligns the sequences at position pair (i,j)\left(i,j\right) with the pair at (i1,j)\left(i-1,j\right) incurring a cost w(0,j)w\left(0,j\right), a deletion aligns the sequences at position pair (i,j)\left(i,j\right) with the pair at (i,j1)\left(i,j-1\right) at cost w(i,0)w\left(i,0\right), and a match aligns the sequences at pair (i,j)\left(i,j\right) with the pair at (i1,j1)\left(i-1,j-1\right) incurring cost w(i,j)w\left(i,j\right). For example, for sequences of length #(u)=2\#\left(u\right)=2 and #(v)=1\#\left(v\right)=1, there are 5 possible ways of transforming uu into vv, so that the set of transformations is,

𝕃\displaystyle\mathbb{L} ={[(1,0),(2,1)],[(1,1),(0,1)],\displaystyle=\left\{\left[\left(1,0\right),\left(2,1\right)\right],\left[\left(1,1\right),\left(0,1\right)\right],\right. (74)
=[(0,1),(0,1),(0,1)],[(1,0),(1,0),(0,1)],[(1,0),(2,0),(2,0)]}\displaystyle=\left.\left[\left(0,1\right),\left(0,1\right),\left(0,1\right)\right],\left[\left(1,0\right),\left(1,0\right),\left(0,1\right)\right],\left[\left(1,0\right),\left(2,0\right),\left(2,0\right)\right]\right\}

One of the earliest and most widely used methods for minimizing this cost is the Needleman-Wunsch (NW) DP algorithm (Pachter and Sturmfels, 2005), the usual presentation of which is given in the min-sum semiring,

f0,0\displaystyle f_{0,0} =0\displaystyle=0 (75)
fi,0\displaystyle f_{i,0} =fi1,0+w(i,0)\displaystyle=f_{i-1,0}+w\left(i,0\right)
f0,j\displaystyle f_{0,j} =f0,j1+w(0,j)\displaystyle=f_{0,j-1}+w\left(0,j\right)
fi,j\displaystyle f_{i,j} =min(fi1,j1+w(i,j),fi1,j+w(0,j),fi,j1+w(i,0))\displaystyle=\min\left(f_{i-1,j-1}+w\left(i,j\right),f_{i-1,j}+w\left(0,j\right),f_{i,j-1}+w\left(i,0\right)\right)

for all i{1,2,,N}i\in\left\{1,2,\ldots,N\right\} and j{1,2,,M}j\in\left\{1,2,\ldots,M\right\}, where w(i,j)w\left(i,j\right) is the cost of the alignment of the first sequence at position ii, with the second sequence at position jj. The optimal alignment is obtained at salign=fN,Ms_{\text{align}}^{*}=f_{N,M}.

To apply our results, we construct a semiring polymorphic abstraction of the above,

f0,0𝒮,w\displaystyle f_{0,0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (76)
fi,0𝒮,w\displaystyle f_{i,0}^{\mathcal{S},w} =fi1,0𝒮,ww(i,0)\displaystyle=f_{i-1,0}^{\mathcal{S},w}\otimes w\left(i,0\right)
f0,j𝒮,w\displaystyle f_{0,j}^{\mathcal{S},w} =f0,j1𝒮,ww(0,j)\displaystyle=f_{0,j-1}^{\mathcal{S},w}\otimes w\left(0,j\right)
fi,j𝒮,w\displaystyle f_{i,j}^{\mathcal{S},w} =(fi1,j1𝒮,ww(i,j))(fi1,j𝒮,ww(0,j))(fi,j1𝒮,ww(i,0))\displaystyle=\left(f_{i-1,j-1}^{\mathcal{S},w}\otimes w\left(i,j\right)\right)\oplus\left(f_{i-1,j}^{\mathcal{S},w}\otimes w\left(0,j\right)\right)\oplus\left(f_{i,j-1}^{\mathcal{S},w}\otimes w\left(i,0\right)\right)

We can now put this semiring polymorphic generator to use for various purposes. One application is counting all possible alignments. Semiring fusion (8) tells us that saligncount=l𝕃(i,j)l1=fN,M𝒩,ws_{\mathrm{aligncount}}^{*}=\sum_{l\in\mathbb{L}}\prod_{\left(i,j\right)\in l}1=f_{N,M}^{\mathcal{N},w} using the count semiring 𝒩=(,+,×,0,1)\mathcal{N}=\left(\mathbb{N},+,\times,0,1\right) with w(i,j)=1w\left(i,j\right)=1. A closed-form formula for this number of alignments, denoted D(N,M)D\left(N,M\right) is not known, but by expanding f𝒩,wf^{\mathcal{N},w} above we find f0,0=1f_{0,0}=1, fi,0=fi1,0f_{i,0}=f_{i-1,0}, f0,j=f0,j1f_{0,j}=f_{0,j-1} and fi,j=fi1,j1+fi1,j+fi,j1f_{i,j}=f_{i-1,j-1}+f_{i-1,j}+f_{i,j-1}, which simplifies to the following recurrence,

D(n,m)\displaystyle D\left(n,m\right) ={1(n=0)(m=0)D(n1,m1)+D(n1,m)+D(n,m1)otherwise.\displaystyle=\begin{cases}1&\left(n=0\right)\vee\left(m=0\right)\\ D\left(n-1,m-1\right)+D\left(n-1,m\right)+D\left(n,m-1\right)&\textrm{otherwise}.\end{cases} (77)

This recursion describes the well-known Delannoy numbers which for M=NM=N is the integer sequence Sloane (2021, sequence A001850), with leading order asymptotic approximation D(N,N)5.8ND\left(N,N\right)\approx 5.8^{N}. Thus, semiring polymorphism allows us to show that brute-force computation of all alignments would be intractable as it requires exponential time complexity, whereas the factorized DP NW implementation has O(NM)O\left(N\,M\right) e.g. quadratic, computational cost (Figure 3).

One practical problem with the standard NW algorithm (75) is that it places no constraint on how far the sequences can become out of alignment. After all, any two DNA/RNA sequences are related by an arbitrary number of insertions/deletions, but this has no biological significance in general. It would be useful to bound e.g. the sum of the absolute difference in sequence positions, so that we can exclude spurious alignments between sequences which bear no meaningful relationship to each other. We can specify this constrained sequence alignment problem as,

salignsumdiff=minl𝕃:(i,j)l|ij|L(i,j)lw(i,j),s_{\text{alignsumdiff}}^{*}=\min_{l\in\mathbb{L}:\sum_{\left(i,j\right)\in l}\left|i-j\right|\leq L}\underset{\left(i,j\right)\in l}{\sum}w\left(i,j\right), (78)

which is in the form of (9) when c(l)=Tc\left(l\right)=T if (i,j)l|ij|L\sum_{\left(i,j\right)\in l}\left|i-j\right|\leq L in the min-plus semiring \mathcal{R}.

To solve (78), we can set up the simple constraint algebra v(i,j)=|ij|v\left(i,j\right)=\left|i-j\right| and =(,+,0)\mathcal{M}=\left(\mathbb{N},+,0\right) with acceptance criteria a(m)La\left(m\right)\leq L. As this algebra is a group, we can insert this into (23) to obtain,

(uw(i,j))m\displaystyle\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m} ={im<|ij|um|ij|w(i,j)otherwise,\displaystyle=\begin{cases}i_{\oplus}&m<\left|i-j\right|\\ u_{m-\left|i-j\right|}\otimes w\left(i,j\right)&\textrm{otherwise},\end{cases} (79)

which we write as (uw(i,j))m\left(u\circledast w\left(i,j\right)\right)_{m} for convenience. Inserting this into (76), we arrive at,

f0,0,m𝒮,w\displaystyle f_{0,0,m}^{\mathcal{S},w} ={im=0iotherwise\displaystyle=\begin{cases}i_{\otimes}&m=0\\ i_{\oplus}&\textrm{otherwise}\end{cases} (80)
fi,0,m𝒮,w\displaystyle f_{i,0,m}^{\mathcal{S},w} =(fi1,0𝒮,ww(i,0))m\displaystyle=\left(f_{i-1,0}^{\mathcal{S},w}\circledast w\left(i,0\right)\right)_{m}
f0,j,m𝒮,w\displaystyle f_{0,j,m}^{\mathcal{S},w} =(f0,j1𝒮,ww(0,j))m\displaystyle=\left(f_{0,j-1}^{\mathcal{S},w}\circledast w\left(0,j\right)\right)_{m}
fi,j,m𝒮,w\displaystyle f_{i,j,m}^{\mathcal{S},w} =(fi1,j1𝒮,ww(i,j))m(fi1,j𝒮,ww(0,j))m(fi,j1𝒮,ww(i,0))m,\displaystyle=\left(f_{i-1,j-1}^{\mathcal{S},w}\circledast w\left(i,j\right)\right)_{m}\oplus\left(f_{i-1,j}^{\mathcal{S},w}\circledast w\left(0,j\right)\right)_{m}\oplus\left(f_{i,j-1}^{\mathcal{S},w}\circledast w\left(i,0\right)\right)_{m},

for all i{1,2,,N}i\in\left\{1,2,\ldots,N\right\} and j{1,2,,M}j\in\left\{1,2,\ldots,M\right\}. Thus, salignsumdiff=π,a(fN,m,w)=minm:mLfN,M,m,ws_{\text{alignsumdiff}}^{*}=\pi^{\mathcal{R},a}\left(f_{N,m}^{\mathcal{R},w}\right)=\min_{m\in\mathbb{N}:m\leq L}f_{N,M,m}^{\mathcal{R},w}.

Further rearrangements of (80) based on case analysis are possible and may improve the readability of the algorithm, but as they do not generally improve implementation efficiency, we do not explore further here. The length of alignments, lying between max(N,M)\max\left(N,M\right) and N+MN+M, should be taken into account when choosing the acceptance function and thereby bounding the alignment difference sum. The result is an O(NML)O\left(N\,M\,L\right) time complexity algorithm for maximum sum of absolute alignment differences LL.

Although the alignment difference sum is convenient algebraically, another constraint which may be useful is the maximum absolute alignment difference. Bounding this quantity gives more precise control over the extent to which the sequences can become misaligned before the sequences are considered not to be matched at all. To implement this using the algebraic theory developed above, we need the constraint algebra v(i,j)=|ij|v\left(i,j\right)=\left|i-j\right| and algebra =({0,1,,N},max,0)\mathcal{M}=\left(\left\{0,1,\ldots,N^{\prime}\right\},\max,0\right), where N=max(N,M)N^{\prime}=\max\left(N,M\right) is the upper bound on the possible sequence misalignment. Because \mathcal{M} is a monoid, we need to modify the general lifted product (20) as follows,

(uw(i,j))m=(m{0,1,,N}max(m,|ij|)=mum)w(i,j).\begin{aligned} \left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m}&=\left(\bigoplus_{\begin{subarray}{c}m^{\prime}\in\left\{0,1,\ldots,N^{\prime}\right\}\\ \max\left(m^{\prime},\left|i-j\right|\right)=m\end{subarray}}u_{m^{\prime}}\right)\otimes w\left(i,j\right)\end{aligned}. (81)

Now, we need to find an explicit expression for the set {max(m,|ij|)=m}\left\{\max\left(m^{\prime},\left|i-j\right|\right)=m\right\} for m{0,1,,N}m^{\prime}\in\left\{0,1,\ldots,N^{\prime}\right\}. Similar to the situation with constrained segmentations above, there are three cases to consider:

{m:max(m,|ij|)=m}\displaystyle\left\{m^{\prime}:\max\left(m^{\prime},\left|i-j\right|\right)=m\right\} ={{m}m>|ij|{0,1,,m}m=|ij|,m<|ij|\displaystyle=\begin{cases}\left\{m\right\}&m>\left|i-j\right|\\ \left\{0,1,\ldots,m\right\}&m=\left|i-j\right|,\\ \emptyset&m<\left|i-j\right|\end{cases} (82)

which gives rise the following general lifted product,

(uw(i,j))m\displaystyle\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,j\right)\right)_{m} ={umw(i,j)m>|ij|(m{0,1,,m}um)w(i,j)m=|ij|im<|ij|\displaystyle=\begin{cases}u_{m}\otimes w\left(i,j\right)&m>\left|i-j\right|\\ \left(\oplus_{m^{\prime}\in\left\{0,1,\ldots,m\right\}}u_{m^{\prime}}\right)\otimes w\left(i,j\right)&m=\left|i-j\right|\\ i_{\oplus}&m<\left|i-j\right|\end{cases} (83)

which we also denote by (uw(i,j))m\left(u\circledast w\left(i,j\right)\right)_{m} for convenience. Inserting this into (80) gives us a novel, O(NMmax(M,N))O\left(N\,M\,\max\left(M,N\right)\right) time DP algorithm for NW sequence alignments with an (arbitrary) constraint on the maximum absolute difference of misalignments (Figure 3).

Refer to caption
Figure 3: Computational time (black line) required to solve the Needleman-Wunsch DP sequence alignment algorithm (left) without constraints and (right) with lifted constraint. The horizontal axis is the length of both sequences and also the size of the constraint algebra (e.g. N=M=||N=M=\left|\mathcal{M}\right|). The vertical axis is on a quadratic (left) and cubic (right) scale such that exact O(N2)O\left(N^{2}\right) and O(N3)O\left(N^{3}\right) complexities correspond to a straight line (grey line). Python language implementation on a quad-core Intel Core i7 3.2GHz, 16Gb DRAM.

3.3 Discrete event combinations

As a final application exposition, in many contexts, it is important to be able to compute quantities over sequences of discrete events. An important application from reliability engineering is computing the probability of a combination of components in a complex engineered system failing, when each failure has a given probability. A common specification for such discrete event combinations as the semiring problem,

sevents=l𝕃(i,n)lw(i,n),s_{\text{events}}^{*}=\sum_{l\in\mathbb{L}}\underset{\left(i,n\right)\in l}{\prod}w\left(i,n\right), (84)

where 𝕃\mathbb{L} is the set of all possible sequences of events, each event having probability w(i,n)w\left(i,n\right) we will focus on the basic failure/non-failure case where i=0i=0 for survival and i=1i=1 for failure and where j{1,2,,N}j\in\left\{1,2,...,N\right\} is the size of sequences of events being considered. Thus seventss_{\text{events}}^{*} is the total probability over all possible sequences of NN fail/non-fail events. This specification is in the form of (1) over the probability semiring 𝒫=([0,1],+,×,0,1)\mathcal{P}=\left(\left[0,1\right],+,\times,0,1\right) for semiring mapping w:()[0,1]w:\left(\mathbb{N}\to\mathbb{N}\right)\to\left[0,1\right]. For N=2N=2, the set of all possible sequences of events is,

𝕃={[(0,1),(0,2)],[(0,1),(1,2)],[(1,1),(0,2)],[(1,1),(1,2)]}.\mathbb{L}=\left\{\left[\left(0,1\right),\left(0,2\right)\right],\left[\left(0,1\right),\left(1,2\right)\right],\left[\left(1,1\right),\left(0,2\right)\right],\left[\left(1,1\right),\left(1,2\right)\right]\right\}. (85)

There are 2N2^{N} such sequences. A simple semiring polymorphic generator recursion for all possible sequences of fail/non-fail events, is the following,

f0𝒮,w\displaystyle f_{0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (86)
fn𝒮,w\displaystyle f_{n}^{\mathcal{S},w} =fn1𝒮,w(w(0,n)w(1,n))n{1,2,,N},\displaystyle=f_{n-1}^{\mathcal{S},w}\otimes\left(w\left(0,n\right)\oplus w\left(1,n\right)\right)\quad\forall n\in\left\{1,2,\ldots,N\right\},

where (0,n)\left(0,n\right) represents component non-failure and (1,n)\left(1,n\right) component failure at event number nn.

In practice, engineers are interested in the more complex case of the total probability of all possible sequences where MM failures in NN events occurs. That means we need to constrain event sequences l𝕃l\in\mathbb{L} so that exactly MM failures, coded as (1,n)\left(1,n\right), appear in each sequence. The corresponding constrained discrete event combinations problem is,

seventcombs=l𝕃:fails(l)=M(i,n)lw(i,n),s_{\text{eventcombs}}^{*}=\sum_{l\in\mathbb{L}:\mathrm{fails}\left(l\right)=M}\underset{\left(i,n\right)\in l}{\prod}w\left(i,n\right), (87)

which is in the form of (9) where c(l)=Tc\left(l\right)=T if (fails(l)=(i,n)li)=M\left(\mathrm{fails}\left(l\right)=\sum_{\left(i,n\right)\in l}i\right)=M. For M=2M=2 and N=3N=3, we have,

𝕃\displaystyle\mathbb{L}^{\prime} ={l𝕃:(i,n)li=2}\displaystyle=\left\{l\in\mathbb{L}:\sum_{\left(i,n\right)\in l}i=2\right\} (88)
={[(1,1),(1,2),(0,3)],[(1,1),(0,2),(1,3)],[(0,1),(1,2),(1,3)]}.\displaystyle=\left\{\left[\left(1,1\right),\left(1,2\right),\left(0,3\right)\right],\left[\left(1,1\right),\left(0,2\right),\left(1,3\right)\right],\left[\left(0,1\right),\left(1,2\right),\left(1,3\right)\right]\right\}.

Note that this is similar to, but subtly different from, the problem of selecting subset size as the constraint, (30). Using our algebraic theory, we express this constraint using the simple algebra =(,+,0)\mathcal{M}=\left(\mathbb{N},+,0\right) which adds up the number of failures and constraint map v(i,n)=iv\left(i,n\right)=i where i{0,1}i\in\left\{0,1\right\} and acceptance criteria a(m)=Ta\left(m\right)=T if m=Mm=M and false otherwise. Since \mathcal{M} is a group, we insert this into (23) to obtain,

(uw(i,n))m={im<iumiw(i,n)otherwise\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(i,n\right)\right)_{m}=\begin{cases}i_{\oplus}&m<i\\ u_{m-i}\otimes w\left(i,n\right)&\textrm{otherwise}\end{cases} (89)

which we can then immediately insert into (86) to get

fn,m\displaystyle f_{n,m} =(fn1(w(0,n)w(1,n)))m\displaystyle=\left(f_{n-1}\otimes_{\mathcal{M}}\left(w\left(0,n\right)\oplus_{\mathcal{M}}w\left(1,n\right)\right)\right)_{m} (90)
=((fn1w(0,n))(fn1w(1,n)))m\displaystyle=\left(\left(f_{n-1}\otimes_{\mathcal{M}}w\left(0,n\right)\right)\oplus_{\mathcal{M}}\left(f_{n-1}\otimes_{\mathcal{M}}w\left(1,n\right)\right)\right)_{m}
=({im<0fn1,mw(0,n)otherwise)({im<1fn1,m1w(1,n)otherwise)\displaystyle=\left(\begin{cases}i_{\oplus}&m<0\\ f_{n-1,m}\otimes w\left(0,n\right)&\textrm{otherwise}\end{cases}\right)\oplus\left(\begin{cases}i_{\oplus}&m<1\\ f_{n-1,m-1}\otimes w\left(1,n\right)&\textrm{otherwise}\end{cases}\right)
=(fn1,mw(0,n))({im=0fn1,m1w(1,n)otherwise)\displaystyle=\left(f_{n-1,m}\otimes w\left(0,n\right)\right)\oplus\left(\begin{cases}i_{\oplus}&m=0\\ f_{n-1,m-1}\otimes w\left(1,n\right)&\textrm{otherwise}\end{cases}\right)

suppressing semiring superscripts of ff for clarity. This simplifies to the following semiring polymorphic generator recursion,

f0,0𝒮,w\displaystyle f_{0,0}^{\mathcal{S},w} =i\displaystyle=i_{\otimes} (91)
f0,m𝒮,w\displaystyle f_{0,m}^{\mathcal{S},w} =i\displaystyle=i_{\oplus}
fn,0𝒮,w\displaystyle f_{n,0}^{\mathcal{S},w} =fn1,0𝒮,ww(0,n)\displaystyle=f_{n-1,0}^{\mathcal{S},w}\otimes w\left(0,n\right)
fn,m𝒮,w\displaystyle f_{n,m}^{\mathcal{S},w} =(fn1,m𝒮,ww(0,n))(fn1,m1𝒮,ww(1,n)),\displaystyle=\left(f_{n-1,m}^{\mathcal{S},w}\otimes w\left(0,n\right)\right)\oplus\left(f_{n-1,m-1}^{\mathcal{S},w}\otimes w\left(1,n\right)\right),

for all n{1,2,,N}n\in\left\{1,2,\ldots,N\right\} and m{1,2,,M}m\in\left\{1,2,\ldots,M\right\}. In the probability semiring 𝒫=([0,1],+,×,0,1)\mathcal{P}=\left(\left[0,1\right],+,\times,0,1\right) then w(1,n)=pnw\left(1,n\right)=p_{n} and w(0,n)=1pnw\left(0,n\right)=1-p_{n} correspond to probabilities of failure and non-failure on event nn, respectively. An application of constrained semiring fusion (19) obtains the algorithm seventcombs=π𝒫,a(fN𝒫,w)=fN,M𝒫,ws_{\text{eventcombs}}^{*}=\pi^{\mathcal{P},a}\left(f_{N}^{\mathcal{P},w}\right)=f_{N,M}^{\mathcal{P},w}. This O(NM)O\left(N\,M\right) time complexity algorithm is extremely similar to that of Radke and Evanoff (1994) which was derived through special, ad-hoc reasoning. Of course, being semiring polymorphic, we can put the generator recursion (91) to other uses such as determining the most probable component failure combination (max-product semiring) or using this as a differential component in a machine learning system (softmax semiring).

4 Related work

Several formal approaches to DP exist in the literature, at various levels of abstraction. The seminal work of Karp and Held (1967) is based on representing DP recurrences as discrete sequential decision processes, where monotonicity justifies optimizing an associated global objective function. This framework is not polymorphic. The work of de Moor (1991) and others (Bird and de Moor, 1996) bases an abstraction of DP on category theory and relations such as inequalities, for combinatorial problems over arbitrary algebraic data types. It remains to be explored how to apply this relational approach to important DP problems which are not focussed on optimization problems, such as computing complete likelihoods for hidden Markov models (the forward-backward algorithm) (Little, 2019), or expectations for parameter estimation in natural language processing problems (Li and Eisner, 2009), for which semirings are a natural formalism.

An interesting precursor is the model of DP described in Helman and Rosenthal (1985). This describes restricted forms of some of the ideas which are precisely formulated and stated in full generality here, including the key role of the separation of computational structure from the values which are computed, and a special kind of homomorphic map over structural operators, into “choice-product” operators. It is not polymorphic. Implicit semiring polymorphism features in DP algorithms found in many specialized application domains, such as natural language processing over graphs and hypergraphs (Goodman, 1999; Li and Eisner, 2009; Huang, 2008) and more recently in differentiable algorithms for machine learning (Mensch and Blondel, 2018). These studies refer to special DP algorithms and do not address the general DP algorithm derivation problem, as we do here.

Perhaps most closely related to our approach is the semiring filter fusion model of Emoto et al. (2012), which, while not explicitly aimed at DP, covers some algorithms which our framework addresses. It does not address the important relationship between specification and correct algorithm. While polymorphic, it is restricted to sequential decision processes which can be expressed as free homomorphisms over associative list joins. To our knowledge, this article was first to introduce algebraic lifting, albeit lacking proof details and in a limited form restricted to monoids, which we expand in much greater generality and depth here. These limitations of Emoto et al. (2012) appear to rule out non-sequential DP algorithms e.g. sequence alignment, edit distance and dynamic time warping, and algorithms requiring constraints based on more complex lifting algebras such as ordered subsequences.

5 Discussion and conclusions

The paper can be summarised as follows:

  • A method is presented for deriving efficient and correct algorithms which solve DP problems specified as semiring objectives evaluated over combinatorial configurations. This makes use of semiring polymorphism and shortcut fusion from constructive algorithmics.

  • We show that more complex problems with multiple, simultaneous combinatorial constraints can be solved within the same semiring formalism through semiring lifting, when these constraints can be given in separable algebraic form.

  • We show broadly applicable algebraic simplifications which can substantially improve the computational time complexity of these derived algorithms, where access to the form of the semiring generator is available, such as in the case of widely available, explicit DP Bellman recursions.

  • Through the use of the tupling trick, we demonstrate a problem-agnostic alternative to backtracing in DP algorithms which does not require any knowledge of the specific DP algorithm.

  • Finally, derivations of multiple problems across a range of application areas are given, which show the effectiveness of our framework for a very wide class of problems to which DP is applicable.

While we can always express DP recurrences in a general computer language, to do so over semirings requires special programming effort and overhead. Modern languages which generalize classical computation to various settings such as probabilistic or weighted logic exist and it would be interesting to see how to implement the DP framework of this paper in those languages. For example, semiring programming is a proposed overarching framework which can be considered as a strict generalization of the polymorphic recurrences presented here (Belle and de Raedt, 2020), although this work does not address algorithm derivation. Similarly, we can view our polymorphic DP algorithms as special kinds of sum-product function evaluations (Friesen and Domingos, 2016), although, as with semiring programming, this only describes a representation framework.

Our approach to the derivation of algorithms for constrained combinatorial problems, requires writing these constraints in separable form using algebras such as groups, monoids or semigroups. While this is a very broad formalism, there will be some constraints which cannot be written in this form. Future work may be able to provide similar algebraic derivations when the separability requirement is relaxed. Furthermore, the precise mapping between the approach developed in this paper and e.g. DP algorithms specified as general discrete-continuous mathematical optimization problems, remains an open topic.

Another issue which has not been raised is that of parallel DP implementations. Similar approaches based on constructive algorithmics demonstrate how to produce algorithms which are inherently parallel in the MapReduce framework, but these rely on associative operators and are restricted to the setting of functional recurrences over free list semiring homomorphisms (Emoto et al., 2012). While DP algorithms derived using our framework are not immediately parallelisable in this way, our framework does not rule out exploiting existing inherently parallel recurrences in the form of free list homomorphisms, and for these recurrences, the constraint lifting algebra developed here retains this inherently parallel structure. The drawback is that, in some cases, it may not be possible to simplify the lifted semiring product down to constant time complexity as in (23). Future investigations may explore general DP parallelisation frameworks (Galil and Park, 1994).

A limitation of our approach as developed so far, is that it does not exploit some of the more “advanced” DP speed-up tricks which have been developed for special situations. A particular example of this is the situation where the mapping function ww in segmentation problems satisfies a special concavity/convexity property (Yao, 1980), enabling a reduction in computational complexity from O(N2)O\left(N^{2}\right) to O(NlogN)O\left(N\log N\right). It will be interesting future work to attempt to incorporate this and other tricks, in our framework.

Another limitation of our approach is that it does not provide a way to derive semiring polymorphic Bellman sequential decision process (SDP) recursions for arbitrary combinatorial generators, we must rely upon the existence of such recursions for specific combinatorial objects. A better approach would be to be able to derive these semiring-structured SDPs from specification, which is addressed to a certain extent in the constructive algorithmics literature (Bird and de Moor, 1996) and would be a valuable direction for future research leading from this paper.

As we hope we have been able to persuade, semiring polymorphism is not an abstract curiosity: it is an extremely useful tool for DP algorithm derivation, as it is in many other areas of computing (Belle and de Raedt, 2020; Friesen and Domingos, 2016; Goodman, 1999; Huang, 2008; Pachter and Sturmfels, 2005; Mensch and Blondel, 2018; Sniedovich, 2011). It offers a simple route to deriving correct DP algorithms from specifications and quantifying computational complexity and deriving novel algorithms in a simple, modular way through semiring lifting. It plays a central role in clarifying what we understand to be an essential conceptual principle of DP, which is the separation of combinatorial structure, combinatorial constraint and value computation.

Appendix A: Proof of DP semiring fusion

We use the automated free theorem generator Haskell package (Boehme, 2021) to prove (8). Assume that the function ff is implemented in some pure, lazy functional language (a language without side effects and without the empty type). The type of the parameters of ff consists of, respectively, two binary operators ,:𝕊𝕊𝕊\oplus,\otimes:\mathbb{S}\to\mathbb{S}\to\mathbb{S}, the mapping function w:𝕏𝕊w:\mathbb{X}\to\mathbb{S} where 𝕏\mathbb{X} is an arbitrary type, and the constants i,i:𝕊i_{\oplus},i_{\otimes}:\mathbb{S}, and produces a result of type 𝕊\mathbb{S},

f:(𝕊𝕊𝕊)(𝕊𝕊𝕊)𝕊𝕊(𝕏𝕊)𝕊,f:\left(\mathbb{S}\to\mathbb{S}\to\mathbb{S}\right)\to\left(\mathbb{S}\to\mathbb{S}\to\mathbb{S}\right)\to\mathbb{S}\to\mathbb{S}\to\left(\mathbb{X}\to\mathbb{S}\right)\to\mathbb{S}, (92)

where 𝕊\mathbb{S} is an arbitrary type. According to Wadler’s free theorem (Wadler, 1989), this type declaration above implies the following theorem.

Theorem 1.

For two algebraic structures 𝒮=(𝕊,,,i,i)\mathcal{S}=\left(\mathbb{S},\oplus,\otimes,i_{\oplus},i_{\otimes}\right) and 𝒮=(𝕊,,,i,i)\mathcal{S}^{\prime}=\left(\mathbb{S}^{\prime},\oplus^{\prime},\otimes^{\prime},i_{\oplus^{\prime}},i_{\otimes^{\prime}}\right), assume 𝕊,𝕊\mathbb{S},\mathbb{S}^{\prime} are arbitrary types and function g:𝕊𝕊g:\mathbb{S}^{\prime}\to\mathbb{S} is a map between them. Assume also the existence of binary operators ,:𝕊𝕊𝕊\oplus^{\prime},\otimes^{\prime}:\mathbb{S}^{\prime}\to\mathbb{S}^{\prime}\to\mathbb{S}^{\prime} and ,:𝕊𝕊𝕊\oplus,\otimes:\mathbb{S}\to\mathbb{S}\to\mathbb{S}, constants i,i:𝕊i_{\oplus^{\prime}},i_{\otimes^{\prime}}:\mathbb{S}^{\prime} and i,i:𝕊i_{\oplus},i_{\otimes}:\mathbb{S} and mapping functions w:𝕏𝕊w^{\prime}:\mathbb{X}\to\mathbb{S}^{\prime}, w:𝕏𝕊w:\mathbb{X}\to\mathbb{S}. If, for all l1,l2:𝕊l_{1},l_{2}:\mathbb{S}^{\prime} and x:𝕏x:\mathbb{X}, the function gg satisfies:

g(l1l2)\displaystyle g\left(l_{1}\oplus^{\prime}l_{2}\right) =g(l1)g(l2)\displaystyle=g\left(l_{1}\right)\oplus g\left(l_{2}\right)
g(l1l2)\displaystyle g\left(l_{1}\otimes^{\prime}l_{2}\right) =g(l1)g(l2)\displaystyle=g\left(l_{1}\right)\otimes g\left(l_{2}\right)
g(i)\displaystyle g\left(i_{\oplus^{\prime}}\right) =i\displaystyle=i_{\oplus}
g(i)\displaystyle g\left(i_{\otimes^{\prime}}\right) =i\displaystyle=i_{\otimes}
g(w(x))\displaystyle g\left(w^{\prime}\left(x\right)\right) =w(x),\displaystyle=w\left(x\right),

then shortcut fusion applies to the function ff,

g(f𝒮,w)=f𝒮,w.g\left(f^{\mathcal{S}^{\prime},w^{\prime}}\right)=f^{\mathcal{S},w}. (93)

If \otimes left and right distributes over \oplus and i,ii_{\oplus},i_{\otimes} are the associated identity constants, the algebraic object 𝒮=(,,i,i)\mathcal{S}=\left(\oplus,\otimes,i_{\oplus},i_{\otimes}\right) is a semiring. We use the shorthand f𝒮,wf^{\mathcal{S},w} to denote f(,,i,i,w)f\left(\oplus,\otimes,i_{\oplus},i_{\otimes},w\right) and the semiring homomorphism g:{[𝕏]}𝕊g:\mathbb{\left\{\left[\mathbb{X}\right]\right\}}\to\mathbb{S} satisfying g({[x]})=w(x)g\left(\left\{\left[x\right]\right\}\right)=w\left(x\right) by g𝒮,wg^{\mathcal{S},w}. Theorem 8 is a corollary.

Corollary.

DP semiring fusion. Given the generator semiring 𝒢=({[𝕏]},,,,{[]})\mathcal{G}=\left(\left\{\left[\mathbb{X}\right]\right\},\cup,\circ,\emptyset,\left\{\left[\,\right]\right\}\right) with the mapping function w(x)={[x]}w^{\prime}\left(x\right)=\left\{\left[x\right]\right\} for all x𝕏x\in\mathbb{X}, and another, arbitrary semiring 𝒮\mathcal{S} with mapping function w:𝕏𝕊w:\mathbb{X}\to\mathbb{S}, if there exists a homomorphism g𝒮,wg^{\mathcal{S},w} mapping 𝒢𝒮\mathcal{G}\to\mathcal{S} which additionally satisfies g({[x]})=w(x)g\left(\left\{\left[x\right]\right\}\right)=w\left(x\right) for all x:[𝕏]x:\left[\mathbb{X}\right], then for a function ff with type given in (92):

g𝒮,w(f𝒢,w)=f𝒮,w.g^{\mathcal{S},w}\left(f^{\mathcal{G},w^{\prime}}\right)=f^{\mathcal{S},w}. (94)

Appendix B: Constraint lifting proofs

This section is a generalization of the arguments given in Emoto et al. (2012), whilst providing and clarifying essential proof details missing from that work. The formulation of DP constraints as single operator algebras over finite sets requires the use of (semiring) lifting or formal sums as a structural tool for deriving DP constrained fusion. This also provides a definition of the lifted semiring 𝒮=(𝕄𝕊,,,i,i)\mathcal{S}\mathcal{M}=\left(\mathbb{M}\to\mathbb{S},\oplus_{\mathcal{M}},\otimes_{\mathcal{M}},i_{\oplus_{\mathcal{M}}},i_{\otimes_{\mathcal{M}}}\right).

Given a semiring 𝒮=(𝕊,,,i,i)\mathcal{S}=\left(\mathbb{S},\oplus,\otimes,i_{\oplus},i_{\otimes}\right) and constraint algebra =(𝕄,,i)\mathcal{M}=\left(\mathbb{M},\odot,i_{\odot}\right), define semiring-valued formal sums x:𝕄𝕊x:\mathbb{M}\to\mathbb{S} as objects indicating that there are xm:𝕊x_{m}:\mathbb{S} “occurrences” of the element m:𝕄m:\mathbb{M}. By convention, elements xmx_{m} taking the value ii_{\oplus} are not listed. Accordingly, when two such formal sums are added, the summation acts much like vector addition in the semiring

(x+y)m=xmym,\begin{aligned} \left(x+y\right)_{m}&=x_{m}\oplus y_{m}\end{aligned}, (95)

for all x,y:𝕄𝕊x,y:\mathbb{M}\to\mathbb{S}. We take this to define the lifted semiring sum xyx\oplus_{\mathcal{M}}y elementwise, (xy)m=xmym\left(x\oplus_{\mathcal{M}}y\right)_{m}=x_{m}\oplus y_{m} for all m:𝕄m:\mathbb{M}. Clearly, this inherits all the properties of \oplus, including commutativity and idempotency. The left/right identity constant satisfying xi=ix=xx\oplus_{\mathcal{M}}i_{\oplus_{\mathcal{M}}}=i_{\oplus_{\mathcal{M}}}\oplus_{\mathcal{M}}x=x is just (i)m=i\left(i_{\oplus_{\mathcal{M}}}\right)_{m}=i_{\oplus}.

Next, we describe the generic change of variables (pushforward) formula for such formal sums. Consider an arbitrary function f:𝕄𝕄f:\mathbb{M}\to\mathbb{M} acting to transform values from the algebra \mathcal{M}. We can ask what happens to a lifted semiring object x:𝕄𝕊x:\mathbb{M}\to\mathbb{S} under this transformation. To do this, construct the product semiring object on 𝕄𝕄𝕊\mathbb{M}\to\mathbb{M}\to\mathbb{S},

xm1,m2=xm1δm2,f(m1),\begin{aligned} x_{m_{1},m_{2}}&=x_{m_{1}}\otimes\delta_{m_{2},f\left(m_{1}\right)}\end{aligned}, (96)

where the lifted semiring unit function δm:𝕄𝕊\delta_{m}:\mathbb{M}\to\mathbb{S} is defined as

δm,m={im=miotherwise.\delta_{m,m^{\prime}}=\begin{cases}i_{\otimes}&m^{\prime}=m\\ i_{\oplus}&\textrm{otherwise}.\end{cases} (97)

Then we can “marginalize out” the original variable to arrive at the change of variables formula (familiar to probability theory),

xm2\displaystyle x_{m_{2}} =m1:𝕄xm1δm2,f(m1)\displaystyle=\bigoplus_{m_{1}:\mathbb{M}}x_{m_{1}}\otimes\delta_{m_{2},f\left(m_{1}\right)} (98)
=m1:𝕄;m2=f(m1)xm1\displaystyle=\bigoplus_{\begin{subarray}{c}m_{1}:\mathbb{M};m_{2}=f\left(m_{1}\right)\end{subarray}}x_{m_{1}}
=xf1(m2),\displaystyle=x_{f^{-1}\left(m_{2}\right)},

where the last step holds if ff has a unique inverse.

A key step in proving the constrained version of DP semiring fusion, is to be able to fuse the composition of the constraint filtering followed by a semiring homomorphism, into a single semiring homomorphism. To do this, we will lift the constraint filtering over the set 𝕄\mathbb{M}. Assume the shorthand g:{[𝕏]}𝕄𝕊g^{\prime}:\left\{\left[\mathbb{X}\right]\right\}\to\mathbb{M}\to\mathbb{S} and ϕm=ϕ,v,δm\phi_{m}^{\prime}=\phi^{\mathcal{M},v,\delta_{m}} where the acceptance function δm(m)=T\delta_{m}\left(m^{\prime}\right)=T if m=mm^{\prime}=m and FF otherwise. We write

gm(x)=(g𝒮,wϕ,v,δm)(x).g_{m}^{\prime}\left(x\right)=\left(g^{\mathcal{S},w}\cdot\phi^{\mathcal{M},v,\delta_{m}}\right)\left(x\right). (99)

Thus, gm(x)g_{m}^{\prime}\left(x\right) denotes the result of first filtering the set of lists xx to retain any lists on which the constraint evaluates to mm, and then applying the homomorphism g𝒮,wg^{\mathcal{S},w} to the remaining lists. Now, for gmg_{m}^{\prime} to be a semiring homomorphism, it must preserve semiring structure. For it to be consistent with the filtering, it must also preserve the action of the filtering under ϕ,v,δm\phi^{\mathcal{M},v,\delta_{m}}.

Turning to the semiring sum, we have,

gm(xy)\displaystyle g_{m}^{\prime}\left(x\cup y\right) =g𝒮,wϕm(xy)\displaystyle=g^{\mathcal{S},w}\cdot\phi_{m}^{\prime}\left(x\cup y\right) (100)
=g𝒮,w(ϕm(x)ϕm(y))\displaystyle=g^{\mathcal{S},w}\cdot\left(\phi_{m}^{\prime}\left(x\right)\cup\phi_{m}^{\prime}\left(y\right)\right)
=(g𝒮,wϕm)(x)(g𝒮,wϕm)(y)\displaystyle=\left(g^{\mathcal{S},w}\cdot\phi_{m}^{\prime}\right)\left(x\right)\oplus\left(g^{\mathcal{S},w}\cdot\phi_{m}^{\prime}\right)\left(y\right)
=gm(x)gm(y).\displaystyle=g_{m}^{\prime}\left(x\right)\oplus g_{m}^{\prime}\left(y\right).

To explain the second step: note that forming the union of sets of lists has no effect on the computation of the constraint value which determines the result of filtering. Thus, the union of sets of lists is invariant under the action of the filter. The third step follows because g𝒮,wg^{\mathcal{S},w} is a semiring homomorphism.

Somewhat more complex is the semiring product, for which we have

gm(xy)\displaystyle g_{m}^{\prime}\left(x\circ y\right) =(g𝒮,wϕm)(xy)\displaystyle=\left(g^{\mathcal{S},w}\cdot\phi_{m}^{\prime}\right)\left(x\circ y\right) (101)
=g𝒮,w(m,m′′𝕄:mm′′=m(ϕm(x)ϕm′′(y)))\displaystyle=g^{\mathcal{S},w}\left(\bigcup_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}\left(\phi_{m^{\prime}}^{\prime}\left(x\right)\circ\phi_{m^{\prime\prime}}^{\prime}\left(y\right)\right)\right)
=m,m′′𝕄:mm′′=mg𝒮,w(ϕm(x)ϕm′′(y))\displaystyle=\bigoplus_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}g^{\mathcal{S},w}\cdot\left(\phi_{m^{\prime}}^{\prime}\left(x\right)\circ\phi_{m^{\prime\prime}}^{\prime}\left(y\right)\right)
=m,m′′𝕄:mm′′=m(g𝒮,wϕm)(x)(g𝒮,wϕm′′)(y)\displaystyle=\bigoplus_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}\left(g^{\mathcal{S},w}\cdot\phi_{m^{\prime}}^{\prime}\right)\left(x\right)\otimes\left(g^{\mathcal{S},w}\cdot\phi_{m^{\prime\prime}}^{\prime}\right)\left(y\right)
=m,m′′𝕄:mm′′=mgm(x)gm′′(y).\displaystyle=\bigoplus_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}g_{m^{\prime}}^{\prime}\left(x\right)\otimes g_{m^{\prime\prime}}^{\prime}\left(y\right).

Clearly, this motivates the definition of the lifted semiring product as (xy)m=m,m′′𝕄:mm′′=mxmym′′\left(x\otimes_{\mathcal{M}}y\right)_{m}=\bigoplus_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}x_{m^{\prime}}\otimes y_{m^{\prime\prime}}.

The second step above deserves further explanation. We need to be able to push the filter ϕm\phi_{m}^{\prime} inside the cross-join, which is critical to defining a semiring homomorphism. Recall that the cross-join xyx\circ y of two sets of lists involves joining together each list in xx with each list of yy. For general lists l,l′′l^{\prime},l^{\prime\prime} whose constraints evaluate to mm^{\prime} and m′′m^{\prime\prime} respectively, then due to the separability of the constraint algebra, the constraint value of their join ll′′l^{\prime}\cup l^{\prime\prime} is mm′′m^{\prime}\odot m^{\prime\prime}. If we group together into one set ss^{\prime}, all those lists whose constraints evaluate to mm^{\prime} and into another set s′′s^{\prime\prime}, all those lists whose constraints evaluate to m′′m^{\prime\prime}, then their cross-join ss′′s^{\prime}\circ s^{\prime\prime} will consist of sets of lists, all of which have constraints evaluating to m=mm′′m=m^{\prime}\odot m^{\prime\prime}. Finally, for a given mm and without further information on the properties of \odot, we can find the values of m,m′′m^{\prime},m^{\prime\prime} such that mm′′=mm^{\prime}\odot m^{\prime\prime}=m by exhaustively considering all possible pairs. Clearly, if \odot is specialized in some way, particularly with regards to the existence of inverses, then this exhaustive search can be reduced, and this is the basis of our algebraic simplifications for special cases such as group lifting algebras.

A semiring homomorphism must map identities. For empty sets which are the identity for \cup, we simply require,

gm()=im𝕄.g_{m}^{\prime}\left(\emptyset\right)=i_{\oplus}\quad\forall m\in\mathbb{M}. (102)

Similarly, sets of empty lists act as identities for the cross-join operator. In this case, we must have gm({[]}x)=gm(x{[]})=gm(x)g_{m}^{\prime}\left(\left\{\left[\,\right]\right\}\circ x\right)=g_{m}^{\prime}\left(x\circ\left\{\left[\,\right]\right\}\right)=g_{m}^{\prime}\left(x\right). If we set gm({[]})=δi,mg_{m}^{\prime}\left(\left\{\left[\,\right]\right\}\right)=\delta_{i_{\odot},m}, then we have,

gm({[]}x)\displaystyle g_{m}^{\prime}\left(\left\{\left[\,\right]\right\}\circ x\right) =m,m′′𝕄:mm′′=mδi,mgm′′(x)\displaystyle=\bigoplus_{\forall m^{\prime},m^{\prime\prime}\in\mathbb{M}:m^{\prime}\odot m^{\prime\prime}=m}\delta_{i_{\odot},m^{\prime}}\otimes g_{m^{\prime\prime}}^{\prime}\left(x\right) (103)
=m′′𝕄:im′′=mδi,igm′′(x)\displaystyle=\bigoplus_{\forall m^{\prime\prime}\in\mathbb{M}:i_{\odot}\odot m^{\prime\prime}=m}\delta_{i_{\odot},i_{\odot}}\otimes g_{m^{\prime\prime}}^{\prime}\left(x\right)
=m′′𝕄:im′′=mgm′′(x)\displaystyle=\bigoplus_{\forall m^{\prime\prime}\in\mathbb{M}:i_{\odot}\odot m^{\prime\prime}=m}g_{m^{\prime\prime}}^{\prime}\left(x\right)
=gm(x),\displaystyle=g_{m}^{\prime}\left(x\right),

and similarly for gm(x{[]})g_{m}^{\prime}\left(x\circ\left\{\left[\,\right]\right\}\right). This shows the lifted semiring identity to be i=δii_{\otimes_{\mathcal{M}}}=\delta_{i_{\odot}}.

Finally, we need to consider the homomorphic mapping of sets with single element configurations e.g. terms like {[x]}\left\{\left[x\right]\right\}. Under the action of the filter ϕ,v,δm\phi^{\mathcal{M},v,\delta_{m}}, such terms are only retained if the constraint mapping v(s)=mv\left(s\right)=m, whereupon they contribute a value w(s)w\left(s\right) to the semiring value of the homomorphism g𝒮,wg^{\mathcal{S},w}. Otherwise, they do not contribute anything to the semiring sum. It follows that,

gm({[x]})\displaystyle g_{m}^{\prime}\left(\left\{\left[x\right]\right\}\right) =δv(x),mw(x)\displaystyle=\delta_{v\left(x\right),m}\otimes w\left(x\right) (104)
={w(x)m=v(x)iotherwise,\displaystyle=\begin{cases}w\left(x\right)&m=v\left(x\right)\\ i_{\oplus}&\textrm{otherwise},\end{cases}

which we write as the mapping w(x)mw_{\mathcal{M}}\left(x\right)_{m}. To summarize then, (100)-(104) show that gmg_{m}^{\prime} is a semiring homomorphism performing the lift mapping 𝒢𝒮\mathcal{G}\to\mathcal{S}\mathcal{M}:

g𝒮,w(ϕ,v,δm)=g𝒮,w.g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,\delta_{m}}\right)=g^{\mathcal{S}\mathcal{M},w_{\mathcal{M}}}. (105)

The next step is to reconstruct the result of DP constraint filtering ϕ,v,δm\phi^{\mathcal{M},v,\delta_{m}}, from the lifted result. This involves computing the effect of the transformation a:𝕄𝔹a:\mathbb{M}\to\mathbb{B} mapping the lifting algebra 𝕄\mathbb{M} into the value in 𝔹\mathbb{B} of the predicate aa, on an arbitrary lifted semiring object x:𝕄𝕊x:\mathbb{M}\to\mathbb{S}. The joint product function π\pi on 𝕄×𝔹\mathbb{M}\times\mathbb{B} is written using the Boolean-semiring unit function:

πm,b\displaystyle\pi^{m,b} (x)=xmδb,a(m)\displaystyle\left(x\right)=x_{m}\otimes\delta_{b,a\left(m\right)} (106)
δb,b\displaystyle\delta_{b,b^{\prime}} ={ib=biotherwise.\displaystyle=\begin{cases}i_{\otimes}&b^{\prime}=b\\ i_{\oplus}&\textrm{otherwise}.\end{cases}

We then project onto the second parameter of π\pi to obtain,

πb(x)\displaystyle\pi^{b}\left(x\right) =m𝕄xmδb,a(m)\displaystyle=\bigoplus_{\forall m^{\prime}\in\mathbb{M}}x_{m^{\prime}}\otimes\delta_{b,a\left(m^{\prime}\right)} (107)
=m𝕄:a(m)=bxm\displaystyle=\bigoplus_{\forall m^{\prime}\in\mathbb{M}:a\left(m^{\prime}\right)=b}x_{m^{\prime}}
=xa1(b),\displaystyle=x_{a^{-1}\left(b\right)},

where the last line holds if aa has a unique inverse. We use the notation π𝒮,a\pi^{\mathcal{S},a} as a shorthand for πT\pi^{T} over the semiring 𝒮\mathcal{S} and the acceptance criteria aa.

Putting everything above together, we can show the following:

g𝒮,w(ϕ,v,a(f𝒢,w))\displaystyle g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right) =π𝒮,a(g𝒮,w(ϕ,v,a(f𝒢,w)))\displaystyle=\pi^{\mathcal{S},a}\left(g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right)\right) (108)
=π𝒮,a(g𝒮,w(f𝒢,w))\displaystyle=\pi^{\mathcal{S},a}\left(g^{\mathcal{SM},w_{\mathcal{M}}}\left(f^{\mathcal{G},w^{\prime}}\right)\right)
=π𝒮,a(f𝒮,w)\displaystyle=\pi^{\mathcal{S},a}\left(f^{\mathcal{SM},w_{\mathcal{M}}}\right)

which constitutes a proof of theorem (19).

Theorem 2.

DP semiring constrained fusion. Given the generator semiring 𝒢\mathcal{G} with the mapping function ww^{\prime}, and another, arbitrary semiring 𝒮\mathcal{S} with mapping function ww, the constraint algebra =(𝕄,,i)\mathcal{M}=\left(\mathbb{M},\odot,i_{\odot}\right) with constraint mapping function vv, acceptance criteria a:𝕄𝔹a:\mathbb{M}\to\mathbb{B}, constraint filtering function ϕ:(𝕄𝕄𝕄)𝕄(𝕏𝕄)(𝕄𝔹)\phi:\left(\mathbb{M}\to\mathbb{M}\to\mathbb{M}\right)\to\mathbb{M}\to\left(\mathbb{X}\to\mathbb{M}\right)\to\left(\mathbb{M}\to\mathbb{B}\right) and projection function π:(𝕊𝕊𝕊)𝕊(𝕄𝔹)𝕊\pi:\left(\mathbb{S}\to\mathbb{S}\to\mathbb{S}\right)\to\mathbb{S}\to\left(\mathbb{M}\to\mathbb{B}\right)\to\mathbb{S}, then for a function ff with type (92):

g𝒮,w(ϕ,v,a(f𝒢,w))=π𝒮,a(f𝒮,w).\begin{aligned} g^{\mathcal{S},w}\left(\phi^{\mathcal{M},v,a}\left(f^{\mathcal{G},w^{\prime}}\right)\right)&=\pi^{\mathcal{S},a}\left(f^{\mathcal{SM},w_{\mathcal{M}}}\right)\end{aligned}. (109)

Appendix C: A selection of semirings

A table of some useful (numerical) semirings 𝒮=(𝕊,,,i,i)\mathcal{S}=\left(\mathbb{S},\oplus,\otimes,i_{\oplus},i_{\otimes}\right) is given below, for more details on these and other semirings, the book by Golan (1999) is an excellent reference.

Name Example application Set 𝕊\mathbb{S} Operations ,\oplus,\otimes Identities i,ii_{\oplus},i_{\otimes}
Arithmetic Solution counting \mathbb{N} +,×+,\times 0,10,1
Generator Exhaustive listing {[𝕏]}\left\{\left[\mathbb{X}\right]\right\} ,\cup,\circ ,{[]}\emptyset,\left\{\left[\,\right]\right\}
Boolean Solution existence 𝔹\mathbb{B} ,\vee,\wedge T,FT,F
Arithmetic Probabilistic likelihood \mathbb{R} +,×+,\times 0,10,1
Tropical Minimum negative log likelihood +\mathbb{R}^{+} min,+\min,+ ,0\infty,0
Softmax Differentiable minimum negative log likelihood +\mathbb{R}^{+} ln(ex+ey),+-\ln\left(e^{-x}+e^{-y}\right),+ ,0\infty,0
Viterbi Minimum negative log likelihood with optimal solution +×{+}\mathbb{R}^{+}\times\left\{\mathbb{R}^{+}\right\} (min,argmin),(+,)\left(\min,\arg\min\right),\left(+,\cup\right) (,),(0,)\left(\infty,\emptyset\right),\left(0,\emptyset\right)
Expectation Expectation-maximization ×+\mathbb{R}\times\mathbb{R}^{+} (x+y,p+q),(py+qx,pq)\left(x+y,p+q\right),\left(py+qx,pq\right) (0,0),(1,0)\left(0,0\right),\left(1,0\right)
Bottleneck Fuzzy constraint satisfaction [0,1]\left[0,1\right] max,min\max,\min 0,10,1
Relational Database queries 𝕊R𝕊\mathbb{S}R\mathbb{S} ,\cup,\bowtie ,1R\emptyset,1_{R}

Appendix D: Some useful constraint algebras

In this section we list some useful example constraints and simplified expressions for the resulting lifted semiring products, see (22), along with simplified expressions for the product against the lifted single value, see (23).

Example application Algebra =(𝕄,,i)\mathcal{M}=\left(\mathbb{M},\odot,i_{\odot}\right) (xy)m\left(x\otimes_{\mathcal{M}}y\right)_{m} (22) (uw(x))m\left(u\otimes_{\mathcal{M}}w_{\mathcal{M}}\left(x\right)\right)_{m} (23)
Subset size (,+,0)\left(\mathbb{N},+,0\right) m:(xmymm)\bigoplus_{m^{\prime}:\mathbb{N}}\left(x_{m^{\prime}}\otimes y_{m-m^{\prime}}\right) {im<v(x)umv(x)w(x)otherwise\begin{cases}i_{\oplus}&m<v\left(x\right)\\ u_{m-v\left(x\right)}\otimes w\left(x\right)&\textrm{otherwise}\end{cases}
Minimum count ({1,,M},min,M)\left(\left\{1,\ldots,M\right\},\min,M\right) m=mM(xmym)m=m+1M(xmym)\begin{array}[]{c}\bigoplus_{m^{\prime}=m}^{M}\left(x_{m^{\prime}}\otimes y_{m}\right)\oplus\\ \bigoplus_{m^{\prime}=m+1}^{M}\left(x_{m}\otimes y_{m^{\prime}}\right)\end{array} {umw(x)m<v(x)(m=mMum)w(x)m=v(x)im>v(x)\begin{cases}u_{m}\otimes w\left(x\right)&m<v\left(x\right)\\ \left(\oplus_{m^{\prime}=m}^{M}u_{m^{\prime}}\right)\otimes w\left(x\right)&m=v\left(x\right)\\ i_{\oplus}&m>v\left(x\right)\end{cases}
Maximum count ({1,,M},max,0)\left(\left\{1,\ldots,M\right\},\max,0\right) m=1m1(xmym)m=1m(xmym)\begin{array}[]{c}\bigoplus_{m^{\prime}=1}^{m-1}\left(x_{m^{\prime}}\otimes y_{m}\right)\oplus\\ \bigoplus_{m^{\prime}=1}^{m}\left(x_{m}\otimes y_{m^{\prime}}\right)\end{array} {umw(x)m>v(x)(m=1mum)w(x)m=v(x)im<v(x)\begin{cases}u_{m}\otimes w\left(x\right)&m>v\left(x\right)\\ \left(\oplus_{m^{\prime}=1}^{m}u_{m^{\prime}}\right)\otimes w\left(x\right)&m=v\left(x\right)\\ i_{\oplus}&m<v\left(x\right)\end{cases}
Absolute difference ({1,,M},\displaystyle\left(\left\{1,\ldots,M\right\},\right. |xy|,0)\displaystyle\left.\left|x-y\right|,0\right) m=m+1M1(xmymm)m=1Mm1(xmym+m)\begin{array}[]{c}\bigoplus_{m^{\prime}=m+1}^{M-1}\left(x_{m^{\prime}}\otimes y_{m^{\prime}-m}\right)\oplus\\ \bigoplus_{m^{\prime}=1}^{M-m-1}\left(x_{m^{\prime}}\otimes y_{m^{\prime}+m}\right)\end{array} ({im>v(x)1umv(x)w(x)otherwise)({im>Mv(x)um+v(x)w(x)otherwise)\begin{array}[]{c}\left(\begin{cases}i_{\oplus}&m>v\left(x\right)-1\\ u_{m-v\left(x\right)}\otimes w\left(x\right)&\textrm{otherwise}\end{cases}\right)\oplus\\ \left(\begin{cases}i_{\oplus}&m>M-v\left(x\right)\\ u_{m+v\left(x\right)}\otimes w\left(x\right)&\textrm{otherwise}\end{cases}\right)\end{array}
Existence (𝔹,,F)\left(\mathbb{B},\vee,F\right) {xFyFm=F(xFyT)(xTyF)m=T(xTyT)\begin{cases}x_{F}\otimes y_{F}&m=F\\ \left(x_{F}\otimes y_{T}\right)\oplus\left(x_{T}\otimes y_{F}\right)&m=T\\ \oplus\left(x_{T}\otimes y_{T}\right)\end{cases} {umw(x)v(x)=F(uFuT)w(x)(m=T)(v(x)=T)i(m=F)(v(x)=T)\begin{cases}u_{m}\otimes w\left(x\right)&v\left(x\right)=F\\ \left(u_{F}\oplus u_{T}\right)\otimes w\left(x\right)&\left(m=T\right)\wedge\left(v\left(x\right)=T\right)\\ i_{\oplus}&\left(m=F\right)\wedge\left(v\left(x\right)=T\right)\end{cases}
For all (𝔹,,T)\left(\mathbb{B},\wedge,T\right) {(xFyF)(xTyF)m=F(xFyT)xTyTm=T\begin{cases}\left(x_{F}\otimes y_{F}\right)\oplus\left(x_{T}\otimes y_{F}\right)&m=F\\ \oplus\left(x_{F}\otimes y_{T}\right)\\ x_{T}\otimes y_{T}&m=T\end{cases} {umw(x)v(x)=T(uFuT)w(x)(m=F)(v(x)=F)i(m=T)(v(x)=F)\begin{cases}u_{m}\otimes w\left(x\right)&v\left(x\right)=T\\ \left(u_{F}\oplus u_{T}\right)\otimes w\left(x\right)&\left(m=F\right)\wedge\left(v\left(x\right)=F\right)\\ i_{\oplus}&\left(m=T\right)\wedge\left(v\left(x\right)=F\right)\end{cases}
Sequential-value ordering ((,),\displaystyle\left(\left(\mathbb{N},\mathbb{R}\right),\right. ,z)\displaystyle\preceq,\left.z_{\preceq}\right) m𝕄:mm(xmym)\bigoplus_{m^{\prime}\in\mathbb{M}:m^{\prime}\preceq m}\left(x_{m^{\prime}}\otimes y_{m}\right) {(m𝕄:mmum)w(x)m=v(x)iotherwise\begin{cases}\left(\oplus_{m^{\prime}\in\mathbb{M}:m^{\prime}\preceq m}u_{m^{\prime}}\right)\otimes w\left(x\right)&m=v\left(x\right)\\ i_{\oplus}&\textrm{otherwise}\end{cases}

References

  • Belle and de Raedt [2020] V. Belle and Luc de Raedt. Semiring programming: A semantic framework for generalized sum product problems. International Journal of Approximate Reasoning, 126:181–201, 2020.
  • Bellman [1957] Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
  • Bird and de Moor [1996] Richard S. Bird and Oege de Moor. Algebra of Programming. Prentice-Hall, 1996.
  • Boehme [2021] S. Boehme. free-theorems: Automatic generation of free theorems, 2021.
  • de Moor [1991] Oege de Moor. Categories, Relations and Dynamic Programming. PhD thesis, University of Oxford, 1991.
  • de Moor [1999] Oege de Moor. Dynamic programming as a software component. In CSCC 1999, Athens, Greece, 1999.
  • Emoto et al. [2012] Kento Emoto, Sebastian Fischer, and Zhenjiang Hu. Filter-embedding semiring fusion for progamming with MapReduce. Formal Aspects of Computing, (24):623–645, 2012.
  • Friesen and Domingos [2016] A.L. Friesen and P. Domingos. The sum-product theorem: A foundation for learning tractable models. In Proceedings of the 33rd International Conference on Machine Learning, ICML, volume 48, pages 1909–1918, New York, USA, 2016.
  • Galil and Park [1994] Z. Galil and K. Park. Parallel algorithms for dynamic programming recurrences with more than O(1) dependency. Journal of Parallel and Distributed Computing, 21(2), 1994.
  • Golan [1999] Jonathan S. Golan. Semirings and their Applications. Springer Netherlands, 1 edition, 1999.
  • Goodman [1999] J. Goodman. Semiring parsing. Computational Linguistics, 25(4), 1999.
  • Gronlund et al. [2018] A. Gronlund, K.G. Larsen, A. Mathiasen, J.S. Nielsen, S. Schneider, and M. Song. Fast exact k-means, k-medians and Bregman divergence clustering in 1D. arXiv, page arXiv:1701.07204v4, 2018.
  • Helman and Rosenthal [1985] P. Helman and A. Rosenthal. A comprehensive model of dynamic programming. SIAM Journal on Algebraic Discrete Methods, 6(2):319–333, 1985.
  • Huang [2008] L. Huang. Advanced dynamic programming in semiring and hypergraph frameworks. In COLING 2008, pages 1–18, Manchester, UK, 2008. Coling 2008 Organizing Committee.
  • Jeuring [1993] J.T. Jeuring. Theories for Algorithm Calculation. PhD thesis, Utrecht University, 1993.
  • Karp and Held [1967] R.M. Karp and M. Held. Finite-state processes and dynamic programming. SIAM Journal on Applied Mathematics, 15(3):693–718, 1967.
  • Kim et al. [2009] S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky. L1 trend filtering. SIAM Rreview, 51(2):339–360, 2009.
  • Kleinberg and Tardos [2005] Jon Kleinberg and Eva Tardos. Algorithm Design. Addison-Wesley Longman Publishing Co., Inc., 2005.
  • Li and Eisner [2009] Z. Li and J. Eisner. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 40–51, Singapore, 2009. ACL.
  • Little [2019] Max A. Little. Machine Learning for Signal Processing. Oxford University Press, Oxford, UK, 2019.
  • Mensch and Blondel [2018] A. Mensch and M. Blondel. Differentiable dynamic programming for structured prediction and attention. Proceedings of the 35th International Conference on Machine Learning, 80:3462–3471, 2018.
  • Pachter and Sturmfels [2005] L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.
  • Radke and Evanoff [1994] G.E. Radke and J. Evanoff. A fast recursive algorithm to compute the probability of M-out-of-N events. In Proceedings of Annual Reliability and Maintainability Symposium (RAMS), pages 114–117, Anaheim, CA, USA, 1994. IEEE.
  • Sloane [2021] N.J.A. Sloane. The on-line encyclopedia of integer sequences, 2021.
  • Sniedovich [2011] M. Sniedovich. Dynamic Programming: Foundations and Principles. CRC Press, Boca Raton, US, 2nd edition, 2011.
  • Terzi and Tsaparas [2006] E. Terzi and P. Tsaparas. Efficient algorithms for sequence segmentation. In Proceedings of the 2006 SIAM International Conference on Data Mining. SIAM, 2006.
  • Wadler [1989] Philip Wadler. Theorems for free! In FPCA ’89: Proceedings of the fourth international conference on Functional programming languages and computer architecture, London, UK, 1989. Association for Computing Machinery.
  • Yao [1980] F.F. Yao. Efficient dynamic programming using quadrangle inequalities. In Proceedings of the 12th Annual ACM Symposium on the Theory of Computing, pages 429–435, 1980.
  • Zhang [2003] H. Zhang. Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm. Bioinformatics, 19(11):1391–1396, 2003.