
1 National University of Singapore, Singapore   2 GrabTaxi Holdings, Singapore

Scalable Probabilistic Routes

Suwei Yang 1,2    Victor C. Liang 2    Kuldeep S. Meel 1
Abstract

Inference and prediction of routes have become of interest over the past decade owing to a dramatic increase in package delivery and ride-sharing services. Given the underlying combinatorial structure and the incorporation of probabilities, route prediction involves techniques from both formal methods and machine learning. One promising approach for predicting routes uses decision diagrams that are augmented with probability values. However, the effectiveness of this approach depends on the size of the compiled decision diagrams, and its scalability is limited by the empirical runtime and space complexity. In this work, our contributions are two-fold: first, we introduce a relaxed encoding that uses a number of variables linear in the number of vertices of a road network graph, significantly reducing the size of the resulting decision diagrams. Second, instead of a stepwise sampling procedure, we propose single-pass sampling-based route prediction. In our evaluation on benchmarks arising from a real-world road network, we demonstrate that the resulting system achieves around twice the quality of suggested routes while being an order of magnitude faster than the state-of-the-art.

0.1 Introduction

The past decade has witnessed an unprecedented rise of the service economy, best highlighted by the prevalence of delivery and ride-sharing services [9, 10]. For operational and financial efficiency, a fundamental problem for such companies is the inference and prediction of routes taken by drivers. When a driver receives a job to navigate from location A to B, the ride-sharing app needs to predict the route in order to determine: (1) the trip time, which is an important consideration for the customer, (2) the fare, an important consideration for both the driver and the customer, and (3) the trip experience, since customers feel safe when the driver takes the route described in their app [2, 35]. However, the reality is that drivers and customers have preferences; as such, the trips taken are not always the shortest possible by distance or time [22]. To this end, delivery and ride-sharing service companies need techniques that can infer the distribution of routes and efficiently predict the likely route a driver takes for a given start and end location.

Routing, a classic problem in computer science, has traditionally been approached without considering the learning of distributions [1, 30]. However, Choi, Shen, and Darwiche demonstrated through a series of papers that the distribution of routes can be conceptualized as a structured probability distribution (SPD) given the underlying combinatorial structure [6, 31, 32]. Decision diagrams, which are particularly well-suited for representing SPDs, have emerged as the state-of-the-art approach for probability-guided routing. The decision-diagram-based approach learns SPDs through decision diagrams augmented with probability values, followed by a stepwise process that uncovers the route node by node.

However, scalability remains a challenge when using decision diagrams to reason about route distributions, particularly for large road networks. Existing works address this concern in various ways, such as through the use of hierarchical diagrams [6] and Structured Bayesian Networks [31]. Choi et al. [6] partition the structured space into smaller subspaces, with each subspace’s SPD being represented by a decision diagram. Shen et al. used Structured Bayesian Networks to represent conditional dependencies between sets of random variables, with the distribution within each set of variables represented by a conditional Probabilistic Sentential Decision Diagram (PSDD) [31, 32]. Despite these efforts, the scalability of decision diagrams for routing, in terms of space complexity, remains an open issue [7].

The primary contribution of this work is to tackle the scalability challenges faced by the current state-of-the-art approaches. Our contributions are two-fold: first, we focus on minimizing the size of the compiled diagram via relaxation and refinement. In particular, instead of learning distributions over the set of all valid routes, we learn distributions over an over-approximation, then perform sampling followed by refinement to output a valid route. Second, instead of a stepwise sampling procedure, we perform one-pass sampling by adapting an existing sampling algorithm [37] to perform conditional sampling. Our extensive empirical evaluation over benchmarks arising from real-world publicly available road network data demonstrates that our approach, called $\mathsf{ProbRoute}$, is able to handle real-world instances that were clearly beyond the reach of the state-of-the-art. Furthermore, on instances that can be handled by the prior state-of-the-art, $\mathsf{ProbRoute}$ achieves a median $10\times$ runtime performance improvement.

0.2 Background

In the remaining parts of this work, we discuss how to encode paths, more specifically simple trips, in a graph using Boolean formulas. In addition, we also discuss decision diagrams and probabilistic reasoning with them. In this section, we introduce the preliminaries and background for the rest of the paper. To avoid ambiguity, we use vertices to refer to nodes of road network graphs and nodes to refer to nodes of decision diagrams.

0.2.1 Preliminaries

Simple Trip

Let $G$ be an arbitrary undirected graph. A path on $G$ is a sequence of connected vertices $v_1, v_2, \ldots, v_m$ of $G$ where $\forall_{i=1}^{m-1}\ v_{i+1} \in N(v_i)$, with $N(v_i)$ referring to the neighbours of $v_i$. A path $\pi$ does not contain loops if $\forall_{v_i, v_j \in \pi}\ v_i \neq v_j$. $\pi$ does not contain a detour if $\forall_{v_i, v_j, v_k, v_l \in \pi}\ v_j \not\in N(v_i) \lor v_k \not\in N(v_i) \lor v_l \not\in N(v_i)$. Path $\pi$ is a simple path if it does not contain loops. A simple path $\pi$ is a simple trip if it does not contain detours. We denote the set of all simple trips in $G$ as $\mathsf{SimpleTrip}(G)$. In Figure 1, d-e-h is a simple trip whereas d-e-f-c-b-e-h and d-e-f-i-h are not, because they contain a loop and a detour respectively. We use $\mathsf{Term}(\pi)$ to refer to the terminal vertices of path $\pi$.


Figure 1: A 3 × 3 grid graph
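To make the loop and detour conditions above concrete, the following is a minimal sketch in Python (the language of our prototype), assuming the 3 × 3 grid of Figure 1 with vertices a through i laid out row by row, and reading the detour condition as: no path vertex may have three or more distinct path vertices among its graph neighbours.

```python
import networkx as nx

def has_loop(path):
    # A path contains a loop if any vertex appears more than once.
    return len(set(path)) != len(path)

def has_detour(graph, path):
    # A path contains a detour if some vertex on the path has three or more
    # distinct path vertices among its graph neighbours.
    on_path = set(path)
    return any(sum(1 for u in graph.neighbors(v) if u in on_path) >= 3 for v in path)

def is_simple_trip(graph, path):
    return not has_loop(path) and not has_detour(graph, path)

# The 3x3 grid of Figure 1, vertices a..i assumed to be laid out row by row.
grid = nx.Graph([("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("g", "h"), ("h", "i"),
                 ("a", "d"), ("d", "g"), ("b", "e"), ("e", "h"), ("c", "f"), ("f", "i")])
assert is_simple_trip(grid, ["d", "e", "h"])                 # simple trip
assert not is_simple_trip(grid, ["d", "e", "f", "i", "h"])   # contains a detour
```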

Probabilistic Routing Problem

In this paper, we tackle the probabilistic routing problem, which we define as follows. Given a graph $G$ of an underlying road network, training and testing data $D_{train}, D_{test}$, and start and end vertices $s, t$, sample a path $\pi$ from $s$ to $t$ such that the $\varepsilon$-match rate with the ground truth path $\pi' \in D_{test}$ is maximized. We define the $\varepsilon$-match rate between $\pi$ and $\pi'$ as $|U_{close(\pi)}| \div |U|$, where $U$ is the set of vertices of $\pi'$ and $U_{close(\pi)}$ is the set of vertices of $\pi'$ that are within $\varepsilon$ euclidean distance of the nearest vertex in $\pi$. More details on $\varepsilon$ are discussed in Section 0.4.
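As a small illustration, the following is a minimal sketch of the $\varepsilon$-match rate computation, assuming vertex coordinates are available as a dict mapping each vertex to projected (x, y) coordinates in meters; the function name and this representation are our own illustrative choices.

```python
import math

def eps_match_rate(ground_truth, predicted, coords, eps):
    """Fraction of ground-truth vertices within eps meters of the nearest predicted vertex."""
    def dist(u, v):
        (x1, y1), (x2, y2) = coords[u], coords[v]
        return math.hypot(x1 - x2, y1 - y2)

    close = sum(1 for u in ground_truth if min(dist(u, v) for v in predicted) <= eps)
    return close / len(ground_truth)
```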

Boolean Formula

A Boolean variable can take values true or false. A literal $x$ is a Boolean variable or its negation. Let $F$ be a Boolean formula. $F$ is in conjunctive normal form (CNF) if $F$ is a conjunction of clauses, where each clause is a disjunction of literals. $F$ is satisfiable if there exists an assignment $\tau$ of the set of variables $X$ of $F$ such that $F$ evaluates to true. We refer to $\tau$ as a satisfying assignment of $F$ and denote the set of all $\tau$ as $\mathsf{Sol}(F)$. In the remaining parts of this paper, all formulas and variables are Boolean unless stated otherwise.

Decision Diagrams

Decision diagrams are directed acyclic graph (DAG) representations of logical formulas under the knowledge compilation paradigm. Decision diagrams are designed to enable the tractable handling of certain types of queries, that is, the queries can be answered in polynomial time with respect to the number of nodes in the diagram [12]. We use diagram size to refer to the number of nodes in a decision diagram. In this work we use the OBDD[$\land$] [20] variant of OBDD, more specifically Probabilistic OBDD[$\land$] ($\mathsf{PROB}$) [37], for which there is an existing efficient sampling algorithm. We discuss the $\mathsf{PROB}$ decision diagram in later sections.

0.2.2 Related Works

The continuous pursuit of compact representations by the research community has resulted in several decision diagram forms over the years, including AOMDD for multi-valued variables, and OBDD and SDD for binary variables [5, 11, 24]. Both OBDD and SDD are canonical representations of Boolean formulas, given a variable ordering for OBDD and a Vtree for SDD respectively. An OBDD [5] comprises internal nodes that correspond to variables and leaf nodes that correspond to $\top$ or $\bot$. Each internal node of an OBDD has exactly two children and represents the Shannon decomposition [4] on the variable associated with that node. SDDs comprise elements and nodes [11]. An element represents the conjunction of a prime and a sub, each of which can either be a terminal SDD or a decomposition SDD. A decomposition SDD is represented by a node whose child elements represent the decomposition. A terminal SDD can be a literal, $\top$, or $\bot$. The decompositions of SDDs follow that of the respective Vtree, which is a full binary tree over the Boolean variables in the formula. In this work, we use the OBDD[$\land$] [20] variant of OBDD, which is shown to be theoretically incomparable to but empirically more succinct than SDDs [20].

A related development formulates probabilistic circuits [8], based on sum-product networks [28] and closely related to decision diagrams, as a special class of neural networks known as Einsum networks [27]. In the Einsum network structure, leaf nodes represent different Gaussian distributions. By learning from data, Einsum networks are able to represent SPDs as weighted sums and mixtures of Gaussian distributions. Einsum networks address scalability by utilizing tensor operations implemented in existing deep learning frameworks such as PyTorch [26]. Our work differs from the Einsum network structure in that we require the determinism property for decision diagrams, whereas the underlying structure of Einsum networks lacks this property. We introduce the determinism property in the following section.

Various Boolean encodings have been proposed for representing paths within a graph, including the Absolute, Compact, and Relative encodings [29]. These encodings capture both the vertices comprising the path and the ordering information of said vertices. However, they necessitate a polynomial number of Boolean variables, specifically $|V|^2$, $|V|\log_2|V|$, and $2|V|^2$ variables for the Absolute, Compact, and Relative encodings respectively. While these encodings accurately represent the space of paths within a graph, they are not efficient and lead to high space and time complexity for downstream routing tasks.

Choi, Shen, and Darwiche demonstrated over a series of papers that the distribution of routes can be conceptualized as a structured probability distribution (SPD) given the underlying combinatorial structure [6, 31, 32]. This approach, referred to as the ‘CSD’ approach in the rest of this paper, builds on top of existing approaches that represent paths using zero-suppressed decision diagrams [19, 17, 18]. The CSD approach utilizes sentential decision diagrams to represent the SPD of paths and employs a stepwise methodology for handling path queries. Specifically, at each step, the next vertex to visit is determined to be the one with the highest probability, given the vertices already visited and the start and destination vertices. While the CSD approach has been influential in its incorporation of probabilistic elements in addressing the routing problem, it is not without limitations. In particular, there are two main limitations: (1) there are no guarantees of completion, meaning that even if a path exists between a given start and destination vertex, it may not be returned by the CSD approach [6]; (2) the stepwise routing process necessitates the repeated computation of conditional probabilities, resulting in runtime inefficiencies.

In summary, the limitations of prior works are Boolean encodings that require a high number of variables, lack of routing task completion guarantees, and numerous conditional probability computations.

0.2.3 $\mathsf{PROB}$: Probabilistic OBDD[$\land$]

In this subsection, we introduce the $\mathsf{PROB}$ (Probabilistic OBDD[$\land$]) decision diagram structure and properties. We adopt the same notations as prior work [37] for consistency.

Notations

We use nodes to refer to nodes in a $\mathsf{PROB}$ $\psi$ and vertices to refer to nodes in a graph $G(V,E)$ to avoid ambiguity. $\mathsf{Child}(n)$ refers to the children of node $n$. $\mathsf{Hi}(n)$ refers to the hi child of decision node $n$ and $\mathsf{Lo}(n)$ refers to the lo child of $n$. $\theta_{\mathsf{Hi}}(n)$ and $\theta_{\mathsf{Lo}}(n)$ refer to the parameters associated with the edges connecting decision node $n$ with $\mathsf{Hi}(n)$ and $\mathsf{Lo}(n)$ respectively in a $\mathsf{PROB}$. $\mathsf{Var}(n)$ refers to the associated variable of decision node $n$. $\mathsf{VarSet}(n)$ refers to the set of variables of $F$ represented by a $\mathsf{PROB}$ with $n$ as the root node. $\mathsf{Subdiagram}(n)$ refers to the $\mathsf{PROB}$ starting at node $n$. $\mathsf{Parent}(n)$ refers to the immediate parent nodes of $n$ in a $\mathsf{PROB}$.

$\mathsf{PROB}$ Structure

Let $\psi$ be a $\mathsf{PROB}$ representing a Boolean formula $F$. $\psi$ is a DAG comprising four types of nodes: conjunction nodes, decision nodes, the true node, and the false node.

A conjunction node (or $\land$-node) $n_c$ splits the Boolean formula $F$ into sub-formulas, i.e. $F = F_1 \land F_2 \land \ldots \land F_j$. Each sub-formula is represented by a $\mathsf{PROB}$ rooted at the corresponding child node of $n_c$, such that the sets of variables of $F_1, F_2, \ldots, F_j$ are mutually disjoint.

A decision node $n_d$ corresponds to a Boolean variable $x$ and has exactly two child nodes, $\mathsf{Hi}(n_d)$ and $\mathsf{Lo}(n_d)$. $\mathsf{Hi}(n_d)$ represents the decision to set $x$ to true and $\mathsf{Lo}(n_d)$ represents otherwise. We use $\mathsf{Var}(n_d)$ to refer to the Boolean variable $x$ that decision node $n_d$ is associated with in $F$. Each branch of $n_d$ has an associated parameter, and the branch parameters of $n_d$ sum up to 1.

The leaf nodes of a $\mathsf{PROB}$ $\psi$ are the true and false nodes. An assignment $\tau$ of the Boolean formula $F$ corresponds to a traversal of the $\mathsf{PROB}$ from the root node to the leaf nodes; we denote such a traversal as $\mathsf{Rep}_{\psi}(\tau)$. At each decision node $n_d$, the traversal follows the value of variable $\mathsf{Var}(n_d)$ in $\tau$. At each conjunction node, all child branches are traversed. A satisfying assignment of $F$ results in a traversal from the root to leaf nodes in which only true nodes are visited. If a traversal reaches a false node at any point, then the assignment is not a satisfying assignment.

[Figure: decision nodes $n_1$, $n_2$, $n_3$ associated with variables $x$, $y$, $z$ respectively, leaf nodes $n_4$ ($\top$) and $n_5$ ($\bot$), with each decision edge labelled by its branch parameter $\theta_{\mathsf{Hi}}(\cdot)$ or $\theta_{\mathsf{Lo}}(\cdot)$.]

Figure 2: A $\mathsf{PROB}$ $\psi_1$ representing $F=(x\lor y)\land(\neg x\lor\neg z)$

An assignment of the Boolean formula $F$ is represented by a top-down traversal of a $\mathsf{PROB}$ compiled from $F$. For example, consider the Boolean formula $F=(x\lor y)\land(\neg x\lor\neg z)$, represented by the $\mathsf{PROB}$ $\psi_1$ in Figure 2. When $x$ is assigned true and $z$ is assigned false, $F$ evaluates to true. If we have a partial assignment $\tau$, we can perform inference conditioned on $\tau$ by visiting only the branches of decision nodes in $\psi$ that agree with $\tau$. This allows for conditional sampling, which we discuss in Section 0.3.

$\mathsf{PROB}$ inherits important properties of OBDD[$\land$] that are useful to our algorithms in later sections. The properties are determinism, decomposability, and smoothness.

Property 1 (Determinism).

For every decision node $n_d$, the sets of satisfying assignments represented by $\mathsf{Hi}(n_d)$ and $\mathsf{Lo}(n_d)$ are logically disjoint.

Property 2 (Decomposability).

For every conjunction node $n_c$, $\mathsf{VarSet}(c_i)\cap\mathsf{VarSet}(c_j)=\emptyset$ for all $c_i,c_j\in\mathsf{Child}(n_c)$ with $c_i\neq c_j$.

Property 3 (Smoothness).

For every decision node $n_d$, $\mathsf{VarSet}(\mathsf{Hi}(n_d))=\mathsf{VarSet}(\mathsf{Lo}(n_d))$.

In the rest of this paper, all mentions of $\mathsf{PROB}$ refer to smooth $\mathsf{PROB}$. Smoothness can be achieved via a smoothing algorithm introduced in prior work [37]. We defer the smoothing algorithm to the appendix.

0.3 Approach

In this section, we introduce our approach, $\mathsf{ProbRoute}$, which addresses the aforementioned limitations of existing methods using (1) a novel relaxed encoding that requires a linear number of Boolean variables and (2) a single-pass sampling and refinement approach that provides completeness guarantees. The flow of $\mathsf{ProbRoute}$ is shown in Figure 3.

[Flow diagram: Graph, Data, and Query are the inputs; the pipeline is Encode (Sec 0.3.1) → Compile into OBDD[$\land$] → Learn parameters (Sec 0.3.2) → Sample and refinement (Sec 0.3.3) → Sampled Trip.]

Figure 3: Flow of $\mathsf{ProbRoute}$, with red rectangles indicating this work. For compilation, we use existing off-the-shelf techniques.

In our approach, we first use our relaxed encoding to encode $\mathsf{SimpleTrip}(G)$ of a graph $G$ into a Boolean formula. Next, we compile the Boolean formula into OBDD[$\land$]. In order to learn from historical trip data, we convert the data into assignments. Subsequently, the OBDD[$\land$] is parameterized into a $\mathsf{PROB}$ $\psi$ and the parameters are learned from data. To sample trips from a start vertex $v_s$ to a destination vertex $v_t$, we create a partial assignment $\tau'$ in which the variables indicating that $v_s$ and $v_t$ are terminal vertices are set to true. The $\mathsf{ProbSample}$ algorithm (Algorithm 2) takes $\tau'$ as input and samples a set of satisfying assignments. Finally, in the refinement step, a simple trip $\pi$ is extracted from each satisfying assignment $\tau$ using depth-first search to remove disjoint loop components.

0.3.1 Relaxed Encoding

In this work, we present a novel relaxed encoding that only includes vertex membership and terminal information. Our encoding requires only a linear number ($2|V|$) of Boolean variables, resulting in more succinct decision diagrams and improved runtime performance for downstream tasks. In relation to prior encodings, we observe that the ordering of vertices can be inferred from the graph given a set of vertices and the terminal vertices, enabling us to exclude ordering information from our relaxed encoding. Our relaxed encoding represents an over-approximation of the trips in $\mathsf{SimpleTrip}(G)$ for a graph $G(V,E)$ using a number of Boolean variables linear in $|V|$. We discuss the over-approximation in later parts of this section.

Our encoding uses two types of Boolean variables, $n$-type and $s$-type variables. Each vertex $v\in V$ in graph $G(V,E)$ has a corresponding $n$-type and $s$-type variable. The $n$-type variable indicates whether vertex $v$ is part of a trip and the $s$-type variable indicates whether $v$ is a terminal vertex of the trip. Our encoding is the conjunction of the five types of clauses over all vertices in graph $G$ as follows.

$\bigvee_{i\in V}s_{i}$   (H1)

$\bigwedge_{i\in V}\big[n_{i}\longrightarrow\bigvee_{j\in adj(i)}n_{j}\big]$   (H2)

$\bigwedge_{i,j,k\in V,\, i\neq j\neq k}(\neg s_{i}\lor\neg s_{j}\lor\neg s_{k})$   (H3)

$\bigwedge_{i\in V}\big[s_{i}\longrightarrow n_{i}\land\bigwedge_{j,k\in adj(i),\, j\neq k}(\neg n_{j}\lor\neg n_{k})\big]$   (H4)

$\bigwedge_{i,j\in V,\, j\in adj(i)}\big[n_{i}\land n_{j}\longrightarrow s_{i}\lor\big[\big(\bigvee_{k\in adj(i),\, k\neq j\neq i}n_{k}\big)\land\bigwedge_{l,m\in adj(i),\, l,m\neq j}(\neg n_{l}\lor\neg n_{m})\big]\big]$   (H5)

A simple trip $\pi$ in graph $G$ has at least one terminal vertex and at most two terminal vertices, described by encoding components H1 and H3 respectively. At each terminal vertex $v_i$ of $\pi$, at most one adjacent vertex of $v_i$ can also be part of $\pi$; this is encoded by H4. For each vertex $v_i$ in $\pi$, at least one of its adjacent vertices is in $\pi$ regardless of whether $v_i$ is a terminal vertex; this is captured by H2. Finally, H5 encodes that if a given vertex $v_i$ and one of its adjacent vertices are part of $\pi$, then either another neighbour of $v_i$ is part of $\pi$ or $v_i$ is a terminal vertex. A sketch of the corresponding clause generation is given below.
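To illustrate how H1-H5 translate into clauses, the following is a minimal sketch of one possible clause generation for our relaxed encoding, assuming an undirected NetworkX graph and DIMACS-style integer literals; the variable numbering is our own illustrative convention, and H4 and H5 are turned into clauses by distributing over the conjunctions.

```python
from itertools import combinations

def relaxed_encoding(graph):
    vertices = list(graph.nodes())
    index = {v: i for i, v in enumerate(vertices)}
    n = lambda v: index[v] + 1                      # n-type variable of vertex v
    s = lambda v: index[v] + 1 + len(vertices)      # s-type variable of vertex v
    clauses = []
    # H1: at least one terminal vertex.
    clauses.append([s(v) for v in vertices])
    # H2: every trip vertex has at least one neighbour on the trip.
    for v in vertices:
        clauses.append([-n(v)] + [n(u) for u in graph.neighbors(v)])
    # H3: at most two terminal vertices.
    for a, b, c in combinations(vertices, 3):
        clauses.append([-s(a), -s(b), -s(c)])
    # H4: a terminal vertex is on the trip and has at most one trip neighbour.
    for v in vertices:
        clauses.append([-s(v), n(v)])
        for a, b in combinations(list(graph.neighbors(v)), 2):
            clauses.append([-s(v), -n(a), -n(b)])
    # H5: a trip vertex with a trip neighbour j is either a terminal vertex or
    # has exactly one other trip neighbour.
    for v in vertices:
        nbrs = list(graph.neighbors(v))
        for j in nbrs:
            others = [u for u in nbrs if u != j]
            clauses.append([-n(v), -n(j), s(v)] + [n(u) for u in others])
            for a, b in combinations(others, 2):
                clauses.append([-n(v), -n(j), s(v), -n(a), -n(b)])
    return clauses
```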

Definition 1.

Let $\mathcal{M}:\mathsf{SimpleTrip}(G)\mapsto\mathsf{Sol}(F)$ such that for a given trip $\pi\in\mathsf{SimpleTrip}(G)$, $\tau=\mathcal{M}(\pi)$ is the assignment whereby the $n$-type variables of all vertices $v\in\pi$ and the $s$-type variables of $v\in\mathsf{Term}(\pi)$ are set to true. All other variables are set to false in $\tau$.

We refer to our encoding as a relaxed encoding because the solution space of the constraints over-approximates the space of simple trips in the graph. Notice that while all simple trips correspond to satisfying assignments of the encoding, they are not the only satisfying assignments. Assignments corresponding to a simple trip $\pi$ together with a disjoint loop component $\beta$ also satisfy the constraints. The intuition is that $\beta$ introduces no additional terminal vertices, hence H1, H3, and H4 remain satisfied. Since each vertex in $\beta$ has the $n$-type variables of exactly two of its neighbours set to true, H5 and H2 remain satisfied. Thus, a simple trip with a disjoint loop component also corresponds to a satisfying assignment of our encoding.
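For learning, historical trips are converted into assignments via the mapping $\mathcal{M}$ of Definition 1. A minimal sketch, assuming a trip is given as a list of vertices and using our own illustrative (variable-type, vertex) keys for the assignment:

```python
def trip_to_assignment(graph, trip):
    """Assignment per Definition 1: n-vars of trip vertices and s-vars of terminals are true."""
    on_trip, terminals = set(trip), {trip[0], trip[-1]}
    assignment = {}
    for v in graph.nodes():
        assignment[("n", v)] = v in on_trip
        assignment[("s", v)] = v in terminals
    return assignment
```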

0.3.2 Learning Parameters from Data

Input: $\mathsf{PROB}$ $\psi$, $\tau$ - complete assignment of data instance

1:  $n\leftarrow\mathsf{rootNode}(\psi)$
2:  if $n$ is $\land$-node then
3:     for $c$ in $\mathsf{Child}(n)$ do
4:        $\mathsf{ProbLearn}(c,\tau)$
5:  if $n$ is decision node then
6:     $l\leftarrow\mathsf{getLiteral}(\tau,\mathsf{Var}(n))$
7:     if $l$ is positive literal then
8:        $\mathsf{Hi_{\#}}(n)$ += 1
9:        $\mathsf{ProbLearn}(\mathsf{Hi}(n),\tau)$
10:     else
11:        $\mathsf{Lo_{\#}}(n)$ += 1
12:        $\mathsf{ProbLearn}(\mathsf{Lo}(n),\tau)$
Algorithm 1 $\mathsf{ProbLearn}$ - updates counters of $\psi$ from data

We introduce Algorithm 1, $\mathsf{ProbLearn}$, for updating the branch counters of a $\mathsf{PROB}$ $\psi$ from assignments. In order to learn the branch parameters $\theta_{\mathsf{Hi}}(n)$ and $\theta_{\mathsf{Lo}}(n)$ of a decision node $n$, we maintain a counter for each of its branches, $\mathsf{Hi_{\#}}(n)$ and $\mathsf{Lo_{\#}}(n)$ respectively. In the learning process, we have a dataset of assignments to the Boolean variables of the formula represented by the $\mathsf{PROB}$ $\psi$. For each assignment $\tau$ in the dataset, we perform a top-down traversal of $\psi$ following Algorithm 1. In the traversal, we visit all child branches of conjunction nodes (line 4) and the child branch of each decision node $n$ corresponding to the assignment of $\mathsf{Var}(n)$ in $\tau$ (lines 6 to 12), incrementing the corresponding counters in the process. Subsequently, the branch parameters of node $n$ are updated according to the following formulas.

$$\theta_{\mathsf{Hi}}(n)=\frac{\mathsf{Hi_{\#}}(n)+1}{\mathsf{Hi_{\#}}(n)+\mathsf{Lo_{\#}}(n)+2}\qquad\theta_{\mathsf{Lo}}(n)=\frac{\mathsf{Lo_{\#}}(n)+1}{\mathsf{Hi_{\#}}(n)+\mathsf{Lo_{\#}}(n)+2}$$

We add 1 to the numerator and 2 to the denominator as a form of Laplace smoothing [23]; other forms of smoothing to account for division by zero are possible. Notice that the learnt branch parameters of node $n$ are in fact approximations of conditional probabilities, as shown in Proposition 1 and Remark 1 below.
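A minimal sketch of the parameter update after counting, assuming each decision node carries counters hi_count and lo_count maintained by $\mathsf{ProbLearn}$ (the attribute names are our own illustrative choice):

```python
def update_parameters(node):
    total = node.hi_count + node.lo_count
    node.theta_hi = (node.hi_count + 1) / (total + 2)  # Laplace-smoothed Hi branch parameter
    node.theta_lo = (node.lo_count + 1) / (total + 2)  # Laplace-smoothed Lo branch parameter
    # e.g. counts (3, 1) give theta_hi = 4/6 and theta_lo = 2/6, which sum to 1.
```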

Proposition 1.

Let $n_1$ and $n_2$ be decision nodes where $n_1=\mathsf{Parent}(n_2)$ and $\mathsf{Lo}(n_1)=n_2$. Then $\theta_{\mathsf{Lo}}(n_2)=\frac{\mathsf{Lo_{\#}}(n_2)+1}{\mathsf{Lo_{\#}}(n_1)+2}$ and $\theta_{\mathsf{Hi}}(n_2)=\frac{\mathsf{Hi_{\#}}(n_2)+1}{\mathsf{Lo_{\#}}(n_1)+2}$.

Proof.

Recall that the Lo branch parameter of $n_2$ is:

$$\theta_{\mathsf{Lo}}(n_2)=\frac{\mathsf{Lo_{\#}}(n_2)+1}{\mathsf{Hi_{\#}}(n_2)+\mathsf{Lo_{\#}}(n_2)+2}$$

Notice that $\mathsf{Hi_{\#}}(n_2)+\mathsf{Lo_{\#}}(n_2)=\mathsf{Lo_{\#}}(n_1)$, as all top-down traversals of $\psi$ that pass through $n_2$ must pass through the Lo branch of $n_1$. Hence

$$\theta_{\mathsf{Lo}}(n_2)=\frac{\mathsf{Lo_{\#}}(n_2)+1}{\mathsf{Lo_{\#}}(n_1)+2}$$

A similar argument can be made for $\theta_{\mathsf{Hi}}(n_2)$ by symmetry. In the general case, if $n_2$ has more than one parent, then the term $\mathsf{Hi_{\#}}(n_2)+\mathsf{Lo_{\#}}(n_2)$ is the sum of the counts of branch traversals of all parent nodes of $n_2$ that lead to $n_2$. Additionally, any conjunction node $c$ between $n_1$ and $n_2$ does not affect the proof because all children of $c$ are traversed. For intuition, one can refer to the example in Figure 2, where $n_1$ corresponds to the root node. ∎

Remark 1.

Recall that $\mathsf{Var}(n_1)=x$ and $\mathsf{Var}(n_2)=y$ in the $\mathsf{PROB}$ $\psi_1$ of Figure 2. Observe that $\frac{\mathsf{Lo_{\#}}(n_2)}{\mathsf{Lo_{\#}}(n_1)}$ for $\psi_1$ is the conditional probability $\Pr(\neg y\,|\,\neg x)$, as it is the count of traversals that passed through the Lo branch of $n_2$ out of the total count of traversals that passed through the Lo branch of $n_1$. A similar observation can be made for $\mathsf{Hi}(n_2)$.

Notice that as $\mathsf{Lo_{\#}}(n_2)$ and $\mathsf{Lo_{\#}}(n_1)$ become significantly large, that is $\mathsf{Lo_{\#}}(n_2)\gg 1$ and $\mathsf{Lo_{\#}}(n_1)\gg 2$:

$$\theta_{\mathsf{Lo}}(n_2)=\frac{\mathsf{Lo_{\#}}(n_2)+1}{\mathsf{Lo_{\#}}(n_1)+2}\approx\frac{\mathsf{Lo_{\#}}(n_2)}{\mathsf{Lo_{\#}}(n_1)}=\Pr(\neg y\,|\,\neg x)$$

As such, the learnt branch parameters are approximate conditional probabilities.

0.3.3 Sampling Trip Query Answers

Input: $\mathsf{PROB}$ $\psi$, partial assignment $\tau'$

Output: complete assignment that agrees with $\tau'$

1:  caches $\omega,\gamma\longleftarrow$ initCache()
2:  for node $n$ in bottom-up ordering of $\psi$ do
3:     if $n$ is $\top$ node then
4:        $\omega[n]\longleftarrow\emptyset$, $\gamma[n]\longleftarrow 1$
5:     else if $n$ is $\bot$ node then
6:        $\omega[n]\longleftarrow\mathsf{Invalid}$, $\gamma[n]\longleftarrow 0$
7:     else if $n$ is $\land$ node then
8:        $\omega[n]\longleftarrow\mathsf{unionChild}(\mathsf{Child}(n),\omega)$
9:        $\gamma[n]\longleftarrow\prod_{c\in\mathsf{Child}(n)}\gamma[c]$
10:     else
11:        if $\mathsf{Var}(n)$ in $\tau'$ then
12:           if $\omega[\tau'[\mathsf{Var}(n)]]$ is $\mathsf{Invalid}$ then
13:              $\omega[n]\longleftarrow\mathsf{Invalid}$, $\gamma[n]\longleftarrow 0$
14:           else
15:              $\omega[n]\longleftarrow\mathsf{followAssign}(\tau')$
16:              if $\tau'[\mathsf{Var}(n)]$ is $\neg\mathsf{Var}(n)$ then
17:                 $\gamma[n]\longleftarrow\theta_{\mathsf{Lo}}(n)\times\gamma[\mathsf{Lo}(n)]$
18:              else
19:                 $\gamma[n]\longleftarrow\theta_{\mathsf{Hi}}(n)\times\gamma[\mathsf{Hi}(n)]$
20:        else
21:           $l\longleftarrow\theta_{\mathsf{Lo}}(n)\times\gamma[\mathsf{Lo}(n)]$
22:           $h\longleftarrow\theta_{\mathsf{Hi}}(n)\times\gamma[\mathsf{Hi}(n)]$
23:           $\gamma[n]\longleftarrow l+h$
24:           $\alpha\longleftarrow\mathsf{Binomial}(\frac{h}{l+h})$
25:           if $\alpha$ is 1 then
26:              $\omega[n]\longleftarrow\omega[\mathsf{Hi}(n)]\cup\mathsf{Var}(n)$
27:           else
28:              $\omega[n]\longleftarrow\omega[\mathsf{Lo}(n)]\cup\neg\mathsf{Var}(n)$
29:  return $\omega[\mathsf{rootNode}(\psi)]$
Algorithm 2 $\mathsf{ProbSample}$ - returns sampled assignment

The ability to conditionally sample trips is critical to handling trip queries for arbitrary start and end vertices, for which a trip is to be sampled conditioned on the given start and end vertices. In this work, we adapt the weighted sampling algorithm on $\mathsf{PROB}$ introduced in prior work [37] to support conditional sampling, and denote the adapted algorithm $\mathsf{ProbSample}$.

Algorithm 2, $\mathsf{ProbSample}$, samples satisfying assignments from a $\mathsf{PROB}$ $\psi$ in a bottom-up manner. $\mathsf{ProbSample}$ takes as input a $\mathsf{PROB}$ $\psi$ and a partial assignment $\tau'$ and returns a sampled complete assignment that agrees with $\tau'$. The input $\tau'$ specifies the terminal vertices of a given trip query by assigning the $s$-type variables. $\mathsf{ProbSample}$ employs two caches, $\omega$ and $\gamma$, for the partially sampled assignment at each node and for the joint probabilities computed during the sampling process. In the process, $\mathsf{ProbSample}$ computes joint probabilities at each node and stores the partial samples at each node in $\omega$. The partial sample at a false node is $\mathsf{Invalid}$, as it means that an assignment is unsatisfiable. On the other hand, the partial sample at a true node is $\emptyset$, which is extended with variable assignments during the processing of the internal nodes of $\psi$. The partially sampled assignment at every $\land$-node is the union of the samples of all its child nodes, as the child nodes have mutually disjoint variable sets due to the decomposability property. For a decision node $d$, if $\mathsf{Var}(d)$ is in $\tau'$, the partial sample at $d$ is the union of the literal in $\tau'$ and the partial sample at the corresponding child node (lines 11 to 19), conditioning on $\tau'$. Otherwise, the partial assignment at $d$ is sampled according to the weighted joint probabilities $l$ and $h$ (lines 21 to 28). Finally, the output of $\mathsf{ProbSample}$ is the sampled assignment at the root node of $\psi$. To extend $\mathsf{ProbSample}$ to sample $k$ complete assignments, one keeps $k$ partial assignments in $\omega$ at each node during the sampling process and samples $k$ independent partial assignments at each decision node.
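The following is a minimal Python sketch of the bottom-up pass of Algorithm 2 for a single sample, assuming $\mathsf{PROB}$ nodes are objects with attributes kind ('true', 'false', 'and', 'decision'), children, var, hi, lo, theta_hi, and theta_lo; these names, and the representation of the partial assignment as a dict from variable to Boolean value, are our own illustrative choices.

```python
import random

INVALID = None  # marker for the partial sample below a false node

def prob_sample(nodes_bottom_up, root, partial):
    """Bottom-up pass; 'partial' maps conditioned variables to their Boolean values."""
    omega, gamma = {}, {}                      # partial samples and joint probabilities
    for n in nodes_bottom_up:
        if n.kind == "true":
            omega[n], gamma[n] = frozenset(), 1.0
        elif n.kind == "false":
            omega[n], gamma[n] = INVALID, 0.0
        elif n.kind == "and":
            # Children have mutually disjoint variable sets (decomposability).
            parts = [omega[c] for c in n.children]
            if INVALID in parts:
                omega[n], gamma[n] = INVALID, 0.0
            else:
                omega[n] = frozenset().union(*parts)
                prob = 1.0
                for c in n.children:
                    prob *= gamma[c]
                gamma[n] = prob
        else:                                  # decision node on variable n.var
            if n.var in partial:               # condition on the query assignment tau'
                child = n.hi if partial[n.var] else n.lo
                theta = n.theta_hi if partial[n.var] else n.theta_lo
                if omega[child] is INVALID:
                    omega[n], gamma[n] = INVALID, 0.0
                else:
                    omega[n] = omega[child] | {(n.var, partial[n.var])}
                    gamma[n] = theta * gamma[child]
            else:                              # sample proportionally to the weighted branches
                lo = n.theta_lo * gamma[n.lo]
                hi = n.theta_hi * gamma[n.hi]
                gamma[n] = lo + hi
                if gamma[n] == 0.0:
                    omega[n] = INVALID
                else:
                    take_hi = random.random() < hi / gamma[n]
                    child, value = (n.hi, True) if take_hi else (n.lo, False)
                    omega[n] = omega[child] | {(n.var, value)}
    return omega[root]                         # frozenset of (variable, value) literals
```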

Proposition 2.

Let a $\mathsf{PROB}$ $\psi$ represent Boolean formula $F$. $\mathsf{ProbSample}$ samples $\tau\in\mathsf{Sol}(F)$ according to the joint branch parameters, that is, $\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$ where $I_{n}$ is 1 if $\mathsf{Hi}(n)\in\mathsf{Rep}_{\psi}(\tau)$ and 0 otherwise.

Proof.

Let $\psi$ be a $\mathsf{PROB}$ whose internal nodes are all decision nodes. At each decision node $d$ in the bottom-up sampling pass, the assignment of $\mathsf{Var}(d)$ is sampled to be false or true with probability proportional to $\theta_{\mathsf{Lo}}(d)\times\gamma[\mathsf{Lo}(d)]$ and $\theta_{\mathsf{Hi}}(d)\times\gamma[\mathsf{Hi}(d)]$ respectively. Without loss of generality, we focus on the term $\theta_{\mathsf{Lo}}(d)\times\gamma[\mathsf{Lo}(d)]$; a similar argument follows for the other branch by symmetry.

Let $d_2$ denote $\mathsf{Lo}(d)$. Notice that $\gamma[d_2]$ is $\theta_{\mathsf{Lo}}(d_2)\times\gamma[\mathsf{Lo}(d_2)]+\theta_{\mathsf{Hi}}(d_2)\times\gamma[\mathsf{Hi}(d_2)]$. Expanding the equation, the probability of sampling $\neg\mathsf{Var}(d)$ is $\theta_{\mathsf{Lo}}(d)\times\theta_{\mathsf{Lo}}(d_2)\times\gamma[\mathsf{Lo}(d_2)]+\theta_{\mathsf{Lo}}(d)\times\theta_{\mathsf{Hi}}(d_2)\times\gamma[\mathsf{Hi}(d_2)]$. If we expand $\gamma[\mathsf{Lo}(d)]$ recursively, we consider all possible satisfying assignments of $\mathsf{VarSet}(\mathsf{Lo}(d))$; more specifically, we take the sum of the products of the corresponding branch parameters of each possible satisfying assignment of $\mathsf{VarSet}(\mathsf{Lo}(d))$.

Observe that $\mathsf{Var}(d)$ is sampled to be assigned false with probability $\theta_{\mathsf{Lo}}(d)\times\theta_{\mathsf{Lo}}(d_2)\times\gamma[\mathsf{Lo}(d_2)]+\theta_{\mathsf{Lo}}(d)\times\theta_{\mathsf{Hi}}(d_2)\times\gamma[\mathsf{Hi}(d_2)]$. Similarly, $\mathsf{Var}(d_2)$ is sampled to be assigned false with probability $\theta_{\mathsf{Lo}}(d_2)\times\gamma[\mathsf{Lo}(d_2)]$. Notice that if we view the bottom-up process in reverse, the probability of sampling $\neg\mathsf{Var}(d)$ and $\neg\mathsf{Var}(d_2)$ is $\theta_{\mathsf{Lo}}(d)\times\theta_{\mathsf{Lo}}(d_2)\times\gamma[\mathsf{Lo}(d_2)]$. In the general case, a satisfying assignment reaches the true node, which has its $\gamma$ value set to 1. It then follows that each $\tau\in\mathsf{Sol}(F)$ is sampled with probability $P=\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$. Notice that $\land$-nodes have no impact on the sampling probability as they introduce no additional terms in the product of branch parameters. ∎

Remark 2.

Recall from Remark 1 that $\theta_{\mathsf{Hi}}(n)$ and $\theta_{\mathsf{Lo}}(n)$ are approximately conditional probabilities. By Proposition 2, an assignment $\tau\in\mathsf{Sol}(F)$ is sampled with probability proportional to $\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$. Notice that if we rewrite the product of branch parameters as a product of approximate conditional probabilities, it is approximately the joint probability of sampling $\tau$.

Refinement

In the refinement step, we extract a trip from a sampled assignment $\tau$ by removing spurious disjoint loop components using depth-first search.

Definition 2.

Let $\mathcal{M'}:\mathsf{Sol}(F)\mapsto\mathsf{SimpleTrip}(G)$ be the mapping function of the refinement process, for a given graph $G$ and its relaxed encoding $F$. For an assignment $\tau\in\mathsf{Sol}(F)$, let $V_{\tau}$ be the set of vertices in $G$ whose $n$-type variables are set to true in $\tau$. A depth-first search is performed from the starting vertex on $V_{\tau}$, removing disjoint components. The resultant simple trip is $\pi=\mathcal{M'}(\tau)$.

Although $\mathcal{M'}(\cdot)$ is a many-to-one (and surjective) mapping, this is not a concern in practice, as trips with disjoint loop components are unlikely to occur in the real-world or synthetic trip data from which probabilities are learned. A sketch of the refinement step is given below.
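A minimal sketch of the refinement step, assuming an undirected NetworkX graph and the sampled assignment given as the set of vertices whose $n$-type variables are true:

```python
import networkx as nx

def refine(graph, trip_vertices, start):
    """Drop disjoint loop components and return the remaining trip as a vertex sequence."""
    induced = graph.subgraph(trip_vertices)
    kept = induced.subgraph(nx.node_connected_component(induced, start))
    # In a valid sample the kept component is a simple trip, so a depth-first
    # traversal from the start vertex recovers the vertex order.
    return list(nx.dfs_preorder_nodes(kept, source=start))
```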

Theorem 1.

Given $v_s, v_t\in G$, let $\pi_{s,t}\in\mathsf{SimpleTrip}(G)$ be a trip between $v_s$ and $v_t$. Let $R_{\pi_{s,t}}=\{\tau\mid(\tau\in\mathsf{Sol}(F))\wedge(\mathcal{M'}(\tau)=\pi_{s,t})\}$. Then,

$$\Pr[\pi_{s,t}\text{ is sampled}]\propto\sum_{\tau\in R_{\pi_{s,t}}}\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$$
Proof.

From Definitions 1 and 2, given a graph $G$ and its relaxed encoding $F$, $\forall\pi\in\mathsf{SimpleTrip}(G),\ \exists\tau\in\mathsf{Sol}(F)$ such that $\mathcal{M'}(\tau)=\pi$.

Notice that sampling $\pi_{s,t}$ amounts to sampling some $\tau\in R_{\pi_{s,t}}$. As such, the probability of sampling $\pi_{s,t}$ is the sum over the probabilities of sampling each member of $R_{\pi_{s,t}}$. Recall that the probability of sampling a single assignment $\tau$ is proportional to $\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$ by Proposition 2. As such, $\Pr[\pi_{s,t}\text{ is sampled}]$ is proportional to $\sum_{\tau\in R_{\pi_{s,t}}}\prod_{n\in\mathsf{Rep}_{\psi}(\tau)}[(1-I_{n})\,\theta_{\mathsf{Lo}}(n)+I_{n}\,\theta_{\mathsf{Hi}}(n)]$. ∎

Remark 3.

It is worth noting that $\Pr[\pi_{s,t}\text{ is sampled}]>0$, as all branch parameters are greater than 0 by definition. Recall that branch parameters are computed with a '+1' in the numerator and a '+2' in the denominator; given that branch counters are 0 or larger, branch parameters are strictly positive.

0.4 Experiments

In order to evaluate the efficacy of $\mathsf{ProbRoute}$, we built a prototype in Python 3.8 with the NumPy [16], toposort [33], OSMnx [3], and NetworkX [15] packages. We employ the KCBox tool (https://github.com/meelgroup/KCBox) for OBDD[$\land$] compilation [20]. The experiments were conducted on a cluster of machines with Intel Xeon Platinum 8272CL processors and 64GB of memory. In the experiments, we evaluated $\mathsf{ProbRoute}$ against an adaptation of the state-of-the-art probabilistic routing approach [6] and an off-the-shelf non-probabilistic routing library, Pyroutelib3 [36], in terms of quality of trip suggestions and runtime performance. In particular, we adapted the state-of-the-art approach by Choi et al. [6] to sample trips instead of computing the most probable trip, and refer to the adapted approach as ‘CSD’ in the rest of this section. In addition, we compared our relaxed encoding to existing path encodings across various graphs, specifically the absolute encoding and compact encoding [29].

Through the experiments, we investigate the following:

R1

Can $\mathsf{ProbRoute}$ effectively learn from data and sample quality trips?

R2

How efficient is our relaxed encoding technique?

R3

What is the runtime performance of $\mathsf{ProbRoute}$?

Data Generation

In this study, we use the real-world road network of Singapore obtained from OpenStreetMap [25] using OSMnx. The road network graph $G_r$ consists of 23522 vertices and 45146 edges along with their lengths. In addition, we also use an abstracted graph of $G_r$, which we denote as $G_a$ for the remainder of this section; we use the geohash system (geohash level 5) to partition the road network graph (see http://geohash.org/site/tips.html#format for more information on the format). A vertex in $G_a$ corresponds to a unique subgraph of $G_r$.

Synthetic trips were generated by deviating from the shortest path between given start and end vertices. A random pair of start and end vertices is selected in $G_r$ and the shortest path $\pi$ is found. Next, the corresponding intermediate regions of $\pi$ in $G_a$ are blocked in $G_r$, and a new shortest path $\pi'$ is found and taken to be the generated synthetic trip; a sketch of this procedure is given below. We generated 11000 synthetic trips, 10000 for training and 1000 for evaluation. While we used $G_a$ to keep the trip sampling time reasonable, it is possible to use more fine-grained regions for offline applications.
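A minimal sketch of this generation procedure, assuming a NetworkX road graph with a 'length' edge attribute and a region(v) function mapping a vertex to its geohash cell (both names are illustrative assumptions):

```python
import random
import networkx as nx

def synthetic_trip(graph, region):
    src, dst = random.sample(list(graph.nodes()), 2)
    try:
        shortest = nx.shortest_path(graph, src, dst, weight="length")
        # Block the intermediate regions of the shortest path, keeping the endpoint regions.
        blocked = {region(v) for v in shortest[1:-1]} - {region(src), region(dst)}
        reduced = graph.subgraph(v for v in graph.nodes() if region(v) not in blocked)
        return nx.shortest_path(reduced, src, dst, weight="length")
    except nx.NetworkXNoPath:
        return None  # reject and draw a new start/end pair in the generation loop
```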

0.4.1 R1: $\mathsf{ProbRoute}$'s Ability to Learn Probabilities

To understand $\mathsf{ProbRoute}$'s ability to learn probabilities from data, we evaluate its ability to produce trips that closely resemble the ground truth. Both $\mathsf{ProbRoute}$ and CSD, which are sampling-based approaches, were evaluated by sampling 20 trips and taking the median match rate for each instance. Recall that the $\varepsilon$-match rate is defined as the proportion of vertices in the ground truth trip that are within $\varepsilon$ meters of euclidean distance from the closest vertex in the proposed trip. In the evaluation, we set the $\varepsilon$ tolerance to be the median edge length of $G_r$, which is 64.359 meters, to account for parallel trips. To further emphasize the advantages of probabilistic approaches, we included an off-the-shelf routing library, Pyroutelib3 [36], in the comparison.

In order to conduct a fair comparison, we adapted the CSD approach to utilize the $\mathsf{PROB}$ derived from our relaxed encoding. Our evaluation uses this adapted approach to sample a trip on $G_a$ in a stepwise manner, where the probability of the next step is conditioned on the previous step and the destination. The conditional probabilities are computed in a similar manner to the joint probabilities, i.e. the $\gamma$ cache values, in $\mathsf{ProbSample}$. The predicted trip on the road network $G_r$ is the shortest trip on the subgraph formed by the sequence of sampled regions. In contrast, $\mathsf{ProbRoute}$ samples a trip on $G_a$ in a single pass, and subsequently retrieves the shortest trip on the subgraph of the sampled regions as the predicted trip on $G_r$. It is worth noting that for sampling-based approaches, there may be instances where a trip cannot be found on $G_r$ due to factors such as a region in $G_a$ containing disconnected components. We incorporate a rejection sampling process with a maximum of 400 attempts and 5 minutes to account for such scenarios.

Stats | Exact Match (Pyroutelib / CSD / ProbRoute) | ε-Match (Pyroutelib / CSD / ProbRoute)
25%   | 0.045 / 0.049 / 0.082 | 0.061 / 0.066 / 0.102
50%   | 0.088 / 0.160 / 0.310 | 0.107 / 0.172 / 0.316
75%   | 0.185 / 0.660 / 1.000 | 0.208 / 0.663 / 1.000
Mean  | 0.151 / 0.359 / 0.445 | 0.171 / 0.372 / 0.456
Table 1: Match rate statistics for benchmark instances completed by the respective methods. The percentages under the ‘Stats’ column refer to the corresponding percentiles. ‘Exact Match’ refers to the match rate when $\varepsilon=0$, and ‘$\varepsilon$-Match’ refers to the match rate when $\varepsilon$ is set to the median edge length of $G_r$.

Table 1 shows the match rate statistics of the respective methods. Under the $\varepsilon$-Match setting, where $\varepsilon$ is set to the median edge length of $G_r$ to account for parallel trips, $\mathsf{ProbRoute}$ has the highest match rate among the three approaches. In addition, $\mathsf{ProbRoute}$ produced perfect matches for more than 25% of instances. $\mathsf{ProbRoute}$ has a median $\varepsilon$-match rate of 0.316, significantly more than 0.172 for CSD and 0.107 for Pyroutelib. The trend is similar for exact matches, where $\varepsilon$ is set to 0, as shown under the ‘Exact Match’ columns in Table 1. In the exact match setting, $\mathsf{ProbRoute}$ achieved a median match rate of 0.310, almost double CSD's median match rate of 0.160. The evaluation results also demonstrate the usefulness of probabilistic approaches such as $\mathsf{ProbRoute}$, especially in scenarios where experienced drivers navigate according to their own heuristics, which may be independent of the shortest trip. In particular, $\mathsf{ProbRoute}$ is able to learn and suggest trips that align with the unknown heuristics of driver preferences given start and destination locations. Thus, the results provide an affirmative answer to R1.

0.4.2 R2: Efficiency of Relaxed Encoding

Encoding       | Grid 2 | Grid 3 | Grid 4 | Grid 5  | SGP
Absolute       | 99     | 1500   | 31768  | 1824769 | TO
Compact        | 771    | TO     | TO     | TO      | TO
Relaxed (Ours) | 31     | 146    | 2368   | 20030   | 38318
Table 2: Comparison of OBDD[$\land$] sizes for different graphs, with a 3600s compilation timeout (TO indicates timeout). Grid 2 refers to a 2x2 grid graph. SGP refers to the abstracted graph ($G_a$) of the Singapore road network.

We compared our relaxed encoding to existing path encodings across various graphs, specifically the absolute encoding and compact encoding [29]. In the experiment, we adapted the compact encoding to CNF form with the Tseitin transformation [34], as CNF is the standard input for compilation tools. We compiled the CNFs of the encodings into OBDD[$\land$] form with a 3600s compilation timeout and compared the sizes of the resultant diagrams. The results are shown in Table 2, with rows corresponding to the different encodings used and columns corresponding to the different graphs being encoded. Entries with TO indicate that the compilation timed out. Table 2 shows that our relaxed encoding consistently results in smaller decision diagrams, up to $91\times$ smaller. It is also worth noting that the relaxed encoding is the only encoding whose compilation completes within 3600s for the abstracted Singapore graph. The results strongly support our claims about the significant improvements that our relaxed encoding brings.

0.4.3 R3: $\mathsf{ProbRoute}$'s Runtime Performance

Stats | CSD / Pyroutelib ($\times 10^3$) | ProbRoute / Pyroutelib ($\times 10^3$)
25%   | 6.33  | 1.40
50%   | 21.64 | 2.00
75%   | 47.90 | 3.03
Mean  | 36.16 | 2.62
Table 3: Relative runtime statistics (lower is better) for instances completed by CSD and $\mathsf{ProbRoute}$. For example, under the ‘CSD / Pyroutelib’ column and the ‘50%’ row, the CSD approach takes a median of $21.64\times 10^{3}$ times the runtime of Pyroutelib.

For wide adoption of new routing approaches, it is crucial to be able to handle the runtime demands of existing applications. As such, we measured the runtimes of the probabilistic approaches, $\mathsf{ProbRoute}$ and CSD, relative to the existing routing system Pyroutelib, and show the relative runtimes in Table 3. From the results, $\mathsf{ProbRoute}$ is more than an order of magnitude faster on median than the existing probabilistic approach CSD. The results also show that, using the same $\mathsf{PROB}$, $\mathsf{ProbRoute}$'s runtime is on median more than an order of magnitude closer to Pyroutelib's than that of the CSD approach. In addition, the CSD approach timed out on 650 of the 1000 test instances, while $\mathsf{ProbRoute}$ did not time out on any. Additionally, as mentioned in [6], CSD does not guarantee being able to produce a complete trip from start to destination. The results in Table 3 highlight the progress made by $\mathsf{ProbRoute}$ in closing the gap between probabilistic routing approaches and existing routing systems.

0.5 Conclusion

Whilst we have demonstrated the efficiency of our approach, there are possible extensions to make it more appealing for wide adoption. In terms of runtime performance, our approach is three orders of magnitude slower than existing probability-agnostic routing systems. As such, there is still room for runtime improvements before our approach can serve as a functional replacement for existing routing systems. Additionally, our relaxed encoding currently handles only undirected graphs, and it would be of practical interest to extend the encoding to directed graphs to handle one-way streets. Furthermore, it would also be of interest to incorporate ideas for improving runtime performance from existing hierarchical path finding algorithms such as contraction hierarchies, multi-level Dijkstra, and other related works [13, 14, 21].

In summary, we focused on addressing the scalability barrier for reasoning over route distributions. To this end, we contribute two techniques: a relaxation and refinement approach that allows us to efficiently and compactly compile routes corresponding to real-world road networks, and a one-pass route sampling technique. We demonstrated the effectiveness of our approach on a real-world road network and observed around $91\times$ smaller $\mathsf{PROB}$s, $10\times$ faster trip sampling runtime, and almost $2\times$ the match rate of the state-of-the-art probabilistic approach, bringing probabilistic approaches closer to achieving runtimes comparable to traditional routing tools.

Acknowledgments

We sincerely thank Yong Lai for the insightful discussions. We sincerely thank the reviewers for their constructive feedback. Suwei Yang is supported by the Grab-NUS AI Lab, a joint collaboration between GrabTaxi Holdings Pte. Ltd., National University of Singapore, and the Industrial Postgraduate Program (Grant: S18-1198-IPP-II) funded by the Economic Development Board of Singapore. Kuldeep S. Meel is supported in part by the National Research Foundation Singapore under its NRF Fellowship Programme (NRF-NRFFAI1-2019-0004), Ministry of Education Singapore Tier 2 grant (MOE-T2EP20121-0011), and Ministry of Education Singapore Tier 1 Grant (R-252-000-B59-114).

References

  • [1] Ravindra K Ahuja, Kurt Mehlhorn, James Orlin, and Robert E Tarjan. Faster algorithms for the shortest path problem. Journal of the ACM (JACM), 37(2):213–223, 1990.
  • [2] Siddhartha Banerjee, Carlos Riquelme, and R. Johari. Pricing in ride-share platforms: A queueing-theoretic approach. Econometrics: Econometric & Statistical Methods - Special Topics eJournal, 2015.
  • [3] Geoff Boeing. Osmnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Econometrics: Computer Programs & Software eJournal, 2017.
  • [4] George Boole. An investigation of the laws of thought: On which are founded the mathematical theories of logic and probabilities. 1854.
  • [5] Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput., 35(8):677–691, 1986.
  • [6] Arthur Choi, Yujia Shen, and Adnan Darwiche. Tractability in structured probability spaces. In NeurIPS, volume 30, pages 3477–3485, 2017.
  • [7] Arthur Choi, Guy Van den Broeck, and Adnan Darwiche. Probability distributions over structured spaces. In AAAI, 2015.
  • [8] YooJung Choi, A. Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. 2020.
  • [9] Regina R. Clewlow and G. Mishra. Disruptive transportation: The adoption, utilization, and impacts of ride-hailing in the united states. 2017.
  • [10] Jack Collison. The impact of online food delivery services on restaurant sales. 2020.
  • [11] Adnan Darwiche. Sdd: A new canonical representation of propositional knowledge bases. In IJCAI, 2011.
  • [12] Adnan Darwiche and P. Marquis. A knowledge compilation map. J. Artif. Intell. Res., 17:229–264, 2002.
  • [13] Daniel Delling, Andrew Goldberg, Thomas Pajor, and Renato Werneck. Customizable route planning. In Proceedings of the 10th International Symposium on Experimental Algorithms, 2011.
  • [14] Robert Geisberger, Peter Sanders, Dominik Schultes, and Christian Vetter. Exact routing in large road networks using contraction hierarchies. Transp. Sci., 2012.
  • [15] Aric A. Hagberg, Daniel A. Schult, and Pieter Swart. Exploring network structure, dynamics, and function using networkx. In Proceedings of the 7th Python in Science Conference, pages 11 – 15, 2008.
  • [16] C. Harris, K. J. Millman, S. Walt, Ralf Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. Smith, R. Kern, Matti Picus, S. Hoyer, M. Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, P. Peterson, Pierre Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, Christoph Gohlke, and T. E. Oliphant. Array programming with numpy. Nature, 585:357 – 362, 2020.
  • [17] Takeru Inoue, Hiroaki Iwashita, Jun Kawahara, and Shin ichi Minato. Graphillion: software library for very large sets of labeled graphs. International Journal on Software Tools for Technology Transfer, 2016.
  • [18] Jun Kawahara, Takeru Inoue, Hiroaki Iwashita, and Shin ichi Minato. Frontier-based search for enumerating all constrained subgraphs with compressed representation. IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2017.
  • [19] Donald Ervin Knuth. The art of computer programming, volume 4, fascicle 2: Generating all tuples and permutations. 2005.
  • [20] Yong Lai, Dayou Liu, and Minghao Yin. New canonical representations by augmenting obdds with conjunctive decomposition. Journal of Artificial Intelligence Research, 58:453–521, 2017.
  • [21] Ken C. K. Lee, Wang-Chien Lee, Baihua Zheng, and Yuan Tian. Road: A new spatial object search framework for road networks. IEEE Transactions on Knowledge and Data Engineering, 2012.
  • [22] J. Letchner, John Krumm, and E. Horvitz. Trip router with individualized preferences (trip): Incorporating personalization into route planning. In AAAI, 2006.
  • [23] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. 2008.
  • [24] R. Mateescu, R. Dechter, and Radu Marinescu. And/or multi-valued decision diagrams (aomdds) for graphical models. J. Artif. Intell. Res., 2008.
  • [25] OpenStreetMap contributors. Planet dump retrieved from https://planet.osm.org . https://www.openstreetmap.org, 2017.
  • [26] Adam Paszke, S. Gross, Francisco Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, Alban Desmaison, Andreas Köpf, E. Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, B. Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
  • [27] Robert Peharz, Steven Lang, A. Vergari, Karl Stelzner, Alejandro Molina, M. Trapp, Guy Van den Broeck, K. Kersting, and Zoubin Ghahramani. Einsum networks: Fast and scalable learning of tractable probabilistic circuits. In ICML, 2020.
  • [28] Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 689–690, 2011.
  • [29] S. Prestwich. Sat problems with chains of dependent variables. Discret. Appl. Math., 130:329–350, 2003.
  • [30] Daniel J Rosenkrantz, Richard Edwin Stearns, and Philip M Lewis. Approximate algorithms for the traveling salesperson problem. In 15th Annual Symposium on Switching and Automata Theory (swat 1974), pages 33–42. IEEE, 1974.
  • [31] Yujia Shen, Arthur Choi, and Adnan Darwiche. Conditional psdds: Modeling and learning with modular knowledge. In AAAI, 2018.
  • [32] Yujia Shen, Anchal Goyanka, Adnan Darwiche, and Arthur Choi. Structured bayesian networks: From inference to learning with routes. In AAAI, 2019.
  • [33] Eric V. Smith. toposort, 2022.
  • [34] G. S. Tseitin. On the complexity of derivation in propositional calculus. 1983.
  • [35] Zheng Wang, Kun Fu, and Jieping Ye. Learning to estimate the travel time. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
  • [36] Oliver White and Mikolaj Kuranowski. Pyroutelib3, 2017.
  • [37] Suwei Yang, Victor Liang, and Kuldeep S. Meel. Inc: A scalable incremental weighted sampler. In FMCAD 2022, page 205, 2022.

Appendix

Smoothing

Input: $\mathsf{PROB}$ $\psi$

1:  for node $n$ in bottom-up pass of $\psi$ do
2:     if $n$ is $\top$ or $\bot$ node then
3:        $\mathsf{VarSet}(n)\longleftarrow\emptyset$
4:     else if $n$ is $\land$ node then
5:        $\mathsf{VarSet}(n)\longleftarrow\bigcup_{n_c\in\mathsf{Child}(n)}\mathsf{VarSet}(n_c)$
6:     else
7:        $\delta\longleftarrow\mathsf{createDecisionNodes}(\mathsf{VarSet}(\mathsf{Lo}(n))\setminus\mathsf{VarSet}(\mathsf{Hi}(n)))$
8:        $c\longleftarrow\mathsf{conjunctionNode}()$
9:        $\mathsf{Child}(c)\longleftarrow\{\mathsf{Hi}(n)\}\cup\delta$
10:        $\mathsf{Hi}(n)\longleftarrow c$
11:        $\delta\longleftarrow\mathsf{createDecisionNodes}(\mathsf{VarSet}(\mathsf{Hi}(n))\setminus\mathsf{VarSet}(\mathsf{Lo}(n)))$
12:        $c\longleftarrow\mathsf{conjunctionNode}()$
13:        $\mathsf{Child}(c)\longleftarrow\{\mathsf{Lo}(n)\}\cup\delta$
14:        $\mathsf{Lo}(n)\longleftarrow c$
15:        $\mathsf{VarSet}(n)\longleftarrow\mathsf{VarSet}(\mathsf{Lo}(n))\cup\mathsf{VarSet}(\mathsf{Hi}(n))$
Algorithm 3 $\mathsf{Smooth}$ - performs smoothing on a $\mathsf{PROB}$

An important property enabling a $\mathsf{PROB}$ to learn the correct distribution is smoothness. A non-smooth $\mathsf{PROB}$ could be missing certain parameters. For example, given an assignment $\tau_1=[\neg x, y, \neg z]$, the $\mathsf{PROB}$ $\psi_1$ in Figure 2 (main paper) will not update a counter for $\neg z$, as the traversal ends after reaching the true node from the decision node representing variable $y$. Observe that this issue would not occur in the smooth $\mathsf{PROB}$ $\psi_2$ in Figure 4. If a $\mathsf{PROB}$ is not smooth, we can perform smoothing. For a given decision node $n$ (for consistency with Algorithm 3), if $\mathsf{VarSet}(\mathsf{Lo}(n))\setminus\mathsf{VarSet}(\mathsf{Hi}(n))\neq\emptyset$, we augment the hi branch of $n$ with the missing variables as shown in lines 7 to 10 of Algorithm 3. We create a new conjunction node $c$ with decision nodes representing each missing variable in $\mathsf{VarSet}(\mathsf{Lo}(n))\setminus\mathsf{VarSet}(\mathsf{Hi}(n))$ and $\mathsf{Hi}(n)$ as children. For each additional decision node created for $\mathsf{VarSet}(\mathsf{Lo}(n))\setminus\mathsf{VarSet}(\mathsf{Hi}(n))$, both branches lead to the true node. We then reassign $\mathsf{Hi}(n)$ to be $c$. Similarly, we repeat the operation for $\mathsf{Lo}(n)$. Once the smoothing operation has been performed on every decision node in $\psi$, $\psi$ has the smoothness property.

Figure 4: A smooth $\mathsf{PROB}$ $\psi_2$ with 9 nodes, $n_1,\ldots,n_9$, representing $F=(x\lor y)\land(\neg x\lor\neg z)$. Branch parameters are omitted.

Probability Computation

Input: $\mathsf{PROB}$ $\psi$, Assignment $\tau$

Output: probability of $\tau$

1:  cache $\gamma\longleftarrow$ initCache()
2:  for node $n$ in bottom-up pass of $\psi$ do
3:     if $n$ is $\top$ node then
4:        $\gamma[n]\longleftarrow 1$
5:     else if $n$ is $\bot$ node then
6:        $\gamma[n]\longleftarrow 0$
7:     else if $n$ is $\land$ node then
8:        $\gamma[n]\longleftarrow\prod_{c\in\mathsf{Child}(n)}\gamma[c]$
9:     else
10:        if $\mathsf{Var}(n)$ in $\tau$ then
11:           $\gamma[n]\longleftarrow\mathsf{assignProb}(\theta_{\mathsf{Hi}}(n),\theta_{\mathsf{Lo}}(n),\gamma)$
12:        else
13:           $\gamma[n]\longleftarrow\theta_{\mathsf{Lo}}(n)\times\gamma[\mathsf{Lo}(n)]+\theta_{\mathsf{Hi}}(n)\times\gamma[\mathsf{Hi}(n)]$
14:  return $\gamma[\mathsf{rootNode}(\psi)]$
Algorithm 4 $\mathsf{ComputeProb}$ - returns probability of $\tau$

Probability computations are typically performed on decision diagrams in a bottom-up manner, processing child nodes before parent nodes. In this work, we implement a probability computation algorithm ($\mathsf{ComputeProb}$), which is the joint probability computation component of $\mathsf{ProbSample}$ in the main paper. In line 11, the $\gamma$ cache value is the product of the branch parameter and the child $\gamma$ cache value corresponding to the assignment of $\mathsf{Var}(n)$ in $\tau$, instead of summing over both possible assignments as in line 13.

Additional results

We show in Table 4 additional $\varepsilon$-match rate statistics on how well $\mathsf{ProbRoute}$ performs when provided different amounts of data to learn probabilities from. As we increase the amount of data provided for learning, in increments of 2000 instances (20% of the 10000 total), there is a general improvement in the match rate of the trips produced by $\mathsf{ProbRoute}$. A similar trend is observed when $\varepsilon=0$, with the corresponding statistics shown in Table 5.

Stats | 0%    | 20%   | 40%   | 60%   | 80%   | 100%
Mean  | 0.210 | 0.416 | 0.434 | 0.451 | 0.466 | 0.456
Std   | 0.192 | 0.360 | 0.373 | 0.376 | 0.383 | 0.386
25%   | 0.081 | 0.102 | 0.095 | 0.098 | 0.098 | 0.102
50%   | 0.149 | 0.286 | 0.297 | 0.318 | 0.349 | 0.316
75%   | 0.257 | 0.715 | 0.854 | 0.964 | 1.000 | 1.000
Table 4: $\varepsilon$-match rate statistics for $\mathsf{ProbRoute}$ where $\varepsilon$ is set to the median edge length of the road network graph $G_r$. The rows under the ‘Stats’ column refer to percentiles (for example, the ‘25%’ row is the 25th percentile match rate) as well as the mean and standard deviation. The column percentages indicate the percentage of the 10000 learning instances that $\mathsf{ProbRoute}$ has learned from.
Stats | 0%    | 20%   | 40%   | 60%   | 80%   | 100%
Mean  | 0.192 | 0.404 | 0.422 | 0.440 | 0.455 | 0.445
Std   | 0.192 | 0.365 | 0.379 | 0.382 | 0.389 | 0.391
25%   | 0.063 | 0.080 | 0.075 | 0.078 | 0.077 | 0.082
50%   | 0.132 | 0.275 | 0.282 | 0.308 | 0.334 | 0.305
75%   | 0.243 | 0.702 | 0.848 | 0.963 | 1.000 | 1.000
Table 5: Exact match rate statistics for $\mathsf{ProbRoute}$ where $\varepsilon=0$. The rows under the ‘Stats’ column refer to percentiles (for example, the ‘25%’ row is the 25th percentile match rate) as well as the mean and standard deviation. The column percentages indicate the percentage of the 10000 learning instances that $\mathsf{ProbRoute}$ has learned from.