
MPE inference using an Incremental Build-Infer-Approximate Paradigm

Shivani Bathla
Department of Electrical Engineering, IIT Madras, Chennai, India
Vinita Vasudevan
Department of Electrical Engineering, IIT Madras, Chennai, India
Abstract

Exact inference of the most probable explanation (MPE) in Bayesian networks is known to be NP-complete. In this paper, we propose an algorithm for approximate MPE inference that is based on the incremental build-infer-approximate (IBIA) framework. We use this framework to obtain an ordered set of partitions of the Bayesian network and the corresponding max-calibrated clique trees. We show that the maximum belief in the last partition gives an estimate of the probability of the MPE assignment. We propose an iterative algorithm for decoding, in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. There are no issues of convergence, and we do not perform a search for solutions. Even though it is a single-shot algorithm, we obtain valid assignments in 100 out of the 117 benchmarks used for testing. The accuracy of our solution is comparable to that of a branch and bound search in the majority of the benchmarks, with competitive runtimes.

1 Introduction

Bayesian Networks (BN) are directed graphical models that provide a compact representation of the joint probability distribution over a set of random variables. An important task in Bayesian inference is to find the Most Probable Explanation (MPE) given a set of evidence variables. It involves finding the complete assignment for all non-evidence variables that gives the maximum probability. The probability of the MPE assignment is the same as the max-marginal obtained after maximizing the joint distribution over all non-evidence variables. Exact methods include variable elimination, max-product belief propagation on join trees followed by a decoding step to get the MPE assignment, and integer linear programming (ILP) [Koller and Friedman, 2009]. However, MPE inference is known to be an NP-complete problem [Shimony and Charniak, 1990, Park and Darwiche, 2004] and exact methods are not viable in most cases.

One approach to get the MPE assignment is to use search-based algorithms that traverse the model’s search space. A commonly used framework for graphical models is the AND/OR search tree [Dechter and Mateescu, 2007]. Several search algorithms have been used to find MPE estimates from these models, including depth-first AND/OR Branch and Bound (AOBB) [Marinescu and Dechter, 2005, Marinescu et al., 2014, 2018], best-first AND/OR search (AOBF) [Marinescu and Dechter, 2007, Marinescu et al., 2020], and Breadth-Rotating AND/OR Branch and Bound (BRAOBB) [Otten and Dechter, 2012]. These methods use upper bounds on the max-marginals, computed using different variants of Mini-Bucket Elimination (MBE), to guide the search. Weighted Mini-Bucket (WMB) elimination improves the accuracy of the bound of traditional MBE using Hölder’s inequality [Liu and Ihler, 2011]. Another technique to tighten the resulting bound is to use cost-shifting schemes. Two such algorithms are Mini-Bucket Elimination with Moment Matching (MBE-MM) [Flerova et al., 2011] and Join Graph Linear Programming (JGLP) [Ihler et al., 2012]. While MBE-MM uses a single pass of cost-shifting locally within the mini-buckets, JGLP performs cost-shifting updates over the entire mini-bucket join graph iteratively.

Another class of methods is based on LP relaxations. LP relaxations based on message passing include tree-reweighted max-product [Wainwright et al., 2005, Kolmogorov, 2005, Werner, 2007, Jojic et al., 2010, Hazan and Shashua, 2010] and max-product linear programming [Globerson and Jaakkola, 2007, Sontag et al., 2008]. A related class of methods is the dual decomposition solvers [Komodakis et al., 2007, Sontag et al., 2011, 2012, Ping et al., 2015, Martins et al., 2015]. The connection between the solutions obtained using the two approaches is explored in Bauer et al. [2019].

In this work, we propose a different approach for MPE inference. Our method is based on the Incremental Build-Infer-Approximate paradigm (IBIA), which returns a set of bounded clique size partitions of the BN along with the calibrated forests of clique trees (CTFs) corresponding to each partition [Bathla and Vasudevan, 2022]. Instead of the sum-product calibration used in Bathla and Vasudevan [2022], we use max-product calibration along with suitable modifications to the Approximate step. Based on this, we propose a single-shot algorithm for MPE inference with the following features.

  1. We show that the max-marginal can be estimated by finding the maximum belief in any clique of the max-calibrated CT of the last partition. Based on experiments, we find that our algorithm gives very good estimates, with an average error of $\pm 0.27$ in the log probability over the 79 benchmarks for which the exact solution is available.

  2. We propose a single-shot algorithm for decoding the MPE assignment. It is an iterative method in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. There are no issues of convergence. Even though it is a single-shot algorithm, we were able to get a non-zero probability assignment with accuracy comparable to AOBB in the majority of the testcases.

  3. Our code is written in Python, with the NumPy and NetworkX libraries. The runtimes of our algorithm are competitive with, and in some cases better than, those of the compiled C++ code for AOBB.

2 Background

A Bayesian network is a directed acyclic graph (DAG) $\mathcal{G}$. Its nodes represent random variables $\mathcal{Y}=\{X_1, X_2, \cdots, X_n\}$ with associated domains $D=\{D_1, D_2, \cdots, D_n\}$. It has directed edges from a subset of nodes $\textrm{Pa}_{X_i}=\{X_k, k \neq i\}$ to $X_i$, representing a conditional probability distribution (CPD) $\phi_i = P\{X_i \mid \textrm{Pa}_{X_i}\}$. The BN represents the joint probability distribution (JPD) of $\mathcal{Y}$, given by $P(\mathcal{Y}) = \prod_{i=1}^{n} \phi_i$. Let $E$ denote the set of instantiated or evidence variables, with $\mathcal{Y}=\{\mathcal{X}, E\}$. The task is to find the most probable explanation (MPE), defined as

\[ \textrm{MPE} = \operatorname*{arg\,max}_{\mathcal{X}} P(\mathcal{X}, E=e) \]

In low-treewidth networks, the MPE can be obtained using the clique tree (CT) representation, which is a derived undirected graphical model obtained after moralization and chordal completion of the BN. The CT, which is also called the join tree or junction tree, is a hypertree with nodes $\{C_1, C_2, \cdots, C_n\}$ that are the set of maximal cliques in the chordal graph. An edge between $C_i$ and $C_j$ is associated with a sepset $S_{i,j} = C_i \cap C_j$. Each factor $\phi_i$ is associated with a single clique $C_i$ such that $\textrm{Scope}(\phi_i) \subseteq C_i$. The task of computing the MPE is split into two parts. First, the max-product belief propagation (BP) algorithm [Chapter 10 of Koller and Friedman, 2009] is used to compute the maximum marginal probability, $MaxMarg$, defined as

\[ MaxMarg = \max_{\mathcal{X}} P(\mathcal{X}, E=e) \]

This algorithm has two rounds of message passing along the edges of the CT, an upward pass (from the leaf nodes to the root node) and a downward pass (from the root node to the leaves). The CT obtained as a result of this algorithm is max-calibrated, which is defined as follows. Let $\beta(C_i)$, $\beta(C_j)$ denote the beliefs associated with adjacent cliques $C_i$ and $C_j$, and $\mu_{i,j}$ the belief of the corresponding sepset $S_{i,j}$. The CT is said to be max-calibrated if all pairs of adjacent cliques satisfy the following.

\[ \max_{C_i \setminus S_{i,j}} \beta(C_i) = \max_{C_j \setminus S_{i,j}} \beta(C_j) = \mu(S_{i,j}) \]

$MaxMarg$ can then simply be obtained by finding the maximum clique belief in any clique $C$ of the max-calibrated CT, since

\[ MaxMarg = \max_{C} \max_{\mathcal{X} \setminus C} P(\mathcal{X}, e) = \max_{C} \beta(C) \]
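As a concrete illustration, the following is a minimal NumPy sketch (the clique scopes and belief values are hypothetical) that checks the max-calibration condition for a pair of adjacent cliques and reads off $MaxMarg$ from a clique belief:

import numpy as np

# Hypothetical beliefs for adjacent cliques C_i = {A, B} and C_j = {B, C},
# stored with one array axis per variable, in scope order.
beta_ci = np.array([[0.20, 0.05], [0.10, 0.40]])  # axes: (A, B)
beta_cj = np.array([[0.20, 0.10], [0.40, 0.15]])  # axes: (B, C)

# Max-marginalize each clique belief onto the sepset S_ij = {B}.
mu_from_ci = beta_ci.max(axis=0)  # max over A -> belief over B
mu_from_cj = beta_cj.max(axis=1)  # max over C -> belief over B

# Max-calibration: both projections agree on the sepset belief.
assert np.allclose(mu_from_ci, mu_from_cj)

# In a max-calibrated CT, MaxMarg is the maximum entry of any clique belief.
print("MaxMarg =", beta_ci.max())  # 0.4, same as beta_cj.max()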

Since a network could have disjoint DAGs, we use the term Clique Tree Forest (CTF) to denote the collection of corresponding CTs. The max-marginal for a BN with multiple DAGs is the product of the max-marginals of all CTs in the CTF.

Once $MaxMarg$ is computed, the decode step finds an MPE assignment using the max-calibrated CTF. Algorithm 1 shows the traceback procedure. The set $S_u$ tracks the set of unassigned variables. For each CT in the CTF, any clique can be chosen as the root node ($C_{root}$). To start with, all variables in the root clique are unassigned. Since the global MPE assignment also satisfies the local optimality property [Koller and Friedman, 2009], a partial assignment can be obtained by finding the argmax of the clique belief. The overall MPE assignment is updated and all assigned variables are removed from the set $S_u$. All other cliques are iterated through in a pre-order traversal. For any clique $C$, the states of the variables in the sepsets with the preceding cliques are known. After reducing the belief $\beta(C)$ over the known variable states, we find an assignment of the unassigned variables that maximizes the reduced clique belief.

Algorithm 1 Traceback ($CTF$)
1: Input $CTF$: max-calibrated CTF
2: Initialize: MPE $\leftarrow \langle\rangle$ ▷ Hash table $\langle$var : var.assignment$\rangle$; $S_u \leftarrow$ Variables $\in CTF$ ▷ Set of unassigned variables
3: for $CT \in CTF$ do
4:    $C_{root} \leftarrow$ Choose any clique in $CT$
5:    for $C \in CT$.Preorder_Traversal($C_{root}$) do
6:       $S_{u_c} \leftarrow C \cap S_u$ ▷ Set of unassigned variables in clique $C$
7:       if $S_{u_c} \neq \varnothing$ then
8:          $\beta(C) \leftarrow$ Reduce belief $\beta(C)$ over known states $\in$ MPE
9:          $A \leftarrow$ argmax $\beta(C)$
10:         MPE[$v$] = $A$[$v$] $\forall v \in S_{u_c}$ ▷ Update MPE assignment
11:         $S_u$.remove($S_{u_c}$)
12:      end if
13:   end for
14: end for
15: return MPE
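For concreteness, the following is a minimal Python sketch of this traceback. The data structures are assumptions made for illustration (each clique is a (scope, belief) pair, with the belief a NumPy array having one axis per scope variable, and each CT is listed in pre-order), not the authors' implementation:

import numpy as np

def traceback(ctf):
    # ctf: list of CTs; each CT is a list of (scope, belief) pairs in
    # pre-order, where scope is a tuple of variable names and belief is
    # a NumPy array with one axis per variable in scope.
    mpe = {}                                   # var -> assigned state index
    for ct in ctf:
        for scope, belief in ct:
            unassigned = [v for v in scope if v not in mpe]
            if not unassigned:
                continue
            # Reduce the belief over variables whose states are known;
            # iterating axes in reverse keeps earlier axis indices valid.
            reduced = belief
            for axis, v in reversed(list(enumerate(scope))):
                if v in mpe:
                    reduced = np.take(reduced, mpe[v], axis=axis)
            # Argmax of the reduced belief assigns the remaining variables.
            states = np.unravel_index(np.argmax(reduced), reduced.shape)
            mpe.update(zip(unassigned, states))
    return mpe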

2.1 IBIA framework

Figure 1: Bayesian inference using the IBIA framework

The IBIA framework proposed in Bathla and Vasudevan [2021, 2022] presents a three-step approach to perform inference on a BN. The overall approach is illustrated in Figure 1. Each DAG $G_i$ in the BN is built, inferred, and approximated separately. IBIA starts with a clique forest over the nodes that have no parents (referred to as primary inputs, PI). An incremental approach is used to add as many nodes as possible until a preset bound ($mcs_p$) on the clique size is reached. In this work, we define the clique size of a given clique $C_i$ as

\[ cs_i = \log_2 \prod_{v \in C_i} |D_v| \]

where $|D_v|$ is the cardinality or the number of states in the domain of the variable $v$. When the clique-size bound $mcs_p$ is reached, the CTF is calibrated using the standard join tree message passing algorithm [Koller and Friedman, 2009]. The CTF contains two sets of variables: variables that are parents of nodes that have not yet been added (referred to as interface nodes) and other variables (non-interface nodes). A combination of exact and approximate marginalization is used to lower the clique sizes to a preset bound $mcs_{im}$, thus allowing for the addition of new nodes. This approximate CTF is used as the starting point for the incremental construction of the CT for the next partition. This process is continued until all the nodes in the BN have been added to some CTF.
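For example, under this definition, a hypothetical clique over three variables with domain sizes 2, 3, and 4 has a clique size of $\log_2 24 \approx 4.58$:

import math

domain_sizes = {"a": 2, "b": 3, "c": 4}           # hypothetical domains
cs = math.log2(math.prod(domain_sizes.values()))  # log2(24) ~ 4.58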

The algorithm returns an ordered set of partitions $\{R_1, R_2, \cdots, R_P\}$ and the corresponding calibrated CTFs for each DAG in the BN. The authors show that the CTFs obtained after the build and approximate stages contain valid CTs. Also, the approximation algorithm maintains the consistency of within-clique beliefs. A trade-off between runtime and accuracy can be achieved using the input parameters $mcs_p$ and $mcs_{im}$.

3 Framework for MPE queries

The IBIA framework has been used to infer the partition function and the prior and posterior marginal probabilities of non-evidence variables in Bathla and Vasudevan [2022]. We have modified this framework for MPE queries. We use an iterative procedure that has the build, infer, and approximate steps, and a partial decode step that gives an MPE assignment for a subset of variables. In each iteration, the subset for which an assignment is obtained is extended. The iteration continues until either all variables are assigned or an assignment is not possible. In the following sections, each of the steps is explained in more detail.

Step 1: Incremental Build: The CTF can be built incrementally using either of the algorithms detailed in Flores, Gámez, and Olesen [2002] and Bathla and Vasudevan [2022]. We use the approach described in Bathla and Vasudevan [2022], since it works directly with the clique tree model and eliminates the need for the computation of any other intermediate representation. Typically, it also chooses a smaller graph for re-triangulation than Flores et al. [2002].

Step 2: Inference: We use the two-pass max-product belief propagation algorithm to calibrate the partition CTF. After calibration, all cliques in a CT have the same maximum probability, and the beliefs of adjacent cliques agree over the common variables [Koller and Friedman, 2009].
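As an illustration of the two-pass schedule, the following is a toy NumPy sketch of max-product calibration on a simple chain of cliques. It assumes strictly positive 2-D potentials with scopes $(X_t, X_{t+1})$; an actual CTF implementation would instead traverse each tree:

import numpy as np

def max_product_chain(factors):
    # factors[t]: potential of clique C_t with axes (X_t, X_{t+1}).
    k = len(factors)
    beliefs = [f.copy() for f in factors]
    up = [None] * (k - 1)
    # Upward pass: C_1 -> C_k, messages over the sepsets {X_{t+1}}.
    for t in range(k - 1):
        up[t] = beliefs[t].max(axis=0)              # max out X_t
        beliefs[t + 1] = up[t][:, None] * beliefs[t + 1]
    # Downward pass: C_k -> C_1; divide out the message already absorbed.
    for t in range(k - 1, 0, -1):
        down = beliefs[t].max(axis=1) / up[t - 1]   # message over sepset {X_t}
        beliefs[t - 1] = beliefs[t - 1] * down[None, :]
    return beliefs                                  # max-calibrated beliefs

After the two passes, every belief in the chain has the same maximum entry, which is the property used in the approximation and decode steps below.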

Step 3: Approximation:

(a) Exact max-marginalization (b) Local max-marginalization
Figure 2: Illustration of exact and local max-marginalization of a variable $n$ present in cliques $C_1, C_2$ in the first figure and cliques $C_1, C_2, C_3, C_4$ in the second.

As discussed, the nodes present in the CTF corresponding to a partition can be divided into two sets: interface (IV) and non-interface (NIV) nodes. The interface nodes are parents of the variables that have not yet been added; they are therefore required for building the next partition. We adopt an approximation methodology similar to the one suggested in Bathla and Vasudevan [2022] to reduce the clique sizes. We denote the calibrated CTF obtained after BP as $CTF_{in}$ and the approximated CTF as $CTF_a$. The minimal subgraph of $CTF_{in}$ that connects the interface variables forms the initial $CTF_a$. This graph may also contain some non-interface variables. We use a combination of exact and local max-marginalization to reduce the clique sizes to a preset bound $mcs_{im}$. We now discuss both these steps in detail.

  1. Exact max-marginalization: If a non-interface variable $n$ is present in a single clique, it is simply removed from the clique scope and the belief is modified as $\beta(C') = \max_{n.states} \beta(C)$, where $C' = C \setminus n$. On the other hand, if $n$ is present in multiple cliques, the subtree over all containing cliques ($ST_n$) is collapsed into a single clique ($C_c$). Figure 2(a) shows an example, where the cliques $C_1, C_2$ containing $n$ are collapsed to form clique $C_c$. Since this step could potentially result in a large clique, it is performed only if the size of the resultant clique is $\leq mcs_{im}$. The belief of the collapsed clique is computed as shown below.

    \[ \beta(C_c) = \max_{n.states} \left( \frac{\prod_{C \in ST_n} \beta(C)}{\prod_{SP \in ST_n} \mu(SP)} \right) \]

    where $SP$ denotes the sepsets in $ST_n$. This step preserves the joint beliefs, i.e., variables in $CTF_a$ have the same belief as in $CTF_{in}$.

  2. Local max-marginalization: In this step, variables present in cliques with size $> mcs_{im}$ are max-marginalized from individual cliques while ensuring that (a) all CTs in $CTF_a$ are valid CTs, and (b) a connected CT in $CTF_{in}$ remains connected in $CTF_a$.

    Let $n$ be a variable present in large cliques. The subgraph of $CTF_{in}$ over all cliques containing $n$ is a connected tree, since the input CTF is valid. To ensure that the Running Intersection Property (RIP) [Koller and Friedman, 2009] is satisfied after approximation, $n$ is retained in a connected subtree containing smaller cliques with sizes $\leq mcs_{im}$ and is individually max-marginalized out from all other cliques. Figure 2(b) shows an example where variable $n$ is removed from cliques $C_1, C_2$ and retained in cliques $C_3, C_4$. Local marginalization of $n$ from cliques $C_i$ and $C_j$, and the corresponding sepset $S_{i,j}$, results in cliques $C_i' = C_i \setminus n$ and $C_j' = C_j \setminus n$ with sepset $S_{i,j}' = S_{i,j} \setminus n$. The corresponding beliefs are $\beta(C_i') = \max_{n.states} \beta(C_i)$, $\beta(C_j') = \max_{n.states} \beta(C_j)$, and $\mu(S_{i,j}') = \max_{n.states} \mu(S_{i,j})$. While updating the beliefs in this manner preserves the joint beliefs of variables present within a clique, the joint beliefs of variables present in disjoint cliques are approximated (a code sketch of both belief updates follows this list).
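The following is a minimal NumPy sketch of the two belief updates for the configuration of Figure 2(a), with hypothetical max-calibrated beliefs for cliques $C_1 = \{n, a\}$ and $C_2 = \{n, b\}$ and sepset $S_{1,2} = \{n\}$; the arithmetic follows the equations above and is not the authors' code:

import numpy as np

# Hypothetical max-calibrated beliefs: C1 = {n, a}, C2 = {n, b}, S12 = {n}.
beta_c1 = np.array([[0.3, 0.1], [0.2, 0.4]])   # axes: (n, a)
beta_c2 = np.array([[0.3, 0.2], [0.4, 0.1]])   # axes: (n, b)
mu_s12 = beta_c1.max(axis=1)                   # sepset belief over n

# Exact max-marginalization: collapse ST_n into C_c = {a, b}, then max over n.
joint = beta_c1[:, :, None] * beta_c2[:, None, :] / mu_s12[:, None, None]
beta_cc = joint.max(axis=0)                    # axes: (a, b)

# Local max-marginalization: max n out of each clique and sepset independently.
beta_c1_loc = beta_c1.max(axis=0)              # belief over a
beta_c2_loc = beta_c2.max(axis=0)              # belief over b
mu_s12_loc = mu_s12.max()                      # scalar sepset belief

# The exact step preserves joint beliefs; the local step preserves only
# within-clique beliefs (here, the max-marginals over a and over b agree).
assert np.allclose(beta_cc.max(axis=1), beta_c1_loc)
assert np.allclose(beta_cc.max(axis=0), beta_c2_loc)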

Once the clique sizes are reduced, we re-assign the clique factors based on the calibrated clique and sepset beliefs.

Proposition 1.

If all CTs in $CTF_{in}$ are valid and max-calibrated, then all CTs in $CTF_a$ are also valid and max-calibrated.

Proposition 2.

The approximation algorithm preserves the maximum belief and the within-clique beliefs of all cliques in $CTF_a$.

We discuss the proofs of Propositions 1 and 2 in Section 6.

Algorithm 2 shows the overall algorithm. At all points in the method, we remove any non-maximal cliques that are generated and re-connect their neighbors. We start by identifying the sets of interface and non-interface variables present in $CTF_{in}$. We initialize $CTF_a$ as the minimal subgraph of $CTF_{in}$ that connects the interface variables. We first attempt to reduce the clique sizes using exact max-marginalization of the non-interface variables. NIVs present in a single clique are removed, and the corresponding clique beliefs are max-marginalized. If any resultant clique is non-maximal, it is removed, and its neighbors are re-connected. Wherever possible within the clique size constraint, NIVs present in multiple cliques are max-marginalized after collapsing the containing cliques (lines 4-7).

Next, we find the set of large cliques ($L_c$) with size $> mcs_{im}$. All variables in these cliques are grouped into NIVs and IVs and stored in $N$. We first try to locally max-marginalize the non-interface variables. This is done as follows. First, both sets of variables are individually sorted in increasing order of the number of containing cliques. Starting with the NIVs, we pop a variable ($n$) from this list and identify the subgraph of $CTF_a$ over the cliques containing this variable ($ST_n$). Since the input CTF satisfies RIP, $ST_n$ is a connected tree. $ST_n$ can be further divided into disjoint subtrees over cliques with size $\leq mcs_{im}$. We choose the largest subtree ($ST_r$) and retain the variable $n$ in all cliques in this subtree (lines 13-15). Since we want a connected CT to remain connected, the variable is removed only if no sepset size becomes zero (lines 15-18).

If the clique sizes remain above $mcs_{im}$ after processing the NIVs, we start with the IVs. If $n$ is an IV, it must be retained in at least one clique. Therefore, we ignore interface variables that are present only in large cliques (lines 19-20). Otherwise, we max-marginalize all clique and sepset beliefs associated with the cliques that are not present in $ST_r$. The list $L_c$ is updated by removing the cliques whose size is now within the preset limit. This process is repeated until either the clique list $L_c$ or the variable set $N$ becomes empty.

Once the clique sizes are reduced, we reassign the clique factors (lines 26-27). For each clique tree in $CTF_a$, we pick a root node and perform a pre-order traversal from the root node to all other nodes in the CT. The factor associated with the root node is the corresponding clique belief. For all other nodes, the factor assigned to an unvisited neighbor $C_j$ of clique $C_i$ is $\frac{\beta(C_j)}{\mu(S_{ij})}$.
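A minimal sketch of this re-parameterization, assuming each clique in the pre-order list stores its calibrated belief together with the sepset belief shared with its already-visited neighbor (the representation is illustrative, and the sepset array is assumed to be broadcast-aligned with the clique belief):

def reassign_factors(ct):
    # ct: list of (belief, parent_sepset_belief) pairs in pre-order;
    # the root clique has parent_sepset_belief = None.
    factors = []
    for belief, parent_sep in ct:
        if parent_sep is None:
            factors.append(belief)          # root keeps its full belief
        else:
            # Dividing out the sepset shared with the already-visited
            # neighbor makes the product of all factors telescope to
            # prod(beta) / prod(mu), i.e., the joint distribution.
            factors.append(belief / parent_sep)
    return factors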

Algorithm 2 ApproximateCTF ($CTF_{in}$, $mcs_{im}$)
Input: $CTF_{in}$: input clique tree forest; $mcs_{im}$: maximum clique size limit for the approximate CTF
Output: $CTF_a$: approximate CTF
1: $IV, NIV \leftarrow$ Divide variables $\in CTF_{in}$ into interface and non-interface variables
2: $CTF_a \leftarrow$ Minimal subgraph of $CTF_{in}$ connecting $IV$
3: $NIV \leftarrow$ Variables $\in CTF_a \setminus IV$ ▷ Updated set of non-interface variables
4: ▷ Exact max-marginalization
5: Max-marginalize NIVs present in a single clique in $CTF_a$
6: Remove resultant non-maximal cliques, reconnect neighbors
7: Collapse cliques to max-marginalize NIVs present in multiple cliques if $cs < mcs_{im}$
8: ▷ Local max-marginalization to trim cliques with size $> mcs_{im}$
9: $L_c \leftarrow$ List of cliques with size $> mcs_{im}$
10: $N \leftarrow \{NIV, IV\}$ ▷ Sorted lists of variables present in cliques in $L_c$
11: while $L_c \neq \varnothing$ or $N \neq \varnothing$ do
12:    $n \leftarrow N$.pop variable present in the least number of cliques
13:    $ST_n \leftarrow$ Subgraph of $CTF_a$ over cliques containing $n$
14:    $ST_r \leftarrow$ Choose connected subtree $\in ST_n$ s.t. max clique size $\leq mcs_{im}$
15:    ▷ Check if the CT will remain connected if $n$ is removed
16:    $L_m \leftarrow$ Cliques $\in ST_n \setminus$ Cliques $\in ST_r$
17:    $ms \leftarrow$ Minimum sepset size for cliques in $L_m$
18:    if $ms == 1$ then continue
19:    ▷ Check if interface variables are retained in at least one clique
20:    if $ST_r$.isEmpty() and $n \in IV$ then continue
21:    Locally max-marginalize the beliefs corresponding to all cliques in $L_m$
22:    Remove resultant non-maximal cliques, reconnect neighbors
23:    $L_c$.remove($C$) if $C$.size $\leq mcs_{im}$ $\forall C \in L_m$
24: end while
25: ▷ Re-parameterize the joint distribution
26: Re-assign clique factors in $CTF_a$ using the clique and sepset beliefs
27: return $CTF_a$
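To illustrate lines 13-14, the following NetworkX sketch extracts $ST_n$ and picks a retained subtree $ST_r$. The node attributes 'vars' (clique scope) and 'size' (clique size), and the choice of the largest component by clique count, are assumptions made for illustration:

import networkx as nx

def choose_retained_subtree(ctf_a, n, mcs_im):
    # ST_n: induced subgraph over cliques containing n (a tree, by RIP).
    st_n = ctf_a.subgraph(c for c in ctf_a if n in ctf_a.nodes[c]["vars"])
    # Dropping the large cliques splits ST_n into disjoint candidate subtrees.
    small = st_n.subgraph(c for c in st_n if st_n.nodes[c]["size"] <= mcs_im)
    components = list(nx.connected_components(small))
    if not components:
        return st_n, None                    # n occurs only in large cliques
    st_r = small.subgraph(max(components, key=len))   # largest subtree
    return st_n, st_r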

Step 4: Decode: The assignment of variables in the last partition is used as the partial MPE assignment. This assignment is obtained from the CT of the last partition using Algorithm 1.

3.1 Inference of $MaxMarg$

Let the $n_i$ partitions of the $i^{th}$ DAG in the BN be denoted $\{R_{i1}, R_{i2}, \ldots, R_{in_i}\}$, with corresponding CTFs $CTF_i = \{CTF_{i,1}, CTF_{i,2}, \ldots, CTF_{i,n_i}\}$.

Proposition 3.

The max-belief of any clique in $CT \in CTF_{i,k}$ is the estimate of the maximum probability in the joint distribution over all variables added to $CT$ in partitions $R_{ij},\ j \leq k$.

Proof.

Let $C$ be a clique in the calibrated $CT \in CTF_{i,1}$ and $\mathcal{X}_1$ be the set of variables in $CT$. Cliques in $CT$ are assigned the CPDs $\{\phi_v, v \in \mathcal{X}_1\}$ as factors. Therefore, the following holds true once $CT$ is calibrated.

\[ \beta(C) = \max_{\mathcal{X}_1 \setminus C} \prod_{v \in \mathcal{X}_1} \phi_v = \max_{\mathcal{X}_1 \setminus C} P(\mathcal{X}_1) \;\implies\; \max_{C} \beta(C) = \max_{\mathcal{X}_1} P(\mathcal{X}_1) \]

Let $CTF_{i,1}'$ be the CTF obtained after approximating $CTF_{i,1}$. While the within-clique beliefs are preserved during local max-marginalization (by Proposition 2), the joint beliefs of variables belonging to different cliques are approximated. Let $\mathcal{X}_{12}$ be the set of variables in $CTF_{i,1}'$ and let $P'(\mathcal{X}_{12})$ denote the joint belief obtained from the approximate beliefs in $CTF_{i,1}'$. Let $Q(\mathcal{X}_1)$ denote a distribution such that

\[ \max_{\mathcal{X}_1 \setminus \mathcal{X}_{12}} Q(\mathcal{X}_1) = P'(\mathcal{X}_{12}) = \frac{\prod \beta'}{\prod \mu'} \]

Let $\mathcal{X}_2$ denote the set of new variables added to the corresponding $CT \in CTF_{i,2}$ in the second partition. The factors assigned to cliques in the CT are the CPDs of the new nodes $\{\phi_v, \forall v \in \mathcal{X}_2\}$ and the re-parameterized beliefs $\{\beta_1', \frac{\beta_2'}{\mu_{12}'}, \ldots\}$. Therefore, after calibration of the CT, the following holds true for any clique $C \in CT$.

\begin{align*}
\beta(C) &= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} \prod_{v \in \mathcal{X}_2} \phi_v \cdot \frac{\prod \beta'}{\prod \mu'} \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot P'(\mathcal{X}_{12}) \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot \max_{\mathcal{X}_1 \setminus \mathcal{X}_{12}} Q(\mathcal{X}_1) \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot Q(\mathcal{X}_1)
\end{align*}

Since the variables in $\mathcal{X}_1$ are non-descendants of $\mathcal{X}_2$ in the BN, $\mathcal{X}_2 \perp \mathcal{X}_1 \mid Par_{\mathcal{X}_2}$. Therefore,

\begin{align*}
P(\mathcal{X}_2 \mid \mathcal{X}_1, Par_{\mathcal{X}_2}) &= P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \\
P(\mathcal{X}_2 \mid \mathcal{X}_1) &= P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \quad \because\ Par_{\mathcal{X}_2} \subset \mathcal{X}_1
\end{align*}

Therefore, the clique belief can be written as,

\begin{align*}
\beta(C) &= \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P(\mathcal{X}_2 \mid \mathcal{X}_1) \cdot Q(\mathcal{X}_1) = \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P''(\mathcal{X}_2, \mathcal{X}_1) \\
\max_{C} \beta(C) &= \max_{\mathcal{X}_2, \mathcal{X}_1} P''(\mathcal{X}_2, \mathcal{X}_1)
\end{align*}

A similar procedure is repeated in each successive partition. Therefore, the max-belief in the calibrated CT in partition $R_{ik}$ approximates the maximum probability of the joint distribution over all variables added to it in partitions $R_{ij},\ j \leq k$. ∎

Theorem 1.

The product of the max-beliefs of the CTs corresponding to the last partition of all DAGs is the estimate of $MaxMarg$.

Proof.

For each DAG $G_i$, the last-partition CTF contains a single CT, since we never break a connected CT. Using Proposition 3, the maximum probability ($MaxMarg_i$) for this CT is the estimate of the maximum probability over all variables in $G_i$. This value is exact when there is only one partition and may be approximate when there are multiple partitions. The product $\prod_i MaxMarg_i$ is the estimate of the overall max-marginal $MaxMarg$. ∎

Based on Theorem 1, Algorithm 3 computes the overall max-marginal.

Algorithm 3 FindMaxMarg ($L_{CTF}$)
1: Input $L_{CTF}$: list of clique tree forests corresponding to all DAGs $\in$ BN
2: Output $MaxMarg$: estimated max-marginal over the non-evidence variables $\in$ BN
3: $MaxMarg = 1.0$
4: for $L_i \in L_{CTF}$ do ▷ $L_i$: list of partition CTFs for the $i^{th}$ DAG
5:    CT $\leftarrow L_i$[-1] ▷ Calibrated CT for the last partition
6:    $C \leftarrow$ Choose any clique in CT
7:    $MaxMarg = MaxMarg \times \max_C \beta(C)$ ▷ Multiply with the max clique belief
8: end for
9: return $MaxMarg$
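A direct Python rendering of Algorithm 3, under the assumed representation that each CTF is a list of calibrated clique beliefs stored as NumPy arrays:

def find_max_marg(ctf_lists):
    # ctf_lists: one list of partition CTFs per DAG in the BN; each CTF
    # here is a list of calibrated clique beliefs (NumPy arrays).
    max_marg = 1.0
    for partitions in ctf_lists:
        last_ct = partitions[-1]     # calibrated CT of the last partition
        any_belief = last_ct[0]      # any clique works: the CT is max-calibrated
        max_marg *= any_belief.max()
    return max_marg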

3.2 MPE Decoding and complexity

Algorithm 4 MPE inference procedure
1: Initialize: MPE $\leftarrow \{\}$; $M = \{\textrm{MPE}, E\}$
2: Simplify the BN based on $M$
3: ▷ Run the IBIA framework to partition the BN
4: $L_{CTF} \leftarrow$ IBIA.run($BN, mcs_p, mcs_{im}$) ▷ List containing the calibrated CTFs for all partitions
5: ▷ Find the partial assignment from the last partition
6: $M_p \leftarrow$ Traceback($L_{CTF}$[-1]) ▷ Algorithm 1
7: MPE.update($M_p$)
8: if Number of partitions == 1 then
9:    return MPE
10: else
11:    Go to line 2
12: end if

Algorithm 4 shows the overall procedure. All known states are contained in the list $M$, which is initialized with the evidence variables. The BN structure is simplified by removing all outgoing edges of variables with known states, and the CPDs are reduced over the known states. Next, the IBIA framework is run to partition the reduced BN based on the clique size constraints $mcs_p$ and $mcs_{im}$. The framework returns the list of calibrated CTFs for all partitions. We find a partial assignment ($M_p$) over the variables in the last partition using the traceback method described in Algorithm 1. The overall MPE assignment is updated by adding the decoded states in $M_p$. If multiple partitions are obtained, the assignment from the last partition is incomplete. Therefore, the partial assignment is added to $M$, and the process is repeated until the complete assignment is obtained.
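The overall loop can be summarized by the following Python sketch; simplify, ibia_run, and traceback are placeholders for the corresponding steps of Algorithm 4, not a real API:

def mpe_inference(bn, evidence, mcs_p=20, mcs_im=15):
    mpe = dict(evidence)                       # known states: evidence + decoded
    while True:
        reduced_bn = simplify(bn, mpe)         # remove outgoing edges, reduce CPDs
        partitions = ibia_run(reduced_bn, mcs_p, mcs_im)
        mpe.update(traceback(partitions[-1]))  # decode the last partition's CTF
        if len(partitions) == 1:               # all remaining variables assigned
            return mpe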

Our method performs a single search over the set of possible assignments for all variables in the BN. Due to the approximations involved while partitioning using IBIA, the beliefs in the last partition are approximate. Therefore, the partial assignment obtained in the last partition is not guaranteed to be accurate.

The complexity can be analyzed as follows. Let $I$ be the number of iterations required to get an assignment of all variables, and $P$ be the maximum number of partitions obtained over all iterations. The time complexity of incremental clique tree formation is polynomial in the number of variables present in the cliques that are impacted by the addition of new nodes. For each partition, both calibration using max-product BP and max-marginalization of beliefs are exponential in the maximum clique size $mcs_p$. Similarly, finding the MPE assignment over the variables in the last partition involves finding the argmax over individual cliques, which is also exponential in the maximum clique size. Therefore, the overall time complexity is upper bounded by $O(I \cdot P \cdot 2^{mcs_p})$.

4 Results

All experiments presented here were carried out on a 3.7 GHz Intel i7-8700 Linux system with 64 GB of memory. A time limit of one hour was set for each simulation. For comparison, we use the Weighted Mini-Bucket (WMB) [Liu and Ihler, 2011] and Join Graph Linear Programming (JGLP) [Ihler et al., 2012] methods from the Merlin library, and the AND/OR Branch-and-Bound (AOBB) [Dechter and Mateescu, 2007, Marinescu and Dechter, 2009] method from the Daoopt library. The source codes for both libraries, written in C++, are publicly available [Marinescu, 2016, Otten, 2010]. Our code has been implemented in Python3 with the NetworkX and NumPy libraries. All methods reduce the BN based on inherent determinism and local structure. For a fair comparison, we also simplify the BN based on the known evidence states and the determinism present in the CPDs. We evaluate our approach on three datasets, namely $Pedigree$, $BN$, and $Grid\textrm{-}BN$ [Ihler, 2006, Dechter, 2006], included in the Probabilistic Inference Challenge [Elidan, 2011] and several UAI approximate inference challenges. We evaluated all instances available in each dataset and ignored the ones that had a single partition. We use the solutions provided by Ihler [2006] as the exact MPE solution ($X^*_{Exact}$).

We denote the MPE assignment obtained using an approximate inference algorithm as $X^*_{Alg}$. The probability of a complete assignment is simply the product of the CPDs reduced over the given state. For instances where exact solutions are available, we measure the error in the estimated max-marginal and in the probability of the MPE assignment, denoted as

\begin{align*}
\Delta_{MaxMarg} &= \log MaxMarg_{Alg} - \log P(X^*_{Exact}) \\
\Delta_{MPE} &= \log P(X^*_{Alg}) - \log P(X^*_{Exact})
\end{align*}
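For reference, the probability of a complete assignment can be evaluated in log space as a sum over reduced CPD entries; a minimal sketch with an assumed CPD representation (a zero entry would make the assignment invalid, and the guard is omitted for brevity):

import math

def assignment_log_prob(cpds, assignment):
    # cpds: list of (scope, table) pairs, where table is a NumPy array
    # indexed by the states of the scope variables.
    # assignment: var -> state index, covering all variables.
    logp = 0.0
    for scope, table in cpds:
        idx = tuple(assignment[v] for v in scope)
        logp += math.log(table[idx])
    return logp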

The runtime parameters used for the various codes were as follows. We set $mcs_p = 20$ and $mcs_{im} = 15$ in our code, as a trade-off between runtime and accuracy. However, $mcs_{im}$ is a soft constraint and its value is decremented until the network can be partitioned successfully. For WMB, JGLP, and AOBB, we tried an $ibound$ of 10 and 20. Solutions for a larger number of testcases were obtained with $ibound = 20$ for WMB and with $ibound = 10$ for AOBB. Therefore, we used $ibound = 20$ for WMB and JGLP, and $ibound = 10$ for AOBB. For AOBB, we used the Mini-Bucket Elimination with Moment Matching (MBE-MM) heuristic [Flerova et al., 2011] to estimate bounds and enabled the use of the stochastic greedy ordering scheme.

Table 1: Comparison of errors in the log probabilities of MPE assignments ($\Delta_{MPE}$) obtained using different methods. $P$ denotes the maximum number of partitions and $I$ denotes the number of iterations required to obtain a complete assignment with IBIA. Entries are marked as ‘N’ if either no solution is obtained in 1 hr or the probability of the solution obtained is zero.
Network  I  P  $\Delta_{MPE}$ (IBIA, AOBB, WMB, JGLP)  Runtime in s (AOBB, WMB, JGLP, IBIA)
Category I:
BN_127 2 19 0 0 -53.65 0 0.09 242 216 13
BN_133 3 17 0 0 -40.02 0 0.06 247 231 14
BN_131 3 17 0 0 -41.02 0 0.04 220 205 13
BN_134 3 15 0 0 -26.17 0 0.1 245 232 12
BN_130 2 18 0 0 -61.82 -0.17 1 239 243 10
BN_126 3 19 0 0 -28.40 -19.48 1 208 193 13
BN_129 3 15 0 0 -39.66 -6.40 0.2 196 216 10
BN_132 2 13 -0.31 0 -39.37 -4.65 37 237 228 9
BN_15 2 6 -0.09 0 -2.87 -0.66 0.5 106 72 1
pedigree30 4 6 0 0 -2.71 0 7 260 423 14
pedigree44 3 8 0 0 -7.63 -0.97 492 816 1318 6
pedigree13 4 11 -1.27 0 -11.09 -1.68 3600 1215 571 10
Category II:
BN_60 2 6 0 0 N N 41 391 271 3
BN_74 4 10 0 0 N N 798 50 270 9
pedigree23 2 3 0 0 N N 0.1 25 3600 1
pedigree42 4 7 -0.09 0 N N 273 63 731 3
pedigree7 5 10 -0.37 0 N -2.68 3600 3600 2467 10
pedigree9 4 9 -0.95 0 N -1.03 379 808 513 10
pedigree34 4 10 -1.18 0 N N 3600 1505 3601 8
pedigree41 3 9 -1.06 0 N N 3600 47 3601 8
pedigree33 2 3 -1.91 0 N -0.80 6 2276 460 3
pedigree31 5 13 -2.25 -0.45 N -3.34 3600 3601 1743 12
pedigree40 4 11 -1.88 -7.89 N N 3600 90 196 12
pedigree51 5 10 0 -4.68 N -2.30 3600 1866 750 11
Table 2: Comparison of the logarithm of the MPE assignment probabilities obtained with different methods. $P$ denotes the maximum number of partitions and $I$ denotes the number of iterations required to obtain a complete assignment with IBIA. Entries are marked as ‘N’ if either no solution is obtained in 1 hr or the probability of the solution obtained is zero.
Network  I  P  $\log P(X^*_{Alg})$ (IBIA, AOBB, WMB, JGLP)  Runtime in s (AOBB, WMB, JGLP, IBIA)
Category I:
75-21-5 2 3 -13.28 -13.28 N -13.37 21 229 140 2
90-20-5 2 2 -5.70 -5.70 N -6.62 0.3 186 122 1
75-22-5 2 5 -15.61 -15.61 N -16.46 6 227 168 1
75-23-5 3 6 -15.43 -15.43 N -16.40 82 299 188 2
75-24-5 3 5 -16.08 -16.08 N -16.87 37 331 211 2
75-26-5 2 6 -21.89 -21.89 N -23.06 3600 417 270 4
75-25-5 2 6 -20.84 -20.84 N N 520 327 235 3
90-30-5 3 5 -13.12 -13.12 N N 1428 456 378 4
90-34-5 3 3 -13.29 -13.29 N N 898 645 489 4
Category II:
BN_72 17 22 -193.89 N N N 3600 3600 2570 141
BN_70 8 21 -126.62 -136.58 N N 3600 2606 1306 138
BN_71 12 23 -157.85 N N N 3600 1785 471 129
BN_73 13 21 -169.45 -180.98 N N 3600 3600 2079 140
90-38-5 5 11 -19.87 N N N 3600 906 581 13
90-42-5 6 9 -20.94 N N N 3600 1101 751 16
90-46-5 3 10 -28.32 N N N 3600 1324 923 13
90-50-5 5 13 -29.16 N N N 3600 1547 1055 24

Table 1 shows a detailed comparison of the error in MPE assignments ($\Delta_{MPE}$) and the required runtime for different inference methods for a subset of benchmarks. The results for the others are similar. For each testcase, the table also shows the maximum number of partitions and the number of iterations required to obtain a complete MPE assignment with IBIA. We check the accuracy of the log probabilities up to two decimal places and report errors $< 0.01$ as 0. Testcases for which the solutions obtained by all methods were close to the optimal solution are not tabulated here in the interest of space. We divide the testcases into two categories based on the results. For benchmarks in Category I, all methods achieve a non-zero probability solution. IBIA gives close to optimal solutions for most of the benchmarks in this category. Category II shows testcases where IBIA and AOBB give solutions, but either WMB or JGLP fails. IBIA achieves good estimates in most testcases. For testcases BN_74, pedigree7, and pedigree51, while IBIA achieves close to the optimal solution in $< 11$ seconds, AOBB performs repeated search operations until the timeout constraint is reached. While the error obtained with IBIA is $> 1$ in five testcases, in four out of five of these testcases AOBB also performs repeated search until timeout. IBIA performs better than JGLP in all cases except pedigree33. For testcases pedigree40 and pedigree51, IBIA provides a higher probability solution than the other methods.

In Table 2, we compare the logarithm of the MPE probabilities obtained with the different methods for benchmarks where the exact solution is not available. While tabulating, we omit cases where IBIA, AOBB, and WMB/JGLP give a common solution. For testcases in the first category in Table 2, IBIA gives the same solution as AOBB in $< 5$ seconds. The estimate obtained in all these cases is better than both WMB and JGLP. For networks in the second category, IBIA performs better than all the other methods. While AOBB achieves some non-zero solution in testcases BN_70 and BN_73, the solution obtained with IBIA has a higher probability than the AOBB solution, with the difference being around 10. In all other cases, IBIA is able to find a possible MPE assignment with non-zero probability while all the other methods obtain a zero-probability solution within the set time limit.

Since IBIA is written in Python and the available tools are written in C++, the runtimes reported here are not directly comparable. That said, the runtime of IBIA is an order of magnitude lower than that of JGLP and WMB in most cases. It is also better than AOBB in many cases. This is expected, since IBIA does not search for solutions; a single partial assignment is obtained in each iteration.

Table 3 summarizes, for each benchmark set, the total number of instances that require multiple partitions and the number of instances for which a non-zero probability assignment is obtained within 1 hr. Out of 117 networks, IBIA provides solutions in 100 testcases, which is better than JGLP and WMB, but slightly lower than the 108 cases solved by AOBB.

The approximate CT obtained after local max-marginalization depends on the choice of variables that are removed to reduce the clique size. For the results in Tables 1 and 2, we prioritize variables present in the least number of cliques. A random assignment is chosen in case of ties. The CTF obtained after approximation will be different if a different set of variables is chosen for removal. We tried three other orderings without any prioritization. Although the change in $MaxMarg$ was not much in most cases, there were significant changes in the MPE. The results for the benchmarks where an improvement was obtained are shown in Table 4, which reports the best solution obtained. This gives rise to the possibility of searching for a solution over the approximation space. An advantage of this is that the approximation instances are independent of each other and can be run in parallel.

Table 3: Number of testcases where a non-zero solution is obtained using various methods, with the time limit set to 1 hr.
Dataset  #Inst.  WMB  JGLP  AOBB+MBE-MM  IBIA
BN_UAI 63 38 42 58 48
Pedigree 22 7 12 22 20
GridBN 32 8 22 28 32
Total 117 53 76 108 100
Table 4: Comparison of errors in the MPE probabilities obtained with IBIA using different orderings for the selection of variables during local max-marginalization. $\Delta_{IBIA}$ is the error obtained using the structure-based heuristic. $\Delta_{IBIA_{Best}}$ is the error of the best estimate obtained.
Network  $\Delta_{IBIA}$  $\Delta_{IBIA_{Best}}$  |  Network  $\Delta_{IBIA}$  $\Delta_{IBIA_{Best}}$
BN_115 -1.51 0 pedigree31 -2.25 0
BN_12 -0.38 -0.19 pedigree33 -1.91 0
BN_132 -0.31 0 pedigree34 -1.18 0
BN_69 N -0.64 pedigree37 -0.46 0
BN_90 -16 0 pedigree40 -1.88 -0.65
pedigree13 -1.27 0 pedigree41 -1.06 0
pedigree18 N 0 pedigree42 -0.09 0
pedigree19 N -0.01 pedigree50 -1.25 -1.07
pedigree20 -0.53 -0.02 pedigree9 -0.95 0
pedigree25 -2.86 0

To evaluate the quality of the estimate of the max-marginal obtained with our method, we compare it with the final upper bounds achieved with JGLP, WMB, and MBE-MM. Figure 3 shows the histogram of the error in the estimates obtained with the different methods. We observe that the error with IBIA is within $\pm 1$ in 70 out of 79 instances. In contrast, the upper bound with all the other methods is in this range in slightly more than 40 testcases. The average error in the estimate of the max-marginal obtained with IBIA is 0.27, as opposed to 1.12, 1.23, and 2.17 for the upper bounds estimated using the existing methods WMB, JGLP, and MBE-MM, respectively. Though the estimate of the max-marginal obtained with IBIA is not necessarily an upper bound, it provides good estimates when compared to the existing heuristics. Therefore, this method can be used for the quick evaluation of different partial assignments in search algorithms like branch and bound.

Figure 3: Comparison of the error in the max-marginal estimated using different methods for $BN$ and $Pedigree$ instances where solutions are known. We limit the range to compare the number of instances with lower errors. #Instances: 79, bin size = 1.

5 Conclusions

We propose a single-shot algorithm for MPE inference based on an incremental build-infer-approximate paradigm. It is an iterative method in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. In the majority of cases, the number of iterations required is $< 5$. Even though it is a single-shot algorithm, it gives valid assignments in 100 out of 117 testcases. In the majority of the testcases, the accuracy is comparable to the branch and bound based technique AOBB. Overall, the runtimes are competitive, with significantly lower runtimes than AOBB in several cases. We also show that our method achieves better estimates of the max-marginals when compared to the upper bounds obtained with several mini-bucket methods. Therefore, this approach can possibly be used in search algorithms like branch and bound for the quick evaluation of different partial assignments. We also show some initial experiments that improve the accuracy of the approach by using different heuristics for approximation.

We obtain zero-probability solutions in 14 out of a total of 117 benchmarks, calling for a more systematic search over the approximation space. In many of these cases, there were multiple partial assignments, and we chose one of them randomly. A search over these could also possibly give better solutions.

References

  • Bathla and Vasudevan [2021] Shivani Bathla and Vinita Vasudevan. A memory constrained approximate Bayesian inference approach using incremental construction of clique trees. TechRxiv. https://doi.org/10.36227/techrxiv.14938275.v1, Jul 2021.
  • Bathla and Vasudevan [2022] Shivani Bathla and Vinita Vasudevan. IBIA: Bayesian inference via incremental build-infer-approximate operations on clique trees. arXiv preprint arXiv:2202.12003, 2022.
  • Bauer et al. [2019] Alexander Bauer, Shinichi Nakajima, Nico Görnitz, and Klaus-Robert Müller. Partial optimality of dual decomposition for MAP inference in pairwise MRFs. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1696–1703, 2019.
  • Dechter [2006] Rina Dechter. UAI model files and solutions. https://www.ics.uci.edu/~dechter/softwares/benchmarks/Mpe_Problme_Sets/, 2006. Accessed: 2021-10-15.
  • Dechter and Mateescu [2007] Rina Dechter and Robert Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73–106, 2007.
  • Elidan [2011] Gal Elidan. The Probabilistic Inference Challenge (PIC2011). https://www.cs.huji.ac.il/project/PASCAL/, 2011.
  • Flerova et al. [2011] Natalia Flerova, Alexander Ihler, Rina Dechter, and Lars Otten. Mini-bucket elimination with moment matching. In NIPS Workshop DISCML, 2011.
  • Flores et al. [2002] M. Julia Flores, José A. Gámez, and Kristian G. Olesen. Incremental compilation of Bayesian networks. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pages 233–240, 2002.
  • Globerson and Jaakkola [2007] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. Advances in Neural Information Processing Systems, 20, 2007.
  • Hazan and Shashua [2010] Tamir Hazan and Amnon Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Transactions on Information Theory, 56(12):6294–6316, 2010.
  • Ihler [2006] Alexander Ihler. UAI model files and solutions. http://sli.ics.uci.edu/~ihler/uaidata/, 2006. Accessed: 2021-10-15.
  • Ihler et al. [2012] Alexander T. Ihler, Natalia Flerova, Rina Dechter, and Lars Otten. Join-graph based cost-shifting schemes. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012.
  • Jojic et al. [2010] Vladimir Jojic, Stephen Gould, and Daphne Koller. Accelerated dual decomposition for MAP inference. In Proceedings of the 27th International Conference on Machine Learning, pages 503–510, 2010.
  • Koller and Friedman [2009] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Kolmogorov [2005] Vladimir Kolmogorov. Convergent tree-reweighted message passing for energy minimization. In International Workshop on Artificial Intelligence and Statistics, pages 182–189, 2005.
  • Komodakis et al. [2007] Nikos Komodakis, Nikos Paragios, and Georgios Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8, 2007.
  • Liu and Ihler [2011] Qiang Liu and Alexander Ihler. Bounding the partition function using Hölder's inequality. In Proceedings of the 28th International Conference on Machine Learning, pages 849–856, 2011.
  • Marinescu [2016] Radu Marinescu. Merlin. https://github.com/radum2275/merlin/, 2016. Accessed: 2021-10-15.
  • Marinescu and Dechter [2005] Radu Marinescu and Rina Dechter. AND/OR branch-and-bound for graphical models. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 224–229, 2005.
  • Marinescu and Dechter [2007] Radu Marinescu and Rina Dechter. Best-first AND/OR search for most probable explanations. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pages 259–266, 2007.
  • Marinescu and Dechter [2009] Radu Marinescu and Rina Dechter. Memory intensive AND/OR search for combinatorial optimization in graphical models. Artificial Intelligence, 173(16-17):1492–1524, 2009.
  • Marinescu et al. [2014] Radu Marinescu, Rina Dechter, and Alexander T. Ihler. AND/OR search for marginal MAP. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, pages 563–572, 2014.
  • Marinescu et al. [2018] Radu Marinescu, Junkyu Lee, Rina Dechter, and Alexander Ihler. AND/OR search for marginal MAP. Journal of Artificial Intelligence Research, 63:875–921, 2018.
  • Marinescu et al. [2020] Radu Marinescu, Akihiro Kishimoto, and Adi Botea. Parallel AND/OR search for marginal MAP. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10226–10234, 2020.
  • Martins et al. [2015] André F. T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and Eric P. Xing. AD3: Alternating directions dual decomposition for MAP inference in graphical models. The Journal of Machine Learning Research, 16(1):495–545, 2015.
  • Otten [2010] Lars Otten. Daoopt: Distributed AND/OR optimization. https://github.com/lotten/daoopt/, 2010. Accessed: 2022-02-10.
  • Otten and Dechter [2012] Lars Otten and Rina Dechter. Anytime AND/OR depth-first search for combinatorial optimization. AI Communications, 25(3):211–227, 2012.
  • Park and Darwiche [2004] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101–133, 2004.
  • Ping et al. [2015] Wei Ping, Qiang Liu, and Alexander T. Ihler. Decomposition bounds for marginal MAP. Advances in Neural Information Processing Systems, 28, 2015.
  • Shimony and Charniak [1990] Solomon Eyal Shimony and Eugene Charniak. A new algorithm for finding MAP assignments to belief networks. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, pages 185–196, 1990.
  • Sontag et al. [2008] David Sontag, Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss. Tightening LP relaxations for MAP using message passing. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 503–510, 2008.
  • Sontag et al. [2011] David Sontag, Amir Globerson, and Tommi Jaakkola. Introduction to dual decomposition for inference. In Optimization for Machine Learning. MIT Press, 2011.
  • Sontag et al. [2012] David Sontag, Do Kook Choe, and Yitao Li. Efficiently searching for frustrated cycles in MAP inference. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pages 795–804, 2012.
  • Wainwright et al. [2005] Martin J. Wainwright, Tommi S. Jaakkola, and Alan S. Willsky. MAP estimation via agreement on trees: Message-passing and linear programming. IEEE Transactions on Information Theory, 51(11):3697–3717, 2005.
  • Werner [2007] Tomas Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1165–1179, 2007.

6 Supplementary Material

In the following, we use the terms $CTF_{in}$ and $CTF_a$ to denote the input and output CTFs of Algorithm 2. The line numbers indicated here refer to the line numbers of Algorithm 2.

Proof for Proposition 1: $CTF_a$ is initialized as a subgraph of $CTF_{in}$, which is valid and max-calibrated (line 2). Each CT $\in CTF_a$ obtained after the subsequent steps is valid for the following reasons.

  • It contains only maximal cliques, since any non-maximal cliques generated during the simplification of $CTF_a$ are removed (lines 6, 22).

  • It contains disjoint trees. This is because neither exact nor local max-marginalization introduces any loops. In the exact max-marginalization step, the neighbors of the collapsed and non-maximal cliques are reconnected to $CTF_a$, which means that the tree structure of the CT is preserved. In the local max-marginalization step, variables are removed only if no sepset size becomes zero, ensuring that a connected CT remains a connected tree (lines 15-18).

  • It satisfies the running intersection property (RIP) [Koller and Friedman, 2009]. This is because

    • During exact max-marginalization, sepsets in $CTF_a$ are preserved while re-connecting the neighbors of the collapsed cliques to the new clique, thereby ensuring that RIP is satisfied. Similarly, since the neighbors of the removed non-maximal cliques are connected to the containing cliques, sepsets are preserved.

    • During local max-marginalization, variables are retained in a single connected component of $CTF_a$ (line 14). Therefore, the subgraph of $CTF_a$ over any variable is connected.

  • The re-assignment of factors results in a valid distribution.
    Proof: After exact max-marginalization, all the clique and sepset beliefs are preserved. Therefore, the resultant clique is also calibrated. Let $C_i$ and $C_j$ be two adjacent cliques in $CTF_{in}$ with sepset $S_{i,j}$. After local max-marginalization of a variable $n$, we get the corresponding cliques $C_i', C_j' \in CTF_a$ with sepset $S_{i,j}'$. Since the result is invariant with respect to the order in which the variables are maximized over, we have

    \begin{align*}
    \max_{C_i' \setminus S_{i,j}'} \beta(C_i') &= \max_{C_i' \setminus S_{i,j}'} \max_{n.states} \beta(C_i) \\
    &= \max_{n.states} \max_{C_i' \setminus S_{i,j}'} \beta(C_i) \\
    &= \max_{n.states} \mu(S_{i,j}) = \mu(S_{i,j}')
    \end{align*}

    where $n.states$ denotes the states of $n$. Similarly, $\max_{C_j' \setminus S_{i,j}'} \beta(C_j') = \mu(S_{i,j}')$. Since this is true for all pairs of adjacent cliques, $CTF_a$ is also max-calibrated. Therefore, since the re-parameterization is performed using calibrated clique and sepset beliefs, it results in a valid distribution.

Proof for Proposition 2: $CTF_a$ is obtained from $CTF_{in}$, which is calibrated. Therefore, all cliques in a CT have the same maximum belief. After exact max-marginalization, the clique beliefs in $CTF_a$ are identical to the beliefs in $CTF_{in}$, since the step involves collapsing all the containing cliques before max-marginalization of the beliefs. Both exact and local max-marginalization involve maximizing the clique beliefs over all states of the variable. Thus, the maximum belief and the within-clique beliefs remain unaltered.