
MPE inference using an Incremental Build-Infer-Approximate Paradigm

Shivani Bathla
Department of Electrical Engineering, IIT Madras, Chennai, India
Vinita Vasudevan
Department of Electrical Engineering, IIT Madras, Chennai, India
Abstract

Exact inference of the most probable explanation (MPE) in Bayesian networks is known to be NP-complete. In this paper, we propose an algorithm for approximate MPE inference that is based on the incremental build-infer-approximate (IBIA) framework. We use this framework to obtain an ordered set of partitions of the Bayesian network and the corresponding max-calibrated clique trees. We show that the maximum belief in the last partition gives an estimate of the probability of the MPE assignment. We propose an iterative algorithm for decoding, in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. There are no issues of convergence, and we do not perform a search for solutions. Even though it is a single-shot algorithm, we obtain valid assignments in 100 out of the 117 benchmarks used for testing. The accuracy of our solution is comparable to that of a branch and bound search in the majority of the benchmarks, with competitive runtimes.

1 Introduction

Bayesian Networks (BN) are directed graphical models that provide a compact representation of the joint probability distribution over a set of random variables. An important task in Bayesian inference is to find the Most Probable Explanation (MPE) given a set of evidence variables. It involves finding the complete assignment for all non-evidence variables that gives the maximum probability. The probability of the MPE assignment is the same as the max-marginal obtained after maximizing the joint distribution over all non-evidence variables. Exact methods include variable elimination, max-product belief propagation on join trees followed by a decoding step to get the MPE assignment, and integer linear programming (ILP) [Koller and Friedman, 2009]. However, MPE inference is known to be an NP-complete problem [Shimony and Charniak, 1990, Park and Darwiche, 2004] and exact methods are not viable in most cases.

One approach to get the MPE assignment is to use search-based algorithms that traverse the model’s search space. A commonly used framework for graphical models is the AND/OR search tree [Dechter and Mateescu, 2007]. Several search algorithms have been used to find MPE estimates from these models, including depth-first AND/OR Branch and Bound (AOBB) [Marinescu and Dechter, 2005, Marinescu et al., 2014, 2018], best-first AND/OR search (AOBF) [Marinescu and Dechter, 2007, Marinescu et al., 2020], and Breadth-Rotating AND/OR Branch and Bound (BRAOBB) [Otten and Dechter, 2012]. These methods use upper bounds on the max-marginals, computed using different variants of Mini-Bucket Elimination (MBE), to guide the search. Weighted Mini-Bucket (WMB) elimination improves the accuracy of the bound of traditional MBE using Hölder’s inequality [Liu and Ihler, 2011]. Another technique to tighten the resulting bound is to use cost-shifting schemes. Two such algorithms are Mini-Bucket Elimination with Moment Matching (MBE-MM) [Flerova et al., 2011] and Join Graph Linear Programming (JGLP) [Ihler et al., 2012]. While MBE-MM uses a single pass of cost-shifting locally within the mini-buckets, JGLP performs cost-shifting updates over the entire mini-bucket join graph iteratively.

Another class of methods is based on LP relaxations. LP relaxations based on message passing include tree-reweighted max-product [Wainwright et al., 2005, Kolmogorov, 2005, Werner, 2007, Jojic et al., 2010, Hazan and Shashua, 2010] and max-product linear programming [Globerson and Jaakkola, 2007, Sontag et al., 2008]. A related class of methods is the dual decomposition solvers [Komodakis et al., 2007, Sontag et al., 2011, 2012, Ping et al., 2015, Martins et al., 2015]. The connection between the solutions obtained using the two approaches is explored in Bauer et al. [2019].

In this work, we propose a different approach for MPE inference. Our method is based on the Incremental Build-Infer-Approximate paradigm (IBIA), which returns a set of bounded clique size partitions of the BN along with the calibrated forests of clique trees (CTFs) corresponding to each partition [Bathla and Vasudevan, 2022]. Instead of the sum-product calibration used in Bathla and Vasudevan [2022], we use max-product calibration along with suitable modifications to the Approximate step. Based on this, we propose a single-shot algorithm for MPE inference with the following features.

  1. We show that the max-marginal can be estimated by finding the maximum belief in any clique of the max-calibrated CT of the last partition. Based on experiments, we find that our algorithm gives very good estimates, with an average error of $\pm 0.27$ in the log probability over the 79 benchmarks for which the exact solution is available.

  2. We propose a single-shot algorithm for decoding the MPE assignment. It is an iterative method in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. There are no issues of convergence. Even though it is a single-shot algorithm, we were able to get a non-zero probability assignment with accuracy comparable to AOBB in the majority of the testcases.

  3. Our code is written in Python, with the NumPy and NetworkX libraries. The runtimes of our algorithm are competitive with, and in some cases better than, those of the compiled C++ code for AOBB.

2 Background

A Bayesian network is a directed acyclic graph (DAG) $\mathcal{G}$. Its nodes represent random variables $\mathcal{Y}=\{X_1, X_2, \cdots, X_n\}$ with associated domains $D=\{D_1, D_2, \cdots, D_n\}$. It has directed edges from a subset of nodes $\textrm{Pa}_{X_i}=\{X_k, k \neq i\}$ to $X_i$, representing a conditional probability distribution (CPD) $\phi_i = P\{X_i \mid \textrm{Pa}_{X_i}\}$. The BN represents the joint probability distribution (JPD) of $\mathcal{Y}$, given by $P(\mathcal{Y}) = \prod_{i=1}^{n} \phi_i$. Let $E$ denote the set of instantiated or evidence variables, with $\mathcal{Y}=\{\mathcal{X}, E\}$. The task is to find the most probable explanation (MPE), defined as

\[ \textrm{MPE} = \operatorname*{arg\,max}_{\mathcal{X}} P(\mathcal{X}, E=e) \]

In low-treewidth networks, the MPE can be obtained using the clique tree (CT) representation, which is a derived undirected graphical model obtained after moralization and chordal completion of the BN. The CT, which is also called the join tree or junction tree, is a hypertree with nodes $\{C_1, C_2, \cdots, C_n\}$ that are the set of maximal cliques in the chordal graph. An edge between $C_i$ and $C_j$ is associated with a sepset $S_{i,j} = C_i \cap C_j$. Each factor $\phi_i$ is associated with a single clique $C_i$ such that $\textrm{Scope}(\phi_i) \subseteq C_i$. The task of computing the MPE is split into two parts. First, the max-product belief propagation (BP) algorithm [Chapter 10 of Koller and Friedman, 2009] is used to compute the maximum marginal probability, $MaxMarg$, defined as

\[ MaxMarg = \max_{\mathcal{X}} P(\mathcal{X}, E=e) \]

This algorithm has two rounds of message passing along the edges of the CT, an upward pass (from the leaf nodes to the root node) and a downward pass (from the root node to the leaves). The CT obtained as a result of this algorithm is max-calibrated, which is defined as follows. Let $\beta(C_i)$, $\beta(C_j)$ denote the beliefs associated with adjacent cliques $C_i$ and $C_j$, and $\mu_{i,j}$ the belief of the corresponding sepset $S_{i,j}$. The CT is said to be max-calibrated if all pairs of adjacent cliques satisfy the following.

\[ \max_{C_i \setminus S_{i,j}} \beta(C_i) = \max_{C_j \setminus S_{i,j}} \beta(C_j) = \mu(S_{i,j}) \]

$MaxMarg$ can then simply be obtained by finding the maximum clique belief in any clique $C$ of the max-calibrated CT, since

\[ MaxMarg = \max_{C} \max_{\mathcal{X} \setminus C} P(\mathcal{X}, e) = \max_{C} \beta(C) \]
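As a concrete illustration, the following is a minimal NumPy sketch (the clique scopes and belief values are hypothetical) that checks the max-calibration condition for a pair of adjacent cliques and reads off $MaxMarg$ from a clique belief:

import numpy as np

# Hypothetical beliefs for adjacent cliques C_i = {A, B} and C_j = {B, C},
# stored with one array axis per variable, in scope order.
beta_ci = np.array([[0.20, 0.05], [0.10, 0.40]])  # axes: (A, B)
beta_cj = np.array([[0.20, 0.10], [0.40, 0.15]])  # axes: (B, C)

# Max-marginalize each clique belief onto the sepset S_ij = {B}.
mu_from_ci = beta_ci.max(axis=0)  # max over A -> belief over B
mu_from_cj = beta_cj.max(axis=1)  # max over C -> belief over B

# Max-calibration: both projections agree on the sepset belief.
assert np.allclose(mu_from_ci, mu_from_cj)

# In a max-calibrated CT, MaxMarg is the maximum entry of any clique belief.
print("MaxMarg =", beta_ci.max())  # 0.4, same as beta_cj.max()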

Since a network could have disjoint DAGs, we use the term Clique Tree Forest (CTF) to denote the collection of corresponding CTs. The max-marginal for a BN with multiple DAGs is the product of the max-marginals of all CTs in the CTF.

Once $MaxMarg$ is computed, the decode step finds an MPE assignment using the max-calibrated CTF. Algorithm 1 shows the traceback procedure. The set $S_u$ tracks the set of unassigned variables. For each CT in the CTF, any clique can be chosen as the root node ($C_{root}$). To start with, all variables in the root clique are unassigned. Since the global MPE assignment also satisfies the local optimality property [Koller and Friedman, 2009], a partial assignment can be obtained by finding the argmax of the clique belief. The overall MPE assignment is updated and all assigned variables are removed from the set $S_u$. All other cliques are iterated through in a pre-order traversal. For any clique $C$, the states of the variables in the sepsets with the preceding cliques are known. After reducing the belief $\beta(C)$ over the known variable states, we find an assignment of the unassigned variables that maximizes the reduced clique belief.

Algorithm 1 Traceback ($CTF$)
1: Input $CTF$: max-calibrated CTF
2: Initialize: MPE $\leftarrow \langle\rangle$ ▷ Hash table $\langle$var : var.assignment$\rangle$; $S_u \leftarrow$ Variables $\in CTF$ ▷ Set of unassigned variables
3: for $CT \in CTF$ do
4:    $C_{root} \leftarrow$ Choose any clique in $CT$
5:    for $C \in CT$.Preorder_Traversal($C_{root}$) do
6:       $S_{u_c} \leftarrow C \cap S_u$ ▷ Set of unassigned variables in clique $C$
7:       if $S_{u_c} \neq \varnothing$ then
8:          $\beta(C) \leftarrow$ Reduce belief $\beta(C)$ over known states $\in$ MPE
9:          $A \leftarrow$ argmax $\beta(C)$
10:         MPE[$v$] = $A$[$v$] $\forall v \in S_{u_c}$ ▷ Update MPE assignment
11:         $S_u$.remove($S_{u_c}$)
12:      end if
13:   end for
14: end for
15: return MPE
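For concreteness, the following is a minimal Python sketch of this traceback. The data structures are assumptions made for illustration (each clique is a (scope, belief) pair, with the belief a NumPy array having one axis per scope variable, and each CT is listed in pre-order), not the authors' implementation:

import numpy as np

def traceback(ctf):
    # ctf: list of CTs; each CT is a list of (scope, belief) pairs in
    # pre-order, where scope is a tuple of variable names and belief is
    # a NumPy array with one axis per variable in scope.
    mpe = {}                                   # var -> assigned state index
    for ct in ctf:
        for scope, belief in ct:
            unassigned = [v for v in scope if v not in mpe]
            if not unassigned:
                continue
            # Reduce the belief over variables whose states are known;
            # iterating axes in reverse keeps earlier axis indices valid.
            reduced = belief
            for axis, v in reversed(list(enumerate(scope))):
                if v in mpe:
                    reduced = np.take(reduced, mpe[v], axis=axis)
            # Argmax of the reduced belief assigns the remaining variables.
            states = np.unravel_index(np.argmax(reduced), reduced.shape)
            mpe.update(zip(unassigned, states))
    return mpe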

2.1 IBIA framework

Figure 1: Bayesian inference using the IBIA framework

The IBIA framework proposed in Bathla and Vasudevan [2021, 2022] presents a three-step approach to perform inference on a BN. The overall approach is illustrated in Figure 1. Each DAG $G_i$ in the BN is built, inferred, and approximated separately. IBIA starts with a clique forest over the nodes that have no parents (referred to as primary inputs, PI). An incremental approach is used to add as many nodes as possible until a preset bound ($mcs_p$) on the clique size is reached. In this work, we define the clique size of a given clique $C_i$ as

\[ cs_i = \log_2 \prod_{v \in C_i} |D_v| \]

where $|D_v|$ is the cardinality or the number of states in the domain of the variable $v$. When the clique-size bound $mcs_p$ is reached, the CTF is calibrated using the standard join tree message passing algorithm [Koller and Friedman, 2009]. The CTF contains two sets of variables: variables that are parents of nodes that have not yet been added (referred to as interface nodes) and other variables (non-interface nodes). A combination of exact and approximate marginalization is used to lower the clique sizes to a preset bound $mcs_{im}$, thus allowing for the addition of new nodes. This approximate CTF is used as the starting point for the incremental construction of the CT for the next partition. This process is continued until all the nodes in the BN have been added to some CTF.
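For example, under this definition, a hypothetical clique over three variables with domain sizes 2, 3, and 4 has a clique size of $\log_2 24 \approx 4.58$:

import math

domain_sizes = {"a": 2, "b": 3, "c": 4}           # hypothetical domains
cs = math.log2(math.prod(domain_sizes.values()))  # log2(24) ~ 4.58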

The algorithm returns an ordered set of partitions $\{R_1, R_2, \cdots, R_P\}$ and the corresponding calibrated CTFs for each DAG in the BN. The authors show that the CTFs obtained after the build and approximate stages contain valid CTs. Also, the approximation algorithm maintains the consistency of within-clique beliefs. A trade-off between runtime and accuracy can be achieved using the input parameters $mcs_p$ and $mcs_{im}$.

3 Framework for MPE queries

The IBIA framework has been used to infer the partition function and the prior and posterior marginal probabilities of non-evidence variables in Bathla and Vasudevan [2022]. We have modified this framework for MPE queries. We use an iterative procedure that has the build, infer, and approximate steps, and a partial decode step that gives an MPE assignment for a subset of variables. In each iteration, the subset for which an assignment is obtained is extended. The iteration continues until either all variables are assigned or an assignment is not possible. In the following sections, each of the steps is explained in more detail.

Step 1: Incremental Build: The CTF can be built incrementally using either of the algorithms detailed in Flores, Gámez, and Olesen [2002] and Bathla and Vasudevan [2022]. We use the approach described in Bathla and Vasudevan [2022], since it works directly with the clique tree model and eliminates the need for the computation of any other intermediate representation. Typically, it also chooses a smaller graph for re-triangulation than Flores et al. [2002].

Step 2: Inference: We use the two-pass max-product belief propagation algorithm to calibrate the partition CTF. After calibration, all cliques in a CT have the same maximum probability, and the beliefs of adjacent cliques agree over the common variables [Koller and Friedman, 2009].
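As an illustration of the two-pass schedule, the following is a toy NumPy sketch of max-product calibration on a simple chain of cliques. It assumes strictly positive 2-D potentials with scopes $(X_t, X_{t+1})$; an actual CTF implementation would instead traverse each tree:

import numpy as np

def max_product_chain(factors):
    # factors[t]: potential of clique C_t with axes (X_t, X_{t+1}).
    k = len(factors)
    beliefs = [f.copy() for f in factors]
    up = [None] * (k - 1)
    # Upward pass: C_1 -> C_k, messages over the sepsets {X_{t+1}}.
    for t in range(k - 1):
        up[t] = beliefs[t].max(axis=0)              # max out X_t
        beliefs[t + 1] = up[t][:, None] * beliefs[t + 1]
    # Downward pass: C_k -> C_1; divide out the message already absorbed.
    for t in range(k - 1, 0, -1):
        down = beliefs[t].max(axis=1) / up[t - 1]   # message over sepset {X_t}
        beliefs[t - 1] = beliefs[t - 1] * down[None, :]
    return beliefs                                  # max-calibrated beliefs

After the two passes, every belief in the chain has the same maximum entry, which is the property used in the approximation and decode steps below.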

Step 3: Approximation:

(a) Exact max-marginalization (b) Local max-marginalization
Figure 2: Illustration of exact and local max-marginalization of a variable $n$ present in cliques $C_1, C_2$ in the first figure and cliques $C_1, C_2, C_3, C_4$ in the second.

As discussed, the nodes present in the CTF corresponding to a partition can be divided into two sets: interface (IV) and non-interface (NIV) nodes. The interface nodes are parents of the variables that have not yet been added; they are therefore required for building the next partition. We adopt an approximation methodology similar to the one suggested in Bathla and Vasudevan [2022] to reduce the clique sizes. We denote the calibrated CTF obtained after BP as $CTF_{in}$ and the approximated CTF as $CTF_a$. The minimal subgraph of $CTF_{in}$ that connects the interface variables forms the initial $CTF_a$. This graph may also contain some non-interface variables. We use a combination of exact and local max-marginalization to reduce the clique sizes to a preset bound $mcs_{im}$. We now discuss both these steps in detail.

  1. Exact max-marginalization: If a non-interface variable $n$ is present in a single clique, it is simply removed from the clique scope and the belief is modified as $\beta(C') = \max_{n.states} \beta(C)$, where $C' = C \setminus n$. On the other hand, if $n$ is present in multiple cliques, the subtree over all containing cliques ($ST_n$) is collapsed into a single clique ($C_c$). Figure 2(a) shows an example, where the cliques $C_1, C_2$ containing $n$ are collapsed to form clique $C_c$. Since this step could potentially result in a large clique, it is performed only if the size of the resultant clique is $\leq mcs_{im}$. The belief of the collapsed clique is computed as shown below.

    \[ \beta(C_c) = \max_{n.states} \left( \frac{\prod_{C \in ST_n} \beta(C)}{\prod_{SP \in ST_n} \mu(SP)} \right) \]

    where $SP$ denotes the sepsets in $ST_n$. This step preserves the joint beliefs, i.e., variables in $CTF_a$ have the same belief as in $CTF_{in}$.

  2. Local max-marginalization: In this step, variables present in cliques with size $> mcs_{im}$ are max-marginalized from individual cliques while ensuring that (a) all CTs in $CTF_a$ are valid CTs, and (b) a connected CT in $CTF_{in}$ remains connected in $CTF_a$.

    Let $n$ be a variable present in large cliques. The subgraph of $CTF_{in}$ over all cliques containing $n$ is a connected tree, since the input CTF is valid. To ensure that the Running Intersection Property (RIP) [Koller and Friedman, 2009] is satisfied after approximation, $n$ is retained in a connected subtree containing smaller cliques with sizes $\leq mcs_{im}$ and is individually max-marginalized out from all other cliques. Figure 2(b) shows an example where variable $n$ is removed from cliques $C_1, C_2$ and retained in cliques $C_3, C_4$. Local marginalization of $n$ from cliques $C_i$ and $C_j$, and the corresponding sepset $S_{i,j}$, results in cliques $C_i' = C_i \setminus n$ and $C_j' = C_j \setminus n$ with sepset $S_{i,j}' = S_{i,j} \setminus n$. The corresponding beliefs are $\beta(C_i') = \max_{n.states} \beta(C_i)$, $\beta(C_j') = \max_{n.states} \beta(C_j)$, and $\mu(S_{i,j}') = \max_{n.states} \mu(S_{i,j})$. While updating the beliefs in this manner preserves the joint beliefs of variables present within a clique, the joint beliefs of variables present in disjoint cliques are approximated (a code sketch of both belief updates follows this list).
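The following is a minimal NumPy sketch of the two belief updates for the configuration of Figure 2(a), with hypothetical max-calibrated beliefs for cliques $C_1 = \{n, a\}$ and $C_2 = \{n, b\}$ and sepset $S_{1,2} = \{n\}$; the arithmetic follows the equations above and is not the authors' code:

import numpy as np

# Hypothetical max-calibrated beliefs: C1 = {n, a}, C2 = {n, b}, S12 = {n}.
beta_c1 = np.array([[0.3, 0.1], [0.2, 0.4]])   # axes: (n, a)
beta_c2 = np.array([[0.3, 0.2], [0.4, 0.1]])   # axes: (n, b)
mu_s12 = beta_c1.max(axis=1)                   # sepset belief over n

# Exact max-marginalization: collapse ST_n into C_c = {a, b}, then max over n.
joint = beta_c1[:, :, None] * beta_c2[:, None, :] / mu_s12[:, None, None]
beta_cc = joint.max(axis=0)                    # axes: (a, b)

# Local max-marginalization: max n out of each clique and sepset independently.
beta_c1_loc = beta_c1.max(axis=0)              # belief over a
beta_c2_loc = beta_c2.max(axis=0)              # belief over b
mu_s12_loc = mu_s12.max()                      # scalar sepset belief

# The exact step preserves joint beliefs; the local step preserves only
# within-clique beliefs (here, the max-marginals over a and over b agree).
assert np.allclose(beta_cc.max(axis=1), beta_c1_loc)
assert np.allclose(beta_cc.max(axis=0), beta_c2_loc)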

Once the clique sizes are reduced, we re-assign the clique factors based on the calibrated clique and sepset beliefs.

Proposition 1.

If all CTs in $CTF_{in}$ are valid and max-calibrated, then all CTs in $CTF_a$ are also valid and max-calibrated.

Proposition 2.

The approximation algorithm preserves the maximum belief and the within-clique beliefs of all cliques in $CTF_a$.

We discuss the proofs of Propositions 1 and 2 in Section 6.

Algorithm 2 shows the overall algorithm. At all points in the method, we remove any non-maximal cliques that are generated and re-connect their neighbors. We start by identifying the sets of interface and non-interface variables present in $CTF_{in}$. We initialize $CTF_a$ as the minimal subgraph of $CTF_{in}$ that connects the interface variables. We first attempt to reduce the clique sizes using exact max-marginalization of the non-interface variables. NIVs present in a single clique are removed, and the corresponding clique beliefs are max-marginalized. If any resultant clique is non-maximal, it is removed, and its neighbors are re-connected. Wherever possible within the clique size constraint, NIVs present in multiple cliques are max-marginalized after collapsing the containing cliques (lines 4-7).

Next, we find the set of large cliques ($L_c$) with size $> mcs_{im}$. All variables in these cliques are grouped into NIVs and IVs and stored in $N$. We first try to locally max-marginalize the non-interface variables. This is done as follows. First, both sets of variables are individually sorted in increasing order of the number of containing cliques. Starting with the NIVs, we pop a variable ($n$) from this list and identify the subgraph of $CTF_a$ over the cliques containing this variable ($ST_n$). Since the input CTF satisfies RIP, $ST_n$ is a connected tree. $ST_n$ can be further divided into disjoint subtrees over cliques with size $\leq mcs_{im}$. We choose the largest subtree ($ST_r$) and retain the variable $n$ in all cliques in this subtree (lines 13-15). Since we want a connected CT to remain connected, the variable is removed only if no sepset size becomes zero (lines 15-18).

If the clique sizes remain above $mcs_{im}$ after processing the NIVs, we start with the IVs. If $n$ is an IV, it must be retained in at least one clique. Therefore, we ignore interface variables that are present only in large cliques (lines 19-20). Otherwise, we max-marginalize all clique and sepset beliefs associated with the cliques that are not present in $ST_r$. The list $L_c$ is updated by removing the cliques whose size is now within the preset limit. This process is repeated until either the clique list $L_c$ or the variable set $N$ becomes empty.

Once the clique sizes are reduced, we reassign the clique factors (lines 26-27). For each clique tree in $CTF_a$, we pick a root node and perform a pre-order traversal from the root node to all other nodes in the CT. The factor associated with the root node is the corresponding clique belief. For all other nodes, the factor assigned to an unvisited neighbor $C_j$ of clique $C_i$ is $\frac{\beta(C_j)}{\mu(S_{ij})}$.
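A minimal sketch of this re-parameterization, assuming each clique in the pre-order list stores its calibrated belief together with the sepset belief shared with its already-visited neighbor (the representation is illustrative, and the sepset array is assumed to be broadcast-aligned with the clique belief):

def reassign_factors(ct):
    # ct: list of (belief, parent_sepset_belief) pairs in pre-order;
    # the root clique has parent_sepset_belief = None.
    factors = []
    for belief, parent_sep in ct:
        if parent_sep is None:
            factors.append(belief)          # root keeps its full belief
        else:
            # Dividing out the sepset shared with the already-visited
            # neighbor makes the product of all factors telescope to
            # prod(beta) / prod(mu), i.e., the joint distribution.
            factors.append(belief / parent_sep)
    return factors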

Algorithm 2 ApproximateCTF ($CTF_{in}$, $mcs_{im}$)
Input: $CTF_{in}$: input clique tree forest; $mcs_{im}$: maximum clique size limit for the approximate CTF
Output: $CTF_a$: approximate CTF
1: $IV, NIV \leftarrow$ Divide variables $\in CTF_{in}$ into interface and non-interface variables
2: $CTF_a \leftarrow$ Minimal subgraph of $CTF_{in}$ connecting $IV$
3: $NIV \leftarrow$ Variables $\in CTF_a \setminus IV$ ▷ Updated set of non-interface variables
4: ▷ Exact max-marginalization
5: Max-marginalize NIVs present in a single clique in $CTF_a$
6: Remove resultant non-maximal cliques, reconnect neighbors
7: Collapse cliques to max-marginalize NIVs present in multiple cliques if $cs < mcs_{im}$
8: ▷ Local max-marginalization to trim cliques with size $> mcs_{im}$
9: $L_c \leftarrow$ List of cliques with size $> mcs_{im}$
10: $N \leftarrow \{NIV, IV\}$ ▷ Sorted lists of variables present in cliques in $L_c$
11: while $L_c \neq \varnothing$ or $N \neq \varnothing$ do
12:    $n \leftarrow N$.pop variable present in the least number of cliques
13:    $ST_n \leftarrow$ Subgraph of $CTF_a$ over cliques containing $n$
14:    $ST_r \leftarrow$ Choose connected subtree $\in ST_n$ s.t. max clique size $\leq mcs_{im}$
15:    ▷ Check if the CT will remain connected if $n$ is removed
16:    $L_m \leftarrow$ Cliques $\in ST_n \setminus$ Cliques $\in ST_r$
17:    $ms \leftarrow$ Minimum sepset size for cliques in $L_m$
18:    if $ms == 1$ then continue
19:    ▷ Check if interface variables are retained in at least one clique
20:    if $ST_r$.isEmpty() and $n \in IV$ then continue
21:    Locally max-marginalize the beliefs corresponding to all cliques in $L_m$
22:    Remove resultant non-maximal cliques, reconnect neighbors
23:    $L_c$.remove($C$) if $C$.size $\leq mcs_{im}$ $\forall C \in L_m$
24: end while
25: ▷ Re-parameterize the joint distribution
26: Re-assign clique factors in $CTF_a$ using the clique and sepset beliefs
27: return $CTF_a$
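To illustrate lines 13-14, the following NetworkX sketch extracts $ST_n$ and picks a retained subtree $ST_r$. The node attributes 'vars' (clique scope) and 'size' (clique size), and the choice of the largest component by clique count, are assumptions made for illustration:

import networkx as nx

def choose_retained_subtree(ctf_a, n, mcs_im):
    # ST_n: induced subgraph over cliques containing n (a tree, by RIP).
    st_n = ctf_a.subgraph(c for c in ctf_a if n in ctf_a.nodes[c]["vars"])
    # Dropping the large cliques splits ST_n into disjoint candidate subtrees.
    small = st_n.subgraph(c for c in st_n if st_n.nodes[c]["size"] <= mcs_im)
    components = list(nx.connected_components(small))
    if not components:
        return st_n, None                    # n occurs only in large cliques
    st_r = small.subgraph(max(components, key=len))   # largest subtree
    return st_n, st_r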

Step 4: Decode: The assignment of variables in the last partition is used as the partial MPE assignment. This assignment is obtained from the CT of the last partition using Algorithm 1.

3.1 Inference of $MaxMarg$

Let the $n_i$ partitions of the $i^{th}$ DAG in the BN be denoted $\{R_{i1}, R_{i2}, \ldots, R_{in_i}\}$, with corresponding CTFs $CTF_i = \{CTF_{i,1}, CTF_{i,2}, \ldots, CTF_{i,n_i}\}$.

Proposition 3.

The max-belief of any clique in $CT \in CTF_{i,k}$ is the estimate of the maximum probability in the joint distribution over all variables added to $CT$ in partitions $R_{ij},\ j \leq k$.

Proof.

Let $C$ be a clique in the calibrated $CT \in CTF_{i,1}$ and $\mathcal{X}_1$ be the set of variables in $CT$. Cliques in $CT$ are assigned the CPDs $\{\phi_v, v \in \mathcal{X}_1\}$ as factors. Therefore, the following holds true once $CT$ is calibrated.

\[ \beta(C) = \max_{\mathcal{X}_1 \setminus C} \prod_{v \in \mathcal{X}_1} \phi_v = \max_{\mathcal{X}_1 \setminus C} P(\mathcal{X}_1) \;\implies\; \max_{C} \beta(C) = \max_{\mathcal{X}_1} P(\mathcal{X}_1) \]

Let $CTF_{i,1}'$ be the CTF obtained after approximating $CTF_{i,1}$. While the within-clique beliefs are preserved during local max-marginalization (by Proposition 2), the joint beliefs of variables belonging to different cliques are approximated. Let $\mathcal{X}_{12}$ be the set of variables in $CTF_{i,1}'$ and let $P'(\mathcal{X}_{12})$ denote the joint belief obtained from the approximate beliefs in $CTF_{i,1}'$. Let $Q(\mathcal{X}_1)$ denote a distribution such that

\[ \max_{\mathcal{X}_1 \setminus \mathcal{X}_{12}} Q(\mathcal{X}_1) = P'(\mathcal{X}_{12}) = \frac{\prod \beta'}{\prod \mu'} \]

Let $\mathcal{X}_2$ denote the set of new variables added to the corresponding $CT \in CTF_{i,2}$ in the second partition. The factors assigned to cliques in the CT are the CPDs of the new nodes $\{\phi_v, \forall v \in \mathcal{X}_2\}$ and the re-parameterized beliefs $\{\beta_1', \frac{\beta_2'}{\mu_{12}'}, \ldots\}$. Therefore, after calibration of the CT, the following holds true for any clique $C \in CT$.

\begin{align*}
\beta(C) &= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} \prod_{v \in \mathcal{X}_2} \phi_v \cdot \frac{\prod \beta'}{\prod \mu'} \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot P'(\mathcal{X}_{12}) \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_{12}\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot \max_{\mathcal{X}_1 \setminus \mathcal{X}_{12}} Q(\mathcal{X}_1) \\
&= \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \cdot Q(\mathcal{X}_1)
\end{align*}

Since the variables in $\mathcal{X}_1$ are non-descendants of $\mathcal{X}_2$ in the BN, $\mathcal{X}_2 \perp \mathcal{X}_1 \mid Par_{\mathcal{X}_2}$. Therefore,

\begin{align*}
P(\mathcal{X}_2 \mid \mathcal{X}_1, Par_{\mathcal{X}_2}) &= P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \\
P(\mathcal{X}_2 \mid \mathcal{X}_1) &= P(\mathcal{X}_2 \mid Par_{\mathcal{X}_2}) \quad \because\ Par_{\mathcal{X}_2} \subset \mathcal{X}_1
\end{align*}

Therefore, the clique belief can be written as,

\begin{align*}
\beta(C) &= \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P(\mathcal{X}_2 \mid \mathcal{X}_1) \cdot Q(\mathcal{X}_1) = \max_{\{\mathcal{X}_2, \mathcal{X}_1\} \setminus C} P''(\mathcal{X}_2, \mathcal{X}_1) \\
\max_{C} \beta(C) &= \max_{\mathcal{X}_2, \mathcal{X}_1} P''(\mathcal{X}_2, \mathcal{X}_1)
\end{align*}

A similar procedure is repeated in each successive partition. Therefore, the max-belief in the calibrated CT in partition $R_{ik}$ approximates the maximum probability of the joint distribution over all variables added to it in partitions $R_{ij},\ j \leq k$. ∎

Theorem 1.

The product of the max-beliefs of the CTs corresponding to the last partition of all DAGs is the estimate of $MaxMarg$.

Proof.

For each DAG $G_i$, the last-partition CTF contains a single CT, since we never break a connected CT. Using Proposition 3, the maximum probability ($MaxMarg_i$) for this CT is the estimate of the maximum probability over all variables in $G_i$. This value is exact when there is only one partition and may be approximate when there are multiple partitions. The product $\prod_i MaxMarg_i$ is the estimate of the overall max-marginal $MaxMarg$. ∎

Based on Theorem 1, Algorithm 3 computes the overall max-marginal.

Algorithm 3 FindMaxMarg ($L_{CTF}$)
1: Input $L_{CTF}$: list of clique tree forests corresponding to all DAGs $\in$ BN
2: Output $MaxMarg$: estimated max-marginal over the non-evidence variables $\in$ BN
3: $MaxMarg = 1.0$
4: for $L_i \in L_{CTF}$ do ▷ $L_i$: list of partition CTFs for the $i^{th}$ DAG
5:    CT $\leftarrow L_i$[-1] ▷ Calibrated CT for the last partition
6:    $C \leftarrow$ Choose any clique in CT
7:    $MaxMarg = MaxMarg \times \max_C \beta(C)$ ▷ Multiply with the max clique belief
8: end for
9: return $MaxMarg$
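A direct Python rendering of Algorithm 3, under the assumed representation that each CTF is a list of calibrated clique beliefs stored as NumPy arrays:

def find_max_marg(ctf_lists):
    # ctf_lists: one list of partition CTFs per DAG in the BN; each CTF
    # here is a list of calibrated clique beliefs (NumPy arrays).
    max_marg = 1.0
    for partitions in ctf_lists:
        last_ct = partitions[-1]     # calibrated CT of the last partition
        any_belief = last_ct[0]      # any clique works: the CT is max-calibrated
        max_marg *= any_belief.max()
    return max_marg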

3.2 MPE Decoding and complexity

Algorithm 4 MPE inference procedure
1: Initialize: MPE $\leftarrow \{\}$; $M = \{\textrm{MPE}, E\}$
2: Simplify the BN based on $M$
3: ▷ Run the IBIA framework to partition the BN
4: $L_{CTF} \leftarrow$ IBIA.run($BN, mcs_p, mcs_{im}$) ▷ List containing the calibrated CTFs for all partitions
5: ▷ Find the partial assignment from the last partition
6: $M_p \leftarrow$ Traceback($L_{CTF}$[-1]) ▷ Algorithm 1
7: MPE.update($M_p$)
8: if Number of partitions == 1 then
9:    return MPE
10: else
11:    Go to line 2
12: end if

Algorithm 4 shows the overall procedure. All known states are contained in the list $M$, which is initialized with the evidence variables. The BN structure is simplified by removing all outgoing edges of variables with known states, and the CPDs are reduced over the known states. Next, the IBIA framework is run to partition the reduced BN based on the clique size constraints $mcs_p$ and $mcs_{im}$. The framework returns the list of calibrated CTFs for all partitions. We find a partial assignment ($M_p$) over the variables in the last partition using the traceback method described in Algorithm 1. The overall MPE assignment is updated by adding the decoded states in $M_p$. If multiple partitions are obtained, the assignment from the last partition is incomplete. Therefore, the partial assignment is added to $M$, and the process is repeated until the complete assignment is obtained.
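The overall loop can be summarized by the following Python sketch; simplify, ibia_run, and traceback are placeholders for the corresponding steps of Algorithm 4, not a real API:

def mpe_inference(bn, evidence, mcs_p=20, mcs_im=15):
    mpe = dict(evidence)                       # known states: evidence + decoded
    while True:
        reduced_bn = simplify(bn, mpe)         # remove outgoing edges, reduce CPDs
        partitions = ibia_run(reduced_bn, mcs_p, mcs_im)
        mpe.update(traceback(partitions[-1]))  # decode the last partition's CTF
        if len(partitions) == 1:               # all remaining variables assigned
            return mpe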

Our method performs a single search over the set of possible assignments for all variables in the BN. Due to the approximations involved while partitioning using IBIA, the beliefs in the last partition are approximate. Therefore, the partial assignment obtained in the last partition is not guaranteed to be accurate.

The complexity can be analyzed as follows. Let $I$ be the number of iterations required to get an assignment of all variables, and $P$ be the maximum number of partitions obtained over all iterations. The time complexity of incremental clique tree formation is polynomial in the number of variables present in the cliques that are impacted by the addition of new nodes. For each partition, both calibration using max-product BP and max-marginalization of beliefs are exponential in the maximum clique size $mcs_p$. Similarly, finding the MPE assignment over the variables in the last partition involves finding the argmax over individual cliques, which is also exponential in the maximum clique size. Therefore, the overall time complexity is upper bounded by $O(I \cdot P \cdot 2^{mcs_p})$.

4 Results

All experiments presented here were carried out on a 3.7 GHz Intel i7-8700 Linux system with 64 GB of memory. A time limit of one hour was set for each simulation. For comparison, we use the Weighted Mini-Bucket (WMB) [Liu and Ihler, 2011] and Join Graph Linear Programming (JGLP) [Ihler et al., 2012] methods from the Merlin library, and the AND/OR Branch-and-Bound (AOBB) [Dechter and Mateescu, 2007, Marinescu and Dechter, 2009] method from the Daoopt library. The source codes for both libraries, written in C++, are publicly available [Marinescu, 2016, Otten, 2010]. Our code has been implemented in Python3 with the NetworkX and NumPy libraries. All methods reduce the BN based on inherent determinism and local structure. For a fair comparison, we also simplify the BN based on the known evidence states and the determinism present in the CPDs. We evaluate our approach on three datasets, namely $Pedigree$, $BN$, and $Grid\textrm{-}BN$ [Ihler, 2006, Dechter, 2006], included in the Probabilistic Inference Challenge [Elidan, 2011] and several UAI approximate inference challenges. We evaluated all instances available in each dataset and ignored the ones that had a single partition. We use the solutions provided by Ihler [2006] as the exact MPE solution ($X^*_{Exact}$).

We denote the MPE assignment obtained using an approximate inference algorithm as $X^*_{Alg}$. The probability of a complete assignment is simply the product of the CPDs reduced over the given state. For instances where exact solutions are available, we measure the error in the estimated max-marginal and in the probability of the MPE assignment, denoted as

\begin{align*}
\Delta_{MaxMarg} &= \log MaxMarg_{Alg} - \log P(X^*_{Exact}) \\
\Delta_{MPE} &= \log P(X^*_{Alg}) - \log P(X^*_{Exact})
\end{align*}
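For reference, the probability of a complete assignment can be evaluated in log space as a sum over reduced CPD entries; a minimal sketch with an assumed CPD representation (a zero entry would make the assignment invalid, and the guard is omitted for brevity):

import math

def assignment_log_prob(cpds, assignment):
    # cpds: list of (scope, table) pairs, where table is a NumPy array
    # indexed by the states of the scope variables.
    # assignment: var -> state index, covering all variables.
    logp = 0.0
    for scope, table in cpds:
        idx = tuple(assignment[v] for v in scope)
        logp += math.log(table[idx])
    return logp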

The runtime parameters used for the various codes were as follows. We set $mcs_p = 20$ and $mcs_{im} = 15$ in our code, as a trade-off between runtime and accuracy. However, $mcs_{im}$ is a soft constraint and its value is decremented until the network can be partitioned successfully. For WMB, JGLP, and AOBB, we tried an $ibound$ of 10 and 20. Solutions for a larger number of testcases were obtained with $ibound = 20$ for WMB and with $ibound = 10$ for AOBB. Therefore, we used $ibound = 20$ for WMB and JGLP, and $ibound = 10$ for AOBB. For AOBB, we used the Mini-Bucket Elimination with Moment Matching (MBE-MM) heuristic [Flerova et al., 2011] to estimate bounds and enabled the use of the stochastic greedy ordering scheme.

Table 1: Comparison of errors in the log probabilities of MPE assignments ($\Delta_{MPE}$) obtained using different methods. $P$ denotes the maximum number of partitions and $I$ denotes the number of iterations required to obtain a complete assignment with IBIA. Entries are marked as ‘N’ if either no solution is obtained in 1 hr or the probability of the solution obtained is zero.
Network  I  P  $\Delta_{MPE}$ (IBIA, AOBB, WMB, JGLP)  Runtime in s (AOBB, WMB, JGLP, IBIA)
Category I:
BN_127 2 19 0 0 -53.65 0 0.09 242 216 13
BN_133 3 17 0 0 -40.02 0 0.06 247 231 14
BN_131 3 17 0 0 -41.02 0 0.04 220 205 13
BN_134 3 15 0 0 -26.17 0 0.1 245 232 12
BN_130 2 18 0 0 -61.82 -0.17 1 239 243 10
BN_126 3 19 0 0 -28.40 -19.48 1 208 193 13
BN_129 3 15 0 0 -39.66 -6.40 0.2 196 216 10
BN_132 2 13 -0.31 0 -39.37 -4.65 37 237 228 9
BN_15 2 6 -0.09 0 -2.87 -0.66 0.5 106 72 1
pedigree30 4 6 0 0 -2.71 0 7 260 423 14
pedigree44 3 8 0 0 -7.63 -0.97 492 816 1318 6
pedigree13 4 11 -1.27 0 -11.09 -1.68 3600 1215 571 10
Category II:
BN_60 2 6 0 0 N N 41 391 271 3
BN_74 4 10 0 0 N N 798 50 270 9
pedigree23 2 3 0 0 N N 0.1 25 3600 1
pedigree42 4 7 -0.09 0 N N 273 63 731 3
pedigree7 5 10 -0.37 0 N -2.68 3600 3600 2467 10
pedigree9 4 9 -0.95 0 N -1.03 379 808 513 10
pedigree34 4 10 -1.18 0 N N 3600 1505 3601 8
pedigree41 3 9 -1.06 0 N N 3600 47 3601 8
pedigree33 2 3 -1.91 0 N -0.80 6 2276 460 3
pedigree31 5 13 -2.25 -0.45 N -3.34 3600 3601 1743 12
pedigree40 4 11 -1.88 -7.89 N N 3600 90 196 12
pedigree51 5 10 0 -4.68 N -2.30 3600 1866 750 11
Table 2: Comparison of the logarithm of the MPE assignment probabilities obtained with different methods. $P$ denotes the maximum number of partitions and $I$ denotes the number of iterations required to obtain a complete assignment with IBIA. Entries are marked as ‘N’ if either no solution is obtained in 1 hr or the probability of the solution obtained is zero.
Network  I  P  $\log P(X^*_{Alg})$ (IBIA, AOBB, WMB, JGLP)  Runtime in s (AOBB, WMB, JGLP, IBIA)
Category I:
75-21-5 2 3 -13.28 -13.28 N -13.37 21 229 140 2
90-20-5 2 2 -5.70 -5.70 N -6.62 0.3 186 122 1
75-22-5 2 5 -15.61 -15.61 N -16.46 6 227 168 1
75-23-5 3 6 -15.43 -15.43 N -16.40 82 299 188 2
75-24-5 3 5 -16.08 -16.08 N -16.87 37 331 211 2
75-26-5 2 6 -21.89 -21.89 N -23.06 3600 417 270 4
75-25-5 2 6 -20.84 -20.84 N N 520 327 235 3
90-30-5 3 5 -13.12 -13.12 N N 1428 456 378 4
90-34-5 3 3 -13.29 -13.29 N N 898 645 489 4
Category II:
BN_72 17 22 -193.89 N N N 3600 3600 2570 141
BN_70 8 21 -126.62 -136.58 N N 3600 2606 1306 138
BN_71 12 23 -157.85 N N N 3600 1785 471 129
BN_73 13 21 -169.45 -180.98 N N 3600 3600 2079 140
90-38-5 5 11 -19.87 N N N 3600 906 581 13
90-42-5 6 9 -20.94 N N N 3600 1101 751 16
90-46-5 3 10 -28.32 N N N 3600 1324 923 13
90-50-5 5 13 -29.16 N N N 3600 1547 1055 24

Table 1 shows a detailed comparison of the error in MPE assignments ($\Delta_{MPE}$) and the required runtime for different inference methods for a subset of benchmarks. The results for the others are similar. For each testcase, the table also shows the maximum number of partitions and the number of iterations required to obtain a complete MPE assignment with IBIA. We check the accuracy of the log probabilities up to two decimal places and report errors $< 0.01$ as 0. Testcases for which the solutions obtained by all methods were close to the optimal solution are not tabulated here in the interest of space. We divide the testcases into two categories based on the results. For benchmarks in Category I, all methods achieve a non-zero probability solution. IBIA gives close to optimal solutions for most of the benchmarks in this category. Category II shows testcases where IBIA and AOBB give solutions, but either WMB or JGLP fails. IBIA achieves good estimates in most testcases. For testcases BN_74, pedigree7, and pedigree51, while IBIA achieves close to the optimal solution in $< 11$ seconds, AOBB performs repeated search operations until the timeout constraint is reached. While the error obtained with IBIA is $> 1$ in five testcases, in four out of five of these testcases AOBB also performs repeated search until timeout. IBIA performs better than JGLP in all cases except pedigree33. For testcases pedigree40 and pedigree51, IBIA provides a higher probability solution than the other methods.

In Table 2, we compare the logarithm of the MPE probabilities obtained with the different methods for benchmarks where the exact solution is not available. While tabulating, we omit cases where IBIA, AOBB, and WMB/JGLP give a common solution. For testcases in the first category in Table 2, IBIA gives the same solution as AOBB in $< 5$ seconds. The estimate obtained in all these cases is better than both WMB and JGLP. For networks in the second category, IBIA performs better than all the other methods. While AOBB achieves some non-zero solution in testcases BN_70 and BN_73, the solution obtained with IBIA has a higher probability than the AOBB solution, with the difference being around 10. In all other cases, IBIA is able to find a possible MPE assignment with non-zero probability while all the other methods obtain a zero-probability solution within the set time limit.

Since IBIA is written in Python and the available tools are written in C++, the runtimes reported here are not directly comparable. That said, the runtime of IBIA is an order of magnitude lower than that of JGLP and WMB in most cases. It is also better than AOBB in many cases. This is expected, since IBIA does not search for solutions; a single partial assignment is obtained in each iteration.

Table 3 summarizes, for each benchmark set, the total number of instances that require multiple partitions and the number of instances for which a non-zero probability assignment is obtained within 1 hr. Out of 117 networks, IBIA provides solutions in 100 testcases, which is better than JGLP and WMB, but slightly lower than the 108 cases solved by AOBB.

The approximate CT obtained after local max-marginalization depends on the choice of variables that are removed to reduce the clique size. For the results in Tables 1 and 2, we prioritize variables present in the least number of cliques. A random assignment is chosen in case of ties. The CTF obtained after approximation will be different if a different set of variables is chosen for removal. We tried three other orderings without any prioritization. Although the change in $MaxMarg$ was not much in most cases, there were significant changes in the MPE. The results for the benchmarks where an improvement was obtained are shown in Table 4, which reports the best solution obtained. This gives rise to the possibility of searching for a solution over the approximation space. An advantage of this is that the approximation instances are independent of each other and can be run in parallel.

Table 3: Number of testcases where a non-zero solution is obtained using various methods, with the time limit set to 1 hr.
Dataset  #Inst.  WMB  JGLP  AOBB+MBE-MM  IBIA
BN_UAI 63 38 42 58 48
Pedigree 22 7 12 22 20
GridBN 32 8 22 28 32
Total 117 53 76 108 100
Table 4: Comparison of errors in the MPE probabilities obtained with IBIA using different orderings for the selection of variables during local max-marginalization. $\Delta_{IBIA}$ is the error obtained using the structure-based heuristic. $\Delta_{IBIA_{Best}}$ is the error of the best estimate obtained.
Network  $\Delta_{IBIA}$  $\Delta_{IBIA_{Best}}$  |  Network  $\Delta_{IBIA}$  $\Delta_{IBIA_{Best}}$
BN_115 -1.51 0 pedigree31 -2.25 0
BN_12 -0.38 -0.19 pedigree33 -1.91 0
BN_132 -0.31 0 pedigree34 -1.18 0
BN_69 N -0.64 pedigree37 -0.46 0
BN_90 -16 0 pedigree40 -1.88 -0.65
pedigree13 -1.27 0 pedigree41 -1.06 0
pedigree18 N 0 pedigree42 -0.09 0
pedigree19 N -0.01 pedigree50 -1.25 -1.07
pedigree20 -0.53 -0.02 pedigree9 -0.95 0
pedigree25 -2.86 0

To evaluate the quality of the estimate of the max-marginal obtained with our method, we compare it with the final upper bounds achieved with JGLP, WMB, and MBE-MM. Figure 3 shows the histogram of the error in the estimates obtained with the different methods. We observe that the error with IBIA is within $\pm 1$ in 70 out of 79 instances. In contrast, the upper bound with all the other methods is in this range in slightly more than 40 testcases. The average error in the estimate of the max-marginal obtained with IBIA is 0.27, as opposed to 1.12, 1.23, and 2.17 for the upper bounds estimated using the existing methods WMB, JGLP, and MBE-MM, respectively. Though the estimate of the max-marginal obtained with IBIA is not necessarily an upper bound, it provides good estimates when compared to the existing heuristics. Therefore, this method can be used for the quick evaluation of different partial assignments in search algorithms like branch and bound.

Figure 3: Comparison of the error in the max-marginal estimated using different methods for $BN$ and $Pedigree$ instances where solutions are known. We limit the range to compare the number of instances with lower errors. #Instances: 79, bin size = 1.

5 Conclusions

We propose a single-shot algorithm for MPE inference based on an incremental build-infer-approximate paradigm. It is an iterative method in which the subset of variables for which an assignment is obtained is guaranteed to increase in every iteration. In the majority of cases, the number of iterations required is $< 5$. Even though it is a single-shot algorithm, it gives valid assignments in 100 out of 117 testcases. In the majority of the testcases, the accuracy is comparable to the branch and bound based technique AOBB. Overall, the runtimes are competitive, with significantly lower runtimes than AOBB in several cases. We also show that our method achieves better estimates of the max-marginals when compared to the upper bounds obtained with several mini-bucket methods. Therefore, this approach can possibly be used in search algorithms like branch and bound for the quick evaluation of different partial assignments. We also show some initial experiments that improve the accuracy of the approach by using different heuristics for approximation.

We obtain zero-probability solutions in 14 out of a total of 117 benchmarks, calling for a more systematic search over the approximation space. In many of these cases, there were multiple partial assignments, and we chose one of them randomly. A search over these could also possibly give better solutions.

References

  • Bathla and Vasudevan [2021] Shivani Bathla and Vinita Vasudevan. A memory constrained approximate Bayesian inference approach using incremental construction of clique trees. TechRxiv. https://doi.org/10.36227/techrxiv.14938275.v1, Jul 2021.
  • Bathla and Vasudevan [2022] Shivani Bathla and Vinita Vasudevan. IBIA: Bayesian inference via incremental build-infer-approximate operations on clique trees. arXiv preprint arXiv:2202.12003, 2022.
  • Bauer et al. [2019] Alexander Bauer, Shinichi Nakajima, Nico Görnitz, and Klaus-Robert Müller. Partial optimality of dual decomposition for MAP inference in pairwise MRFs. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1696–1703, 2019.
  • Dechter [2006] Rina Dechter. UAI model files and solutions. https://www.ics.uci.edu/~dechter/softwares/benchmarks/Mpe_Problme_Sets/, 2006. Accessed: 2021-10-15.
  • Dechter and Mateescu [2007] Rina Dechter and Robert Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73–106, 2007.
  • Elidan [2011] Gal Elidan. The Probabilistic Inference Challenge (PIC2011). https://www.cs.huji.ac.il/project/PASCAL/, 2011.
  • Flerova et al. [2011] Natalia Flerova, Alexander Ihler, Rina Dechter, and Lars Otten. Mini-bucket elimination with moment matching. In NIPS Workshop DISCML, 2011.
  • Flores et al. [2002] M. Julia Flores, José A. Gámez, and Kristian G. Olesen. Incremental compilation of Bayesian networks. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pages 233–240, 2002.
  • Globerson and Jaakkola [2007] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. Advances in Neural Information Processing Systems, 20, 2007.
  • Hazan and Shashua [2010] Tamir Hazan and Amnon Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Transactions on Information Theory, 56(12):6294–6316, 2010.
  • Ihler [2006] Alexander Ihler. UAI model files and solutions. http://sli.ics.uci.edu/~ihler/uaidata/, 2006. Accessed: 2021-10-15.
  • Ihler et al. [2012] Alexander T. Ihler, Natalia Flerova, Rina Dechter, and Lars Otten. Join-graph based cost-shifting schemes. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012.
  • Jojic et al. [2010] Vladimir Jojic, Stephen Gould, and Daphne Koller. Accelerated dual decomposition for MAP inference. In Proceedings of the 27th International Conference on Machine Learning, pages 503–510, 2010.
  • Koller and Friedman [2009] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Kolmogorov [2005] Vladimir Kolmogorov. Convergent tree-reweighted message passing for energy minimization. In International Workshop on Artificial Intelligence and Statistics, pages 182–189, 2005.
  • Komodakis et al. [2007] Nikos Komodakis, Nikos Paragios, and Georgios Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8, 2007.
  • Liu and Ihler [2011] Qiang Liu and Alexander Ihler. Bounding the partition function using Hölder's inequality. In Proceedings of the 28th International Conference on Machine Learning, pages 849–856, 2011.
  • Marinescu [2016] Radu Marinescu. Merlin. https://github.com/radum2275/merlin/, 2016. Accessed: 2021-10-15.
  • Marinescu and Dechter [2005] Radu Marinescu and Rina Dechter. AND/OR branch-and-bound for graphical models. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 224–229, 2005.
  • Marinescu and Dechter [2007] Radu Marinescu and Rina Dechter. Best-first AND/OR search for most probable explanations. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pages 259–266, 2007.
  • Marinescu and Dechter [2009] Radu Marinescu and Rina Dechter. Memory intensive AND/OR search for combinatorial optimization in graphical models. Artificial Intelligence, 173(16-17):1492–1524, 2009.
  • Marinescu et al. [2014] Radu Marinescu, Rina Dechter, and Alexander T. Ihler. AND/OR search for marginal MAP. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, pages 563–572, 2014.
  • Marinescu et al. [2018] Radu Marinescu, Junkyu Lee, Rina Dechter, and Alexander Ihler. AND/OR search for marginal MAP. Journal of Artificial Intelligence Research, 63:875–921, 2018.
  • Marinescu et al. [2020] Radu Marinescu, Akihiro Kishimoto, and Adi Botea. Parallel AND/OR search for marginal MAP. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10226–10234, 2020.
  • Martins et al. [2015] André F. T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and Eric P. Xing. AD3: Alternating directions dual decomposition for MAP inference in graphical models. The Journal of Machine Learning Research, 16(1):495–545, 2015.
  • Otten [2010] Lars Otten. Daoopt: Distributed AND/OR optimization. https://github.com/lotten/daoopt/, 2010. Accessed: 2022-02-10.
  • Otten and Dechter [2012] Lars Otten and Rina Dechter. Anytime AND/OR depth-first search for combinatorial optimization. AI Communications, 25(3):211–227, 2012.
  • Park and Darwiche [2004] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101–133, 2004.
  • Ping et al. [2015] Wei Ping, Qiang Liu, and Alexander T. Ihler. Decomposition bounds for marginal MAP. Advances in Neural Information Processing Systems, 28, 2015.
  • Shimony and Charniak [1990] Solomon Eyal Shimony and Eugene Charniak. A new algorithm for finding MAP assignments to belief networks. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, pages 185–196, 1990.
  • Sontag et al. [2008] David Sontag, Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss. Tightening LP relaxations for MAP using message passing. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 503–510, 2008.
  • Sontag et al. [2011] David Sontag, Amir Globerson, and Tommi Jaakkola. Introduction to dual decomposition for inference. In Optimization for Machine Learning. MIT Press, 2011.
  • Sontag et al. [2012] David Sontag, Do Kook Choe, and Yitao Li. Efficiently searching for frustrated cycles in MAP inference. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pages 795–804, 2012.
  • Wainwright et al. [2005] Martin J. Wainwright, Tommi S. Jaakkola, and Alan S. Willsky. MAP estimation via agreement on trees: Message-passing and linear programming. IEEE Transactions on Information Theory, 51(11):3697–3717, 2005.
  • Werner [2007] Tomas Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1165–1179, 2007.

6 Supplementary Material

In the following, we use the terms $CTF_{in}$ and $CTF_a$ to denote the input and output CTFs of Algorithm 2. The line numbers indicated here refer to the line numbers of Algorithm 2.

Proof for Proposition 1: $CTF_a$ is initialized as a subgraph of $CTF_{in}$, which is valid and max-calibrated (line 2). Each CT $\in CTF_a$ obtained after the subsequent steps is valid for the following reasons.

  • It contains only maximal cliques, since any non-maximal cliques generated during the simplification of $CTF_a$ are removed (lines 6, 22).

  • It contains disjoint trees. This is because neither exact nor local max-marginalization introduces any loops. In the exact max-marginalization step, the neighbors of the collapsed and non-maximal cliques are reconnected to $CTF_a$, which means that the tree structure of the CT is preserved. In the local max-marginalization step, variables are removed only if no sepset size becomes zero, ensuring that a connected CT remains a connected tree (lines 15-18).

  • It satisfies the running intersection property (RIP) [Koller and Friedman, 2009]. This is because

    • During exact max-marginalization, sepsets in $CTF_a$ are preserved while re-connecting the neighbors of the collapsed cliques to the new clique, thereby ensuring that RIP is satisfied. Similarly, since the neighbors of the removed non-maximal cliques are connected to the containing cliques, sepsets are preserved.

    • During local max-marginalization, variables are retained in a single connected component of $CTF_a$ (line 14). Therefore, the subgraph of $CTF_a$ over any variable is connected.

  • The re-assignment of factors results in a valid distribution.
    Proof: After exact max-marginalization, all the clique and sepset beliefs are preserved. Therefore, the resultant clique is also calibrated. Let $C_i$ and $C_j$ be two adjacent cliques in $CTF_{in}$ with sepset $S_{i,j}$. After local max-marginalization of a variable $n$, we get the corresponding cliques $C_i', C_j' \in CTF_a$ with sepset $S_{i,j}'$. Since the result is invariant with respect to the order in which the variables are maximized over, we have

    \begin{align*}
    \max_{C_i' \setminus S_{i,j}'} \beta(C_i') &= \max_{C_i' \setminus S_{i,j}'} \max_{n.states} \beta(C_i) \\
    &= \max_{n.states} \max_{C_i' \setminus S_{i,j}'} \beta(C_i) \\
    &= \max_{n.states} \mu(S_{i,j}) = \mu(S_{i,j}')
    \end{align*}

    where $n.states$ denotes the states of $n$. Similarly, $\max_{C_j' \setminus S_{i,j}'} \beta(C_j') = \mu(S_{i,j}')$. Since this is true for all pairs of adjacent cliques, $CTF_a$ is also max-calibrated. Therefore, since the re-parameterization is performed using calibrated clique and sepset beliefs, it results in a valid distribution.

Proof for Proposition 2: $CTF_a$ is obtained from $CTF_{in}$, which is calibrated. Therefore, all cliques in a CT have the same maximum belief. After exact max-marginalization, the clique beliefs in $CTF_a$ are identical to the beliefs in $CTF_{in}$, since the step involves collapsing all the containing cliques before max-marginalization of the beliefs. Both exact and local max-marginalization involve maximizing the clique beliefs over all states of the variable. Thus, the maximum belief and the within-clique beliefs remain unaltered.