
Improved Algorithm for the Network Alignment Problem with Application to Binary Diffing

Elie Mengin Fabrice Rossi SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, 75013 Paris Quarkslab SA, 13 rue Saint-Ambroise, 75011 Paris CEREMADE, CNRS, UMR 7534, Université Paris-Dauphine, PSL University, 75016 Paris
Abstract

In this paper, we present a novel algorithm to address the Network Alignment problem. It is inspired by a previous message passing framework of Bayati et al. [2] and includes several modifications designed to significantly speed up the message updates as well as to enforce their convergence. Experiments show that our proposed model outperforms other state-of-the-art solvers. Finally, we propose an application of our method to the Binary Diffing problem. We show that our solution provides better assignments than reference diffing tools in almost all submitted instances and outline the importance of leveraging the graphical structure of binary programs.

keywords:
Network Alignment ; Graph Matching ; Belief Propagation ; Binary Diffing ; Binary Code Analysis

1 Introduction

The problem of finding a relevant one-to-one correspondence between the nodes of two graphs is known as the Network Alignment problem (NAP). It has a wide variety of applications such as image recognition [27, 14, 30], ontology alignment [2], social network analysis [28, 16] or protein interaction analysis [15, 10]. Multiple formal definitions have been proposed. For instance, one may only consider the node content and aim at finding the mapping with maximum overall similarity score. This formulation reduces to the Maximum Weight Matching problem (MWM) [4]. Conversely, one may focus only on the graph topologies and search for the alignment inducing the maximum number of overlapping edges. In this case, the problem is known as the Maximum Common Edge Subgraph problem (MCS) [1]. In this paper, we propose a mixed formulation: given two directed attributed graphs $A$ and $B$, and two measures of similarity on both nodes and edges, find the one-to-one mapping that maximizes the total similarity of both matched nodes and induced edges. In other words, we seek the assignment that maximizes a linear combination of the MWM and MCS objectives.

Several methods have been proposed to address this problem. Among them, NetAlign [2] introduces a complete message passing framework based on max-product belief propagation. In the present paper, we propose several modifications to this algorithm in order to significantly speed up the computation and to control the messages convergence. We show that these modifications provide better assignments than the original model as well as other state-of-the-art solvers in most problem instances. Finally, we show that our method could be used to efficiently retrieve the differences between two binary executables.

The rest of this paper is organized as follows. Section 2 introduces in more details the network alignment problem and reviews some existing solutions. The original model of NetAlign as well as our proposed optimizations are described in Section 3. Finally, Section 4 is dedicated to the experimental evaluation of our method.

2 Problem formulation

Let us consider two directed attributed graphs $A=(V_A,E_A)$ and $B=(V_B,E_B)$, where $V_A=\{1,\dots,n\}$ are the vertices of $A$ (resp. $V_B=\{1',\dots,m'\}$ for $B$) and $E_A=\{(i,j)\,|\,(i,j)\in V_A^2,\ i\neq j\}$ are the edges of $A$ (resp. $E_B=\{(i',j')\,|\,(i',j')\in V_B^2,\ i'\neq j'\}$ for $B$). Without loss of generality, we assume that neither graph includes self-loops (they can be encoded as node attributes if necessary).

We assume given two arbitrary non-negative measures of similarity on nodes and edges, $\sigma_V : V_A\times V_B\to\mathbb{R}_+$ and $\sigma_E : E_A\times E_B\to\mathbb{R}_+$. Using these measures, we encode all pairwise node similarity scores into a flattened vector $\mathbf{p}\in\mathbb{R}_+^{|V_A|\times|V_B|}$ such that $p_{ii'}=\sigma_V(i,i')$, and the similarity of all potential induced edges into a matrix $Q\in\mathbb{R}_+^{|V_A|^2\times|V_B|^2}$ such that $Q_{ii'jj'}=\sigma_E((i,j),(i',j'))$ if $(i,j)\in E_A$ and $(i',j')\in E_B$, and $0$ otherwise. Finally, we describe any one-to-one node mapping through a binary vector $\mathbf{x}\in\{0,1\}^{|V_A|\times|V_B|}$ in which $x_{ii'}=1$ if and only if node $i$ in $A$ is matched with node $i'$ in $B$.
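For illustration, this encoding can be sketched in a few lines of Python on a hypothetical toy instance (the similarity functions below are arbitrary placeholders, not those used in our experiments):

```python
# Sketch of the p / Q encoding for two tiny directed graphs.
# A node pair ii' is flattened to the index i * m + i'.
n, m = 2, 2                   # |V_A|, |V_B|
edges_A = [(0, 1)]            # E_A
edges_B = [(0, 1), (1, 0)]    # E_B

def sigma_V(i, ip):           # placeholder non-negative node similarity
    return 1.0 if i == ip else 0.5

def sigma_E(e, ep):           # placeholder non-negative edge similarity
    return 1.0

# p: flattened |V_A| * |V_B| vector of node similarities
p = [sigma_V(i, ip) for i in range(n) for ip in range(m)]

# Q: sparse dictionary, nonzero only when (i, j) in E_A and (i', j') in E_B
Q = {}
for (i, j) in edges_A:
    for (ip, jp) in edges_B:
        Q[(i * m + ip, j * m + jp)] = sigma_E((i, j), (ip, jp))
```

Storing $Q$ as a sparse dictionary reflects the fact that only pairs of actual edges of $A$ and $B$ contribute nonzero entries.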

Given these definitions, the network alignment problem consists in solving the following constrained quadratic program:

$$\mathbf{x}^{*}=\underset{\mathbf{x}}{\arg\max}\enspace\alpha\,\mathbf{x}^{T}\mathbf{p}+(1-\alpha)\,\mathbf{x}^{T}Q\,\mathbf{x}\quad\text{subject to}\quad\forall i\in V_{A},\ \sum_{j'\in V_{B}}x_{ij'}\leq 1\ \text{ and }\ \forall i'\in V_{B},\ \sum_{j\in V_{A}}x_{ji'}\leq 1,\tag{NAP}$$

where $\alpha\in[0,1]$ is an arbitrary constant that determines the trade-off between node and edge similarity.
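The objective and the matching constraints of (NAP) can be evaluated directly on a candidate vector; the following Python sketch (a hypothetical helper, with $Q$ stored as a sparse dictionary) makes the trade-off explicit:

```python
# Hedged sketch: evaluate alpha * x^T p + (1 - alpha) * x^T Q x for a
# candidate mapping x, checking the one-to-one matching constraints.
# p is a flat list of length n * m, Q a dict {(ii', jj'): score},
# x a 0/1 list indexed by ii' = i * m + i'.
def nap_objective(x, p, Q, alpha, n, m):
    # each node of A (resp. B) is matched at most once
    for i in range(n):
        assert sum(x[i * m + ip] for ip in range(m)) <= 1
    for ip in range(m):
        assert sum(x[i * m + ip] for i in range(n)) <= 1
    linear = sum(xi * pi for xi, pi in zip(x, p))
    quadratic = sum(s * x[a] * x[b] for (a, b), s in Q.items())
    return alpha * linear + (1 - alpha) * quadratic
```

For instance, on a two-node toy instance with `p = [1.0, 0.5, 0.5, 1.0]`, `Q = {(0, 3): 1.0}` and the identity mapping `x = [1, 0, 0, 1]`, the objective at `alpha = 0.5` combines a linear term of 2 and a quadratic term of 1.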

This problem is a generalization of the Quadratic Assignment problem and, as such, is known to be NP-hard and even APX-hard [23]. Though exact algorithms exist, they rapidly become intractable as the number of vertices rises [7]. In practice, the NAP must be solved approximately for graphs of more than a hundred nodes. In the rest of this section, we review some of the existing approaches proposed to address the problem.

Related work

Amongst the first approaches to approximate the NAP are spectral methods, which fall into two main categories. On one hand, spectral matching approaches are based on the idea that similar graphs share a similar spectrum [25]. They thus aim at best aligning the (leading) eigenvectors of the two affinity matrices (or Laplacians) [14, 21]. On the other hand, PageRank-like methods approximate the NAP through an eigenvalue problem over the matrix $Q$ [24]. The idea consists in computing the principal eigenvector of $Q$ and using it as a similarity score for every possible correspondence. The resulting assignment can then be computed using conventional MWM solvers. Over the years, several improvements have been proposed to enhance this procedure [16, 20, 12, 28].

Other common approaches directly address the quadratic program by means of relaxations. The most common convex relaxation extends the solution set to the set of doubly stochastic matrices. The relaxed problem can then be solved exactly using convex-optimization solvers, and the solution is finally projected onto the set of permutation matrices to provide an integral assignment. However, when the solution of the convex program is far from the optimal permutation matrix, this final projection may result in an incorrect mapping [18]. Other approaches make use of concave [27] or indefinite [26] relaxations. The induced programs are generally much harder to solve but yield better results when properly initialized. Note that most methods use a combination of both relaxations [30, 29].

Several other methods are based on a linearization of the NAP objective function. The idea is to reformulate the quadratic program into an equivalent linear program and to solve it using conventional (mixed-integer) linear programming solvers. However, this reformulation usually requires the introduction of many new variables and constraints, and computing the exact solution of the linear program may become prohibitively expensive. In most cases, relaxations must also be introduced. A successful method, based on the Adams and Johnson linearization and using a Lagrangian dual relaxation, has been proposed by Klau [15] and later improved by El-Kebir et al. [10].

Finally, a message passing framework has shown promising results [2]. It is directly derived from a previous model that provided important results on solving the MWM using max-product belief propagation [4]. In our work, we chose to apply this model to our use case.

Note that an important limitation of the NAP is the size of the matrix $Q$, which grows quartically with the size of the graphs. This memory requirement may become prohibitive for the relatively large graphs encountered in many real-world problems. It is mainly due to the intrinsic nature of the problem, which requires every potential correspondence to be evaluated with regard to the other candidates in order to take its topological consistency into account. In practice, most methods are designed to efficiently exploit the potential sparseness of the matrix $Q$; they therefore generally apply to sparse graphs only. Moreover, several approaches also restrain the number of potential candidates [10, 2] and thus the problem complexity. This pre-selection may rely on prior knowledge or on arbitrary decision rules. It mostly aims at preventing the algorithm from computing the assignment score of highly improbable correspondences. The framework we introduce in the next section makes use of this interesting feature.

3 Network alignment via Max-Product Belief Propagation

In this section, we first recall the original model of Bayati et al. [2] and then introduce our proposed improvements. More details about the complete framework and practical implementation of NetAlign can be found in [3].

3.1 Original model

The most common reformulation of a constrained optimization program into a probabilistic graphical model requires the introduction of a factor-graph that assigns maximum probability to the solution of the program. For more details about factor-graph graphical models, we refer the reader to Kschischang et al. [17]. In order to design such a graphical model, we must encode both the objective function and the constraints of the NAP into an equivalent probability distribution. This encoding is done through the factorization of several functions (function nodes) over the different variables of the program (variable nodes). In this form, the mode of the distribution can be efficiently computed (or at least approximated) using the max-product algorithm. In the following, we introduce the factor-graph designed to address the NAP.

The factor-graph

We first define the set of variable nodes $X=\{X_{ii'}\in\{0,1\},\ ii'\in V_A\times V_B\}$. These variables record the correspondences belonging to the current mapping. We then introduce the different function nodes of our graphical model. We distinguish factors providing the energy of the objective function from factors encoding the program constraints.

On one hand, the objective function is encoded via two sets of function nodes $c_{ii'}:\{0,1\}\rightarrow\mathbb{R}_+$ and $c_{ii'jj'}:\{0,1\}^2\rightarrow\mathbb{R}_+$, such that $\forall ii'\in V_A\times V_B,\ c_{ii'}(x_{ii'})=e^{\alpha x_{ii'}p_{ii'}}$, and $\forall (ii',jj')\in\left(V_A\times V_B\right)^2,\ c_{ii'jj'}(x_{ii'},x_{jj'})=e^{(1-\alpha)x_{ii'}Q_{ii'jj'}x_{jj'}}$.

On the other hand, the hard constraints of the NAP are encoded using $\{0,1\}$ Dirac measures $f_i:\{0,1\}^{|\partial f_i|}\rightarrow\{0,1\}$, and similarly for $g_{i'}$, such that:

$$\forall i\in V_A,\ f_i(x_{\partial f_i})=\begin{cases}1,&\text{if }\sum_{j'\in V_B}x_{ij'}\leq 1,\\0,&\text{otherwise};\end{cases}\qquad\forall i'\in V_B,\ g_{i'}(x_{\partial g_{i'}})=\begin{cases}1,&\text{if }\sum_{j\in V_A}x_{ji'}\leq 1,\\0,&\text{otherwise};\end{cases}$$

where we denote $x_{\partial f_i}=\{x_{ij'}\in\mathbf{x},\ j'\in V_B\}$ and, similarly, $x_{\partial g_{i'}}=\{x_{ji'}\in\mathbf{x},\ j\in V_A\}$.

By factorizing all the function nodes, we obtain the probability distribution of our factor-graph:

$$p_X(\mathbf{x})=\frac{1}{Z}\left[\prod_{i=1}^{n}f_i(x_{\partial f_i})\prod_{j=1}^{m}g_j(x_{\partial g_j})\right]e^{\alpha\mathbf{x}^T\mathbf{p}+(1-\alpha)\mathbf{x}^TQ\mathbf{x}},\tag{1}$$

where the normalization constant $Z$ is the partition function of the model. The support of the distribution (1) is exactly the set of feasible solutions of the NAP and, furthermore, the vector $\mathbf{x}$ with maximum probability corresponds to the optimal solution of the NAP.

The message passing framework

The main interest of the factor-graph (1) is the ability to efficiently compute an approximation of its mode using the max-product algorithm. Following the message passing framework proposed by Pearl [22], and denoting by $m^{(t)}$ the value of the message $m$ after $t$ iterations, we may apply the following updates [2]:

$$\begin{aligned}\mu_{X_{ii'}\rightarrow f_i}^{(t+1)}(x_{ii'})&=\lambda_{c_{ii'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{g_{i'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\prod_{jj'}\lambda_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\\\mu_{X_{ii'}\rightarrow g_{i'}}^{(t+1)}(x_{ii'})&=\lambda_{c_{ii'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{f_i\rightarrow X_{ii'}}^{(t)}(x_{ii'})\prod_{jj'}\lambda_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\\\mu_{X_{ii'}\rightarrow c_{ii'jj'}}^{(t+1)}(x_{ii'},x_{jj'})&=\lambda_{c_{ii'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{f_i\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{g_{i'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\prod_{kk'\neq jj'}\lambda_{c_{ii'kk'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\end{aligned}$$

$$\begin{aligned}\lambda_{f_i\rightarrow X_{ii'}}^{(t)}(x_{ii'})&=\max_{\mathbf{x}_{\partial f_i\backslash\{ii'\}}}f_i(\mathbf{x}_{\partial f_i})\prod_{j'\neq i'}\mu_{X_{ij'}\rightarrow f_i}^{(t)}(x_{ij'}),&&\lambda_{c_{ii'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})=c_{ii'}(x_{ii'})\\\lambda_{g_{i'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})&=\max_{\mathbf{x}_{\partial g_{i'}\backslash\{ii'\}}}g_{i'}(\mathbf{x}_{\partial g_{i'}})\prod_{j\neq i}\mu_{X_{ji'}\rightarrow g_{i'}}^{(t)}(x_{ji'}),&&\lambda_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})=\max_{x_{jj'}}c_{ii'jj'}(x_{ii'},x_{jj'})\,\mu_{X_{jj'}\rightarrow c_{ii'jj'}}^{(t)}(x_{jj'})\end{aligned}$$

Since the $x_{ii'}$ are binary valued, computing any message $\lambda_{a\rightarrow b}(x_{ii'})$ for both $x_{ii'}=0$ and $x_{ii'}=1$ is redundant. We may therefore halve the computation cost by only considering its log-ratio $m_{a\rightarrow b}=\log\frac{\lambda_{a\rightarrow b}(1)}{\lambda_{a\rightarrow b}(0)}$. With this notation, it can be shown [2] that the messages from the function nodes to the variable nodes introduced above simplify to:

$$\begin{aligned}m_{f_i\rightarrow X_{ii'}}^{(t)}&=-\left(\max_{k'\neq i'}m_{X_{ik'}\rightarrow f_i}^{(t)}\right)_{+},&m_{c_{ii'}\rightarrow X_{ii'}}^{(t)}&=\alpha p_{ii'}\\m_{g_{i'}\rightarrow X_{ii'}}^{(t)}&=-\left(\max_{k\neq i}m_{X_{ki'}\rightarrow g_{i'}}^{(t)}\right)_{+},&m_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}&=\left((1-\alpha)Q_{ii'jj'}+m_{X_{jj'}\rightarrow c_{ii'jj'}}^{(t)}\right)_{+}-\left(m_{X_{jj'}\rightarrow c_{ii'jj'}}^{(t)}\right)_{+}\end{aligned}$$

where we use the notation $x_+=\max(0,x)$. Consequently, the message-passing framework reduces to the following updates:

$$\begin{aligned}m_{X_{ii'}\rightarrow f_i}^{(t+1)}&=m_{c_{ii'}\rightarrow X_{ii'}}^{(t)}+m_{g_{i'}\rightarrow X_{ii'}}^{(t)}+\sum_{jj'}m_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}&\text{(2)}\\m_{X_{ii'}\rightarrow g_{i'}}^{(t+1)}&=m_{c_{ii'}\rightarrow X_{ii'}}^{(t)}+m_{f_i\rightarrow X_{ii'}}^{(t)}+\sum_{jj'}m_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}&\text{(3)}\\m_{X_{ii'}\rightarrow c_{ii'jj'}}^{(t+1)}&=m_{c_{ii'}\rightarrow X_{ii'}}^{(t)}+m_{f_i\rightarrow X_{ii'}}^{(t)}+m_{g_{i'}\rightarrow X_{ii'}}^{(t)}+\sum_{kk'\neq jj'}m_{c_{ii'kk'}\rightarrow X_{ii'}}^{(t)}&\text{(4)}\end{aligned}$$
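A minimal dense sketch of one sweep of updates (2)-(4) might read as follows in Python. This is an illustration only: the flat variable encoding, the symmetric treatment of the quadratic factors, and the absence of damping are simplifying assumptions, and a practical implementation would exploit the sparsity of $Q$:

```python
def plus(x):
    # the (x)_+ = max(0, x) notation from the text
    return max(0.0, x)

def iterate_messages(p, Q, alpha, n, m, m_xf, m_xg, m_xc):
    # One sweep of the simplified log-ratio updates (2)-(4).
    # m_xf[v], m_xg[v]: messages X_v -> f_i and X_v -> g_i' at step t;
    # m_xc[(v, w)]: message X_v -> c_{vw} at step t (both orientations
    # must be present). A pair ii' is encoded as v = i * m + i'.
    N = n * m
    nbrs = {v: [] for v in range(N)}
    for (a, b), q in Q.items():          # factor c_{ab} links variables a, b
        nbrs[a].append((b, q))
        nbrs[b].append((a, q))
    mc = [alpha * p[v] for v in range(N)]
    mf = [-plus(max((m_xf[(v // m) * m + k] for k in range(m) if k != v % m),
                    default=float('-inf'))) for v in range(N)]
    mg = [-plus(max((m_xg[k * m + v % m] for k in range(n) if k != v // m),
                    default=float('-inf'))) for v in range(N)]
    # messages from the quadratic factors c_{vw} towards X_v
    mq = {(v, w): plus((1 - alpha) * q + m_xc[(w, v)]) - plus(m_xc[(w, v)])
          for v in nbrs for (w, q) in nbrs[v]}
    tot = {v: sum(mq[(v, w)] for (w, _) in nbrs[v]) for v in range(N)}
    new_xf = [mc[v] + mg[v] + tot[v] for v in range(N)]
    new_xg = [mc[v] + mf[v] + tot[v] for v in range(N)]
    new_xc = {(v, w): mc[v] + mf[v] + mg[v] + tot[v] - mq[(v, w)]
              for (v, w) in mq}
    return new_xf, new_xg, new_xc
```

With $\alpha=1$ and an empty $Q$, a single sweep reduces, as expected, to propagating the node similarities $\alpha p_{ii'}$ corrected by the matching-constraint messages.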

3.2 Proposed modifications

Solution assignment

After each message-passing iteration, the algorithm must compute the current best solution based on the updated messages. In their work, Bayati et al. [2] proposed several mechanisms, called "rounding strategies". Unfortunately, all of them require solving an instance of the MWM problem. Even worse, according to the authors, the best rounding strategy requires solving two MWM problems exactly. Though this can be done in reasonable time, performing this computational step after each iteration seriously slows down the algorithm.

To overcome this issue, we propose a simpler assignment procedure based on the current estimated "max-marginals". Following the notation introduced in Section 3.1 and the computation rules of Pearl [22], we may estimate the max-marginal distribution of each variable node as:

$$\hat{p}_{X_{ii'}}^{(t)}(x_{ii'})\propto\lambda_{c_{ii'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{f_i\rightarrow X_{ii'}}^{(t)}(x_{ii'})\,\lambda_{g_{i'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})\prod_{jj'}\lambda_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}(x_{ii'})$$

After $t$ iterations, we may deduce the current best assignment from the sign of each max-marginal log-ratio $\log\frac{\hat{p}_{X_{ii'}}^{(t)}(1)}{\hat{p}_{X_{ii'}}^{(t)}(0)}$:

$$\hat{X}_{ii'}=\operatorname*{argmax}_{x\in\{0,1\}}\hat{p}_{X_{ii'}}^{(t)}(x)=\operatorname{sign}\left(m_{c_{ii'}\rightarrow X_{ii'}}^{(t)}+m_{f_i\rightarrow X_{ii'}}^{(t)}+m_{g_{i'}\rightarrow X_{ii'}}^{(t)}+\sum_{jj'}m_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}\right)$$

Note that this mechanism may result in a partial mapping. Therefore, after the last iteration, i.e. when the updates converge or the maximum number of iterations is reached, we propose to complete the resulting assignment with less confident matches by solving an MWM problem on the estimated max-marginal log-ratios of the unmatched nodes.
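The sign-based rule can be sketched as follows (illustrative Python; `mq_sum` stands for the precomputed sums $\sum_{jj'}m_{c_{ii'jj'}\rightarrow X_{ii'}}^{(t)}$, and the final MWM completion of the partial mapping is omitted):

```python
# Hedged sketch of the proposed assignment rule: keep a candidate match ii'
# whenever its estimated max-marginal log-ratio is strictly positive.
# Inputs are the factor-to-variable messages, indexed by v = i * m + i'.
def assign_from_marginals(mc, mf, mg, mq_sum, n, m):
    matched = []
    for v in range(n * m):
        log_ratio = mc[v] + mf[v] + mg[v] + mq_sum[v]
        if log_ratio > 0:                    # sign of the log-ratio
            matched.append((v // m, v % m))  # pair (i, i')
    return matched
```

Because several positive log-ratios may conflict on a row or column, the returned list is in general only a partial (and possibly inconsistent) mapping, which motivates the MWM completion step described above.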

Auction-based $\epsilon$-complementary slackness

A well-known problem of the max-product algorithm when running on loopy graphical models is that it is not guaranteed to converge. In fact, it may fall into infinite loops and oscillate between a few states [19]. Therefore, most implementations include a mechanism that enforces convergence [6, 13]. In their work, Bayati et al. [2] propose a damping factor to mitigate the updates over iterations. Once this damping is sufficiently low, the message updates become insignificant and the algorithm converges.

In our work, we propose another mechanism based on the concept of $\epsilon$-complementary slackness [5]. This relaxation was originally proposed for the Auction algorithm to address MWM instances that admit multiple optimal solutions. The idea is to prevent the saturation of the complementary slackness condition by a small constant margin $\epsilon$. This scheme not only breaks ties and ensures convergence but also provably finds the optimal solution for a small enough $\epsilon$ [5]. Furthermore, for larger values of $\epsilon$, it generally provides near-optimal assignments in much less computation time. Though very similar in its MWM version ($\alpha=1$), our model is quite different from an Auction algorithm in the general case. In order to adapt the idea of $\epsilon$-complementary slackness to our message-passing scheme, we propose the following modifications of updates (2) and (3):

$$m_{f_i\rightarrow X_{ii'}}^{(t)}=-\left(\max_{k'\neq i'}m_{X_{ik'}\rightarrow f_i}^{(t)}\right)_{+}-\mathbb{1}_{\left\{m_{X_{ii'}\rightarrow f_i}^{(t)}\neq\max_{k'}m_{X_{ik'}\rightarrow f_i}^{(t)}\right\}}\,\epsilon\qquad m_{g_{i'}\rightarrow X_{ii'}}^{(t)}=-\left(\max_{k\neq i}m_{X_{ki'}\rightarrow g_{i'}}^{(t)}\right)_{+}-\mathbb{1}_{\left\{m_{X_{ii'}\rightarrow g_{i'}}^{(t)}\neq\max_{k}m_{X_{ki'}\rightarrow g_{i'}}^{(t)}\right\}}\,\epsilon$$

In our experiments, this mechanism strongly favors the convergence of the messages and thus reduces the number of required iterations. More importantly, this scheme tends to improve the overall final assignment score in many cases.

However, this relaxation suffers from an important drawback: the value of $\epsilon$ must be chosen carefully. If it is set too small, the mechanism cannot fully play its part and the algorithm may reach the maximum number of iterations before converging. On the contrary, if $\epsilon$ is too high, the algorithm tends to converge too quickly to a poor local optimum. In their work, Bertsekas [5] proposes an iterative method, called $\epsilon$-scaling, to properly set up the relaxation. The idea consists in repeatedly decreasing $\epsilon$ once the messages have converged, until it reaches a value small enough to provide an optimal solution. In our work, we suggest the opposite scheme. The model starts with a rather small $\epsilon$ that helps to softly break local ties. Then, as the algorithm iterates, we raise the relaxation value each time the messages have not improved the current objective function for a few iterations. As $\epsilon$ rises, the messages are more and more likely to escape their local optimum and to fall into a better one. As soon as the current assignment improves, $\epsilon$ is set back to its original value, so that the new local solutions can be carefully explored.
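This schedule can be summarized by the following control loop (a sketch under assumed parameter names; `step` stands for one hypothetical message-passing iteration that returns the current objective value):

```python
# Hedged sketch of the adaptive epsilon schedule: start small, raise epsilon
# whenever the objective has stagnated for `patience` iterations, and reset it
# to the base value as soon as the assignment improves. The growth factor and
# patience are illustrative defaults, not the tuned values of the method.
def adaptive_epsilon(step, eps0=0.5, growth=2.0, patience=5, max_iter=1000):
    eps, best, stall = eps0, float('-inf'), 0
    for _ in range(max_iter):
        score = step(eps)
        if score > best:
            best, stall, eps = score, 0, eps0   # reset: explore carefully
        else:
            stall += 1
            if stall >= patience:
                eps *= growth                    # try to escape the optimum
                stall = 0
    return best
```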

4 Evaluation

The proposed evaluation of our method, named QBinDiff, is twofold. We first analyze its performance as a NAP solver. To do so, we submit the exact same problems to several state-of-the-art solvers and compare the computed alignment scores, without any consideration of the underlying purpose of the problem instance. Then, we evaluate the relevance of our solution for the binary diffing problem: we compare the resulting mappings to ground truth assignments and evaluate each matching method with respect to accuracy metrics. In all our experiments, we ran our method with its default parameters: $\epsilon=0.5$ and a maximum of 1000 iterations.

4.1 Benchmark experiments

We compare our method to four state-of-the-art NAP solvers and their associated benchmarks (see Table 1): PATH [27], NetAlign [2], Natalie 2.0 [11], and Final [28]. All these solvers were configured with their default parameters. In order to analyze the ability of each solver to provide proper solutions both in terms of node similarity and edge overlap, we tested different values of the trade-off parameter $\alpha$. Notice that some problems include a similarity score matrix with several zero entries. In some models (ours, NetAlign, Natalie), those entries are considered infeasible matches, whereas they are legal correspondences in the others. The former models thus optimize the problem over a subset of all possible one-to-one mappings.

Table 1: Description of our benchmark dataset.
Source                   A        B        |V_A|   |V_B|   |E_A|   |E_B|
Zaslavskiy et al. [27]   1EWK     1U19     57      59      3192    2974
El-Kebir et al. [11]     dmela    scere    9459    5696    25635   31261
Zhang and Tong [28]      flickr   lastfm   15436   12974   32638   32298
Bayati et al. [2]        lcsh     wiki     1919    2000    3130    7808
Zhang and Tong [28]      offline  online   1118    3906    3022    16328

Our results show that our approach outperforms or nearly ties the other existing methods on every problem (see Table 2). It appears to provide better results on sparse graphs, while it may compute slightly suboptimal assignments on the densest one (1EWK-1U19). It also seems to be the best suited to perform diffing at different settings of the trade-off parameter $\alpha$, even in the degenerate MCS case ($\alpha=0$), unlike most of the other evaluated methods.

In terms of computing time, as expected, QBinDiff takes much less time to approximate the NAP than NetAlign. Regarding the other solvers, both Natalie and Final run within comparable time, while PATH tends to be very expensive and may become prohibitive for larger problem instances. Note that it was not able to provide a solution to the Flickr-Lastfm problem with $\alpha=0.25$ within 8 days and was considered timed out. Of course, these timings depend on the quality of the implementation and should only be considered with respect to their order of magnitude. We used the implementations provided by the authors of the other methods.

4.2 Binary Diffing experiments

In a second set of experiments, we propose to evaluate the relevance of our model in order to address the Binary Diffing problem. Given two binary executables AA and BB, this problem consists in retrieving the correspondence between the functions of AA and those of BB that best describes their semantic differences. This problem can be reduced to a network alignment problem over the call graphs of AA and BB.

The results of the alignment should be compared to some ground truth, which we computed as follows. We first downloaded the official repository of a program, then compiled the different available versions using GCC v7.5 for the x86-64 target architecture with the -O3 optimization level. Each binary was then stripped of all symbols, disassembled using IDA Pro v7.2 (https://www.hex-rays.com/products/ida), and finally exported into a readable file with the help of BinExport (https://github.com/google/binexport). In the problem statement, only the plain text functions determined during the disassembly process are considered. For each program, assuming that this extraction protocol provided us with $n$ different versions, we evaluate our method by diffing all the $\frac{n(n-1)}{2}$ possible pairs of different binaries.

For all these diffing instances, we finally had to extract the ground truth assignments. We proceeded in two steps. We first manually determined what we believe to be the function mapping that best describes the modifications between two successive program versions, considering the binary symbols as well as the explicit commit descriptions. Then, we deduced all the remaining pairwise diffing assignments by extrapolating the mappings from version to version. Formally, if we encode the mapping between $A_1$ and $A_2$ into a boolean matrix $M_{A_1\to A_2}$ such that $(M_{A_1\to A_2})_{ii'}=1$ if and only if function $i$ in $A_1$ is paired with function $i'$ in $A_2$, then our extrapolation scheme simply consists in computing the diffing correspondence between $A_k$ and $A_n$ as $M_{A_k\to A_n}=\prod_{i=k}^{n-1}M_{A_i\to A_{i+1}}$. We applied this extraction protocol to three well-known open source programs, namely Zlib (https://github.com/madler/zlib), Libsodium (https://github.com/jedisct1/libsodium) and OpenSSL (https://github.com/openssl/openssl), from which we collected respectively 18, 33 and 17 different binary versions, and therefore 153, 528 and 136 diffing instances. Statistics describing our evaluation dataset are given in Table 3.
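Since the mappings are one-to-one, the boolean matrix product above amounts to composing partial functions. A Python sketch of the extrapolation, with each mapping stored as a dictionary (a sparse representation we assume for illustration):

```python
# Hedged sketch of the ground-truth extrapolation M_{A_k -> A_n} as the
# product of the successive boolean mapping matrices. Each mapping is a
# dict {function_in_A_i: function_in_A_{i+1}}; composing the dicts is
# equivalent to multiplying the (very sparse) boolean matrices.
def compose(maps):
    out = maps[0]
    for nxt in maps[1:]:
        # a function survives only if it is mapped at every step
        out = {i: nxt[j] for i, j in out.items() if j in nxt}
    return out
```

Functions that disappear in an intermediate version simply drop out of the composed mapping, which matches the behavior of the matrix product.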

Table 2: Resulting objective scores of each solver on different benchmark problems. The last column records the average computing time in seconds.
Problem   Matcher   $\alpha=0.0$   $\alpha=0.25$   $\alpha=0.5$   $\alpha=0.75$   $\alpha=0.9$   Time
 
1EWK-1U19 QBinDiff 2890.000 2183.366 1473.608 764.758 339.628 22.532
Final 2874.000 2169.750 1465.500 761.250 338.700 17.944
Natalie 2896.000 2172.496 1448.992 725.488 291.385 852.921
NetAlign 2896.000 2185.750 1474.016 765.023 338.700 1620.431
Path 2896.000 2169.750 1465.500 761.250 338.700 0.700
 
dmela-scere QBinDiff 255.000 347.937 441.554 543.832 616.475 32.784
Final 112.000 250.953 389.906 528.859 612.230 45.898
Natalie 174.000 163.109 142.558 126.837 119.156 8.677
NetAlign 224.000 332.401 431.732 543.442 615.557 115.397
Path 47.000 230.809 376.118 522.177 609.812 17951.033
 
flickr-lastfm QBinDiff 6144.000 6955.018 7775.695 8594.405 9118.512 412.266
Final 1980.000 3890.405 5800.810 7711.215 8857.458 156.849
Natalie 6012.000 5164.405 4282.425 3417.638 2896.056 94.092
NetAlign 5830.000 6742.407 7567.495 8427.405 9025.683 434531.487
Path 10.000 NA 4316.110 7403.845 8845.276 387114.102
 
lcsh-wiki QBinDiff 632.000 614.867 612.690 605.062 614.086 36.344
Final 574.000 585.217 596.435 607.652 614.383 20.302
Natalie 584.000 496.465 419.760 337.640 287.927 2.932
NetAlign 506.000 547.398 552.335 584.123 610.774 52.317
Path 238.000 385.085 462.935 539.480 594.963 4332.587
 
offline-online QBinDiff 2608.000 2138.206 1678.450 1103.162 812.036 95.696
Final 40.000 222.352 404.704 587.055 696.466 22.685
Natalie 198.000 315.160 327.387 392.081 446.390 1553.770
NetAlign 1772.000 1507.908 1352.685 1002.279 756.600 27971.596
Path 244.000 789.614 725.422 690.011 742.723 13881.264
 
Table 3: Description of our binary diffing dataset. The last five columns respectively record the number of different binary versions, the number of resulting diffing instances, the average number of functions and function calls and the average ratio of conserved functions in our manually extracted ground truth.
Program Versions Problems |V|¯\overline{|V|} |E|¯\overline{|E|} GT ratio
 
Zlib 18 153 153 235 0.96
Libsodium 33 528 589 701 0.79
OpenSSL 17 136 3473 18563 0.72
 

As a baseline, we chose to compare with the two most common diffing tools: BinDiff and Diaphora. BinDiff uses a matching algorithm close to VF2 [8], originally introduced to approximate the MCS, whereas Diaphora uses a different, greedy assignment strategy that first matches the most similar functions and then searches for potential correspondences among the remaining ones. This mapping mechanism is known to provide 12\frac{1}{2}-approximate solutions to the MWM problem [9]. Note that, in order to compare with NAP solvers, and because binary diffing generally favors recall over precision, QBinDiff is designed to produce a complete assignment and does not include a mechanism to discard very unlikely correspondences during computation.
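To illustrate the greedy assignment strategy mentioned above, here is a minimal sketch (not Diaphora's actual implementation): candidate pairs are sorted by similarity and matched greedily, which is the classical 12\frac{1}{2}-approximation to the MWM problem. The function name and input format are illustrative assumptions.

```python
def greedy_matching(pairs):
    """Greedily match candidate pairs in decreasing similarity order.

    pairs: iterable of (score, i, j) candidate correspondences between
    functions i of binary A and j of binary B.
    Returns a one-to-one mapping {i: j}.
    """
    matched_a, matched_b, mapping = set(), set(), {}
    for score, i, j in sorted(pairs, reverse=True):
        # Keep a pair only if neither endpoint is already matched.
        if i not in matched_a and j not in matched_b:
            matched_a.add(i)
            matched_b.add(j)
            mapping[i] = j
    return mapping


# Example: f1 is matched first to its best candidate g1, so the
# slightly worse pair (f1, g2) is skipped and g2 goes to f2.
pairs = [(0.9, "f1", "g1"), (0.8, "f1", "g2"), (0.7, "f2", "g2")]
print(greedy_matching(pairs))  # {'f1': 'g1', 'f2': 'g2'}
```

The approximation guarantee follows because every pair the greedy scheme rejects conflicts with an already accepted pair of at least equal score.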

Our experiments show that QBinDiff generally outperforms the other matching approaches in both alignment score and recall (see Table 4). In fact, our method performs clearly better when diffing more dissimilar programs, whereas it provides comparable solutions on similar binaries. This highlights that the local greedy matching strategies of both BinDiff and Diaphora provide good solutions in simple cases but generalize poorly to more difficult problem instances. These results should be viewed as promising in the perspective of diffing much more dissimilar binaries.

We repeated our experiments with different trade-off parameters α\alpha in order to estimate which setting brings the optimal assignment closest to the ground truth. Our computations suggest that a trade-off around 0.75 is a fair choice, though a slightly higher value may provide satisfying assignments as well, especially when diffing OpenSSL programs.

Table 4: Average normalized resulting scores for each matching method. Computing time in seconds.
Project Matcher Similarity Squares Objective Precision Recall Time
 
Zlib QBinDiff 0.929 0.807 0.888 0.955 0.995 0.245
BinDiff 0.921 0.765 0.869 0.943 0.975 1.239
Diaphora 0.863 0.655 0.792 0.978 0.940 10.494
Final 0.929 0.784 0.881 0.948 0.986 18.469
Natalie 0.587 0.812 0.663 0.901 0.603 0.354
NetAlign 0.929 0.807 0.888 0.955 0.995 21.935
Path 0.930 0.789 0.883 0.953 0.992 0.589
 
Libsodium QBinDiff 0.706 0.604 0.677 0.722 0.880 6.616
BinDiff 0.662 0.544 0.628 0.752 0.869 1.278
Diaphora 0.605 0.470 0.567 0.783 0.831 35.327
Final 0.710 0.455 0.636 0.686 0.829 30.264
Natalie 0.424 0.565 0.462 0.596 0.438 36.984
NetAlign 0.703 0.597 0.673 0.722 0.879 283.130
Path 0.700 0.519 0.648 0.696 0.836 61.784
 
OpenSSL QBinDiff 0.720 0.595 0.643 0.605 0.783 213.252
BinDiff 0.643 0.501 0.553 0.572 0.681 3.944
Diaphora 0.543 0.332 0.408 0.577 0.565 228.666
Final 0.719 0.355 0.487 0.476 0.627 264.397
Natalie 0.620 0.589 0.601 0.599 0.668 733.802
NetAlign 0.682 0.587 0.623 0.625 0.755 15193.922
Path 0.722 0.521 0.596 0.556 0.715 7667.840
 

4.3 Limitations

Though our method improves on the state of the art, some difficulties remain.

A first limitation of our approach concerns the design of the network alignment objective itself. Indeed, the determination of the trade-off parameter α\alpha is subject to arbitrary considerations and may have an important impact on the resulting solution. Moreover, this trade-off certainly depends on the density of the graphs, since denser graphs mechanically include more potential edge overlaps. Finally, the proposed model is designed to compute complete assignments and therefore does not include any mechanism to optimize the precision. The latter could be achieved by introducing a penalty term ζ\zeta on the node similarity scores, such that pii=σV(i,i)ζp_{ii^{\prime}}=\sigma_{V}(i,i^{\prime})-\zeta. Correspondences with negative penalized scores would then belong to the final mapping only if they induce enough topological similarity.
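The penalty mechanism suggested above can be sketched as follows. This is an illustrative assumption, not part of the proposed model: `penalized_scores` applies the ζ\zeta shift to the node similarities, and `prune_mapping` shows the intended effect, a pair with negative penalized score survives only when its induced topological gain compensates the penalty.

```python
def penalized_scores(sigma_v, zeta):
    """Shift node similarities by the penalty zeta.

    sigma_v: dict mapping (i, i') -> node similarity sigma_V(i, i').
    Returns the penalized scores p_{ii'} = sigma_V(i, i') - zeta.
    """
    return {pair: s - zeta for pair, s in sigma_v.items()}


def prune_mapping(mapping, scores, topo_gain):
    """Keep a matched pair only if its penalized score plus its induced
    topological similarity (topo_gain, hypothetical per-pair values)
    remains positive."""
    return {i: j for i, j in mapping.items()
            if scores[(i, j)] + topo_gain.get((i, j), 0.0) > 0.0}


# Example with zeta = 0.5: the weakly similar pair ("a", "x") becomes
# negative (-0.2) and its topological gain (0.1) is not enough to keep it.
scores = penalized_scores({("a", "x"): 0.3, ("b", "y"): 0.9}, 0.5)
print(prune_mapping({"a": "x", "b": "y"}, scores, {("a", "x"): 0.1}))
```

In the actual model the topological contribution would come from the alignment objective rather than a precomputed table; the sketch only shows how the penalty turns a complete assignment into a partial, precision-oriented one.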

Another difficulty of our approach is the determination of the relaxation parameter ϵ\epsilon. As previously mentioned, its setting controls a trade-off between the quality of the solution and the speed at which the messages converge. While the proposed scheme gives very satisfactory results, other raising schemes could be used to adapt the parameter to the current solution during computation. We leave these investigations to future work.

Finally, our formulation is limited by the memory consumption associated with the use of a quartic-size matrix QQ. Though the proposed model makes it possible to significantly reduce the problem size by restricting the solution set to the most probable correspondences, this relaxation inevitably induces an information loss, especially for large graphs, where the candidate set must grow accordingly. In practice, graphs of several thousand nodes can be handled efficiently. For larger instances, a solution could consist in first partitioning the graphs into smaller consistent subgraphs, and then performing the matching between them. However, such a partition is not trivial and might result in important alignment errors.
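The partition-then-match strategy evoked above could, in its simplest form, split each call graph into weakly connected components and align each pair of components independently. The sketch below is a hypothetical illustration (a union-find component extraction), not part of the proposed algorithm; cross-partition edges are lost, which is precisely the alignment-error risk mentioned.

```python
from collections import defaultdict


def components(nodes, edges):
    """Weakly connected components via union-find with path compression.

    nodes: iterable of node identifiers; edges: iterable of (u, v) pairs.
    Returns a list of components, each as a sorted list of nodes.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two components

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return [sorted(g) for g in groups.values()]


# Two components: {1, 2, 3} and {4, 5}. Each could then be matched
# against its counterpart subgraph with the full aligner.
print(sorted(components({1, 2, 3, 4, 5}, [(1, 2), (2, 3), (4, 5)])))
```

Pairing the resulting subgraphs across the two binaries is itself a non-trivial assignment problem, which is why we consider this direction future work.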

5 Conclusion

In this paper, we introduced a new algorithm to address the network alignment problem. It leverages a previous model and includes new mechanisms that enforce message convergence and speed up the computation. Moreover, it proved to provide better assignments than the original version in almost all alignment instances.

Our evaluation showed that our approach outperforms state-of-the-art solvers. It also appears to be well suited to computing proper solutions for different trade-offs between node similarity and edge overlap. We finally proposed an application of our method to the binary diffing problem. Our experiments showed that our algorithm provides better assignments than existing approaches on most diffing instances. This result suggests that formulating binary diffing as a network alignment problem is the right approach.

References

  • Bahiense et al. [2012] Bahiense, L., Manić, G., Piva, B., De Souza, C.C., 2012. The maximum common edge subgraph problem: A polyhedral investigation. Discrete Applied Mathematics 160, 2523–2541.
  • Bayati et al. [2009] Bayati, M., Gerritsen, M., Gleich, D.F., Saberi, A., Wang, Y., 2009. Algorithms for Large, Sparse Network Alignment Problems, in: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, IEEE Computer Society, USA. pp. 705–710.
  • Bayati et al. [2013] Bayati, M., Gleich, D.F., Saberi, A., Wang, Y., 2013. Message-Passing Algorithms for Sparse Network Alignment. ACM Transactions on Knowledge Discovery from Data (TKDD) 7, 3:1–3:31.
  • Bayati et al. [2005] Bayati, M., Shah, D., Sharma, M., 2005. Maximum weight matching via max-product belief propagation, in: Proceedings. International Symposium on Information Theory, 2005. ISIT 2005., pp. 1763–1767.
  • Bertsekas [1992] Bertsekas, D.P., 1992. Auction algorithms for network flow problems: A tutorial introduction. Computational Optimization and Applications 1, 7–66.
  • Braunstein and Zecchina [2006] Braunstein, A., Zecchina, R., 2006. Learning by Message Passing in Networks of Discrete Synapses. Physical Review Letters 96, 030201.
  • Burkard et al. [1998] Burkard, R.E., Çela, E., Pardalos, P.M., Pitsoulis, L.S., 1998. The Quadratic Assignment Problem, in: Du, D.Z., Pardalos, P.M. (Eds.), Handbook of Combinatorial Optimization: Volume1–3. Springer US, Boston, MA, pp. 1713–1809.
  • Cordella et al. [1998] Cordella, L.P., Foggia, P., Sansone, C., Vento, M., 1998. Subgraph Transformations for the Inexact Matching of Attributed Relational Graphs, in: Jolion, J.M., Kropatsch, W.G. (Eds.), Graph Based Representations in Pattern Recognition, Springer, Vienna. pp. 43–52.
  • Duan and Pettie [2014] Duan, R., Pettie, S., 2014. Linear-Time Approximation for Maximum Weight Matching. Journal of the ACM 61, 1:1–1:23.
  • El-Kebir et al. [2011] El-Kebir, M., Heringa, J., Klau, G.W., 2011. Lagrangian relaxation applied to sparse global network alignment, in: Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics, Springer-Verlag, Delft, The Netherlands. pp. 225–236.
  • El-Kebir et al. [2015] El-Kebir, M., Heringa, J., Klau, G.W., 2015. Natalie 2.0: Sparse Global Network Alignment as a Special Case of Quadratic Assignment. Algorithms 8, 1035–1051.
  • Feizi et al. [2020] Feizi, S., Quon, G., Recamonde-Mendoza, M., Médard, M., Kellis, M., Jadbabaie, A., 2020. Spectral Alignment of Graphs. IEEE Transactions on Network Science and Engineering 7, 1182–1197.
  • Frey and Dueck [2007] Frey, B.J., Dueck, D., 2007. Clustering by Passing Messages Between Data Points. Science 315, 972–976.
  • Horaud et al. [2011] Horaud, R., Forbes, F., Yguel, M., Dewaele, G., Zhang, J., 2011. Rigid and Articulated Point Registration with Expectation Conditional Maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 587–602.
  • Klau [2009] Klau, G.W., 2009. A new graph-based method for pairwise global network alignment. BMC Bioinformatics 10, S59.
  • Kollias et al. [2012] Kollias, G., Mohammadi, S., Grama, A., 2012. Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment. IEEE Transactions on Knowledge and Data Engineering 24, 2232–2243.
  • Kschischang et al. [2006] Kschischang, F.R., Frey, B.J., Loeliger, H.A., 2006. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47, 498–519.
  • Lyzinski et al. [2016] Lyzinski, V., Fishkind, D.E., Fiori, M., Vogelstein, J.T., Priebe, C.E., Sapiro, G., 2016. Graph Matching: Relax at Your Own Risk. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 60–73.
  • Murphy et al. [1999] Murphy, K.P., Weiss, Y., Jordan, M.I., 1999. Loopy belief propagation for approximate inference: an empirical study, in: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., Stockholm, Sweden. pp. 467–475.
  • Nassar et al. [2018] Nassar, H., Veldt, N., Mohammadi, S., Grama, A., Gleich, D.F., 2018. Low Rank Spectral Network Alignment, in: Proceedings of the 2018 World Wide Web Conference, International World Wide Web Conferences Steering Committee, Lyon, France. pp. 619–628.
  • Patro and Kingsford [2012] Patro, R., Kingsford, C., 2012. Global network alignment using multiscale spectral signatures. Bioinformatics 28, 3105–3114.
  • Pearl [1982] Pearl, J., 1982. Reverend bayes on inference engines: a distributed hierarchical approach, in: Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI Press, Pittsburgh, Pennsylvania. pp. 133–136.
  • Sahni and Gonzalez [1976] Sahni, S., Gonzalez, T., 1976. P-Complete Approximation Problems. Journal of the ACM (JACM) 23, 555–565.
  • Singh et al. [2008] Singh, R., Xu, J., Berger, B., 2008. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences 105, 12763–12768.
  • Umeyama [1988] Umeyama, S., 1988. An Eigendecomposition Approach to Weighted Graph Matching Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence 10, 695–703.
  • Vogelstein et al. [2014] Vogelstein, J.T., Conroy, J.M., Lyzinski, V., Podrazik, L.J., Kratzer, S.G., Harley, E.T., Fishkind, D.E., Vogelstein, R.J., Priebe, C.E., 2014. Fast Approximate Quadratic Programming for Large (Brain) Graph Matching. arXiv:1112.5507 [cs, math, q-bio] .
  • Zaslavskiy et al. [2009] Zaslavskiy, M., Bach, F., Vert, J.P., 2009. A Path Following Algorithm for the Graph Matching Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 2227–2242.
  • Zhang and Tong [2016] Zhang, S., Tong, H., 2016. FINAL: Fast Attributed Network Alignment, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, New York, NY, USA. pp. 1345–1354.
  • Zhang et al. [2019] Zhang, Z., Xiang, Y., Wu, L., Xue, B., Nehorai, A., 2019. KerGM: Kernelized Graph Matching, in: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F.d., Fox, E., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 32. Curran Associates, Inc., pp. 3335–3346.
  • Zhou [2012] Zhou, F., 2012. Factorized graph matching, in: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, USA. pp. 127–134.