
Fast Decentralized Linear Functions Over
Edge Fluctuating Graphs

Siavash Mollaebrahim and Baltasar Beferull-Lozano. S. Mollaebrahim and B. Beferull-Lozano are with the Intelligent Signal Processing and Wireless Networks (WISENET) Center, University of Agder, Norway. e-mail: {siavash.mollaebrahim, baltasar.beferull}@uia.no
Abstract

Implementing linear transformations is a key task in the decentralized signal processing framework, which performs learning tasks on data sets distributed over multi-node networks. Such networks can be represented by graphs. Recently, several decentralized methods have been proposed to compute linear transformations by leveraging the notion of graph shift operator, which captures the local structure of the graph. However, existing approaches have drawbacks, such as considering only special instances of linear transformations, or restricting the family of transformations by assuming that a shift matrix is given whose eigenvectors span the subspace of interest. In contrast, this paper develops a framework for computing a wide class of linear transformations in a decentralized fashion by relying on the notion of graph shift operator. The main goal of the proposed method is to compute the desired linear transformation in a small number of iterations. To this end, a set of successive graph shift operators is employed, and a new optimization problem is proposed whose goal is to compute the desired transformation as fast as possible. In addition, the topology of such networks, especially wireless sensor networks, often changes randomly because of node failures or random links. In this paper, the effect of edge fluctuations on the performance of the proposed method is studied. To deal with the negative effect of edge fluctuations, an online kernel-based method is proposed which enables nodes to estimate the missing values from the information at hand. The proposed method can also be employed to sparsify the network graph or reduce the number of local exchanges between nodes, which saves sensor power in wireless sensor networks.

Index Terms:
graph signal processing, decentralized linear transformations, graph filters.

I Introduction

Processing data sets defined over non-regular domains is the main application of the graph signal processing (GSP) field. Such data sets are characterized by graphs, and traditional signal processing notions such as frequency analysis and sampling have been extended to process signals defined over graphs (graph signals), i.e., data associated with the nodes of the network [1, 2, 3]. Graph signals arise in different fields such as social networks and wireless sensor networks (WSN).

Recently, a number of decentralized methods have been proposed to compute some special instances of linear transformations, a common task in decentralized signal processing. Projecting signals onto a low-dimensional subspace [4, 5] or interpolating bandlimited signals [6] are examples of such linear transformations. For instance, the authors in [7] propose a decentralized approach to implement the subspace projection task, which encompasses a number of learning tasks in WSN, such as estimation and denoising [8]. In that approach [7], each node at every iteration linearly combines its iterate with those of its neighbors. The coefficients of those linear combinations are then obtained by optimizing a criterion that quantifies asymptotic convergence. However, this algorithm can only accommodate a limited set of topologies [9], its convergence is asymptotic, and it generally requires a relatively large number of local information exchanges. Those issues are addressed in [10, 11, 4], which propose decentralized GSP-based approaches for the average consensus task, a special case of subspace projection.

Moreover, some decentralized GSP-based approaches employ graph filters (GFs) to compute linear transformations. GFs are expressed as polynomials of so-called graph shift matrices, which capture the local structure of the graph [2]. GFs admit decentralized implementations, meaning that the nodes of the network can implement the graph filter by exchanging only local information with their neighbors. Graph filters have been used for different applications such as graph signal analysis [12, 2], reconstruction [13, 14, 6, 15], and denoising [16, 17, 5, 18].

In [17], GFs are designed to compute pre-specified linear transformations. Nevertheless, the GF design in [17] only applies to rank-1 projections or to cases where the projection shares its eigenvectors with a given graph shift matrix, such as the Laplacian or adjacency matrix, which limits the linear functions that can be implemented. Moreover, that method considers the design of GFs only for symmetric topologies. On the other hand, due to interference and background noise, WSNs are often characterized by asymmetric packet losses, leading to network topologies that can be viewed as asymmetric graphs [19, 20]. To approximate linear transformations, a decentralized approach based on so-called edge-variant (EV) GFs has been proposed in [21]. This method uses a different graph shift matrix in each iteration of the graph filtering, which enables nodes to weigh the signals of their neighbors with different values. Furthermore, in [5] a decentralized subspace projection method via GFs for symmetric topologies has been proposed by optimizing a criterion that yields convergence to the subspace projection in a nearly minimal number of iterations.

In this paper, we propose an approach to compute linear transformations as fast as possible, i.e., in a small number of local exchanges. Our approach employs a sequence of different shift operators to compute the desired transformation matrix. In our method, the difference between the product of the shift operators and the transformation matrix at each round is minimized. To decrease the required number of rounds, which leads to fast convergence, a weighted-sum method is proposed, where larger weights are assigned to the last rounds to force the error of those rounds to be smaller.

In addition, edge fluctuation is a critical issue in decentralized GSP-based approaches and negatively affects their performance. For instance, in WSN, a number of network edges are lost when edge fluctuations occur due to node failures. In this case, some nodes miss a portion of their neighbors’ information, which creates a deviation in the nodes’ information-exchange procedure. The effect of graph perturbations on finite impulse response (FIR) graph filters has been investigated in [22, 23, 24, 25]. Those works analyze the effect of graph perturbation based on the random edge sampling (RES) graph model [23], where the activation probability of graph edges is assumed to be known.

In summary, our contributions are as follows:

  • In contrast with [26, 5], where only symmetric topologies have been considered, the proposed method designs shift operators to compute linear transformations over different topologies, including both symmetric and asymmetric graphs. Compared to [21], the proposed method needs a noticeably smaller number of local exchanges, leading to faster convergence, and it computes the desired linear transformation exactly in scenarios where the method in [21] only approximates it.

  • Furthermore, this work considers the issue of edge fluctuations in the problem of computing linear transformations, which is not addressed by the existing works [26, 5, 21]. In this paper, the effect of edge fluctuations on the proposed approach is studied. Then, to deal with the negative effect of edge fluctuations, an efficient online method is proposed, where each node estimates the missing values (due to the fluctuation of the edges) from its at-hand information, by employing an online kernel-based learning approach. In the proposed method, each node estimates the missing values just by applying some simple local calculations.

  • Finally, the proposed estimator enables us to randomly remove a number of edges at each iteration to sparsify the graph, which saves power at the sensor nodes in WSN and increases their lifetime, since they are usually powered by batteries. By using our method, the sensor nodes spend less power to relay their information to their neighbours, since the number of graph edges used in each iteration is decreased. Indeed, for each node in each iteration, we can randomly remove some edges, and then, by applying the proposed estimator, nodes can estimate online their neighbours’ information corresponding to the removed edges. This advantage can be further enhanced if the proposed estimators are trained during a certain number of iterations, after which the local exchanges between nodes stop and the nodes instead update their information based on an estimate of their neighbours’ information. With this approach, node power is saved since the number of local exchanges is decreased.

Notation: Bold uppercase (lowercase) letters denote matrices (column vectors), while $(\cdot)^{\top}$ stands for matrix transposition. $(\mathbf{a})_{n}$ and $(\mathbf{A})_{n,n^{\prime}}$ denote the $n$-th entry of $\mathbf{a}$ and the $(n,n^{\prime})$-th entry of $\mathbf{A}$, respectively. Also, $\mathbf{1}$ is the all-ones vector and $\mathbf{e}_{n}$ is the basis vector whose entries are all zero except the $n$-th one, which equals one. The indicator function and the Hadamard product are denoted by $\mathbbm{1}$ and $\circ$, respectively. Moreover, $\Pi$ and $\mathrm{vec}$ represent the product of a collection of terms and the vectorization operation. Finally, the trace of $\mathbf{A}$ and the Frobenius norm are denoted by $\mathrm{tr}(\mathbf{A})$ and $\|\cdot\|_{F}$, respectively.

II Preliminaries

II-A System Model

Consider $N$ networked sensor nodes which are able to exchange messages with the neighbour nodes to which they are connected. The network is modeled as a directed connected graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, where the vertex (node) set $\mathcal{V}:=\{1,2,\cdots,N\}$ corresponds to the network nodes, and $\mathcal{E}$ represents the set of edges. The $n^{\prime}$-th vertex $v_{n^{\prime}}$ is connected to $v_{n}$ if there is a directed edge from $v_{n^{\prime}}$ to $v_{n}$, i.e., $(v_{n^{\prime}},v_{n})\in\mathcal{E}$, but this does not imply that $v_{n}$ is connected to $v_{n^{\prime}}$ unless $(v_{n},v_{n^{\prime}})\in\mathcal{E}$. The in-neighborhood of the $n$-th node is defined as the set of nodes connected to it, denoted $\mathcal{N}_{n}=\{v_{n^{\prime}}\,|\,(v_{n^{\prime}},v_{n})\in\mathcal{E}\}$. In the GSP context, the vector $\mathbf{x}=[x_{1},x_{2},\cdots,x_{N}]^{\top}$ is referred to as a graph signal, where the $i$-th component of $\mathbf{x}$ represents the signal value at the $i$-th node of $\mathcal{G}$.

II-B Decentralized linear transformations via graph filters

In this section, we briefly review the implementation of linear transformations via GFs. Finite impulse response (FIR) graph filtering [2] involves two steps. In the first step, a sequence of graph signals $\mathbf{x}^{(l)}=[x_{1}^{(l)},x_{2}^{(l)},\cdots,x_{N}^{(l)}]^{\top}$, $l=0,\cdots,L$, is obtained via $L$ local exchanges between nodes, where $\mathbf{x}^{(0)}=\mathbf{x}$. In particular, the graph signal at the $l$-th iteration, i.e., $\mathbf{x}^{(l)}$, is computed from the graph signal of the previous iteration, i.e., $\mathbf{x}^{(l-1)}$, as $x_{n}^{(l)}=\sum_{n^{\prime}\in\mathcal{N}_{n}}s_{n,n^{\prime}}x_{n^{\prime}}^{(l-1)}$, where $s_{n,n^{\prime}}$ is the coefficient used for the aggregation of information between nodes $n$ and $n^{\prime}$. Thus, in matrix form, we have $\mathbf{x}^{(l)}=\mathbf{S}\mathbf{x}^{(l-1)}$, where $(\mathbf{S})_{n,n^{\prime}}=0$ if $(v_{n^{\prime}},v_{n})\notin\mathcal{E}$.

In the GSP context, $\mathbf{S}$ is called the graph shift operator [2, 27]. The graph shift operator is a decentralized operator because each node updates its information only via information from its neighbours. Examples of the graph shift operator are the adjacency and Laplacian matrices of $\mathcal{G}$ [17]. In the second step, the outputs of all iterations computed in the first step are linearly aggregated as follows

$$\mathbf{y}=\mathbf{H}\mathbf{x}\overset{\Delta}{=}\sum_{l=0}^{L-1}c_{l}\mathbf{S}^{l}\mathbf{x} \qquad (1)$$

where $\{c_{l}\}_{l=0}^{L-1}$ are the filter coefficients and $L$ is the order of the filter [2]. The graph filter in (1) has been used in [17] to implement a pre-specified transformation matrix $\mathbf{T}$. However, the design in [17] achieves $\sum_{l=0}^{L-1}c_{l}\mathbf{S}^{l}=\mathbf{T}$ only for symmetric topologies, and only for rank-1 projections or for scenarios where $\mathbf{T}$ and the given shift matrix, such as the Laplacian or the adjacency matrix, are simultaneously diagonalizable, i.e., they share the same eigenvectors.
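To make the two-step procedure concrete, here is a minimal NumPy sketch of the FIR graph filter (1), computed through repeated local exchanges; the shift matrix `S`, the signal `x` and the coefficients `c` are illustrative placeholders rather than quantities defined in this paper.

```python
import numpy as np

def fir_graph_filter(S, x, c):
    """Apply the FIR graph filter (1): y = sum_l c_l S^l x.
    Each multiplication by S is one round of local exchanges, since
    (S x)_n only combines the values of the in-neighbours of node n."""
    y = c[0] * x                 # l = 0 term (S^0 = I)
    x_l = x.copy()
    for c_l in c[1:]:
        x_l = S @ x_l            # one local exchange: x^(l) = S x^(l-1)
        y = y + c_l * x_l
    return y
```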

The concept of FIR graph filters is generalized in [21] by introducing the edge-variant (EV) graph filters where a different set of weights is used in each iteration. The edge weighting matrices share the same support with the adjacency matrix of the network graph. In [21], the edge-variant graph filter is used to approximate linear transformations.

III Proposed method: Successive shift operators

In this section, by using the notion of graph shift operator, we propose a new approach to implement $\mathbf{T}\mathbf{x}$ in a decentralized manner as fast as possible. To this end, a sequence of different shift operators is applied so that $\mathbf{S}_{l}\cdots\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}=\mathbf{T}\mathbf{x}$, where for all $\mathbf{S}_{i}$, $i=1,\cdots,l$, we have $(\mathbf{S}_{i})_{n,n^{\prime}}=0$ if $(v_{n^{\prime}},v_{n})\not\in\mathcal{E}$.

As stated before, the proposed method leads to a decentralized approach. In fact, after the first round of information exchange among nodes, the nodes compute $\mathbf{x}^{(1)}:=\mathbf{S}_{1}\mathbf{x}$. Then, at the second round, $\mathbf{x}^{(2)}:=\mathbf{S}_{2}\mathbf{x}^{(1)}=\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}$. The procedure is repeated for $l$ iterations. By properly designing $\{\mathbf{S}_{i}\}_{i=1}^{l}$, $\mathbf{T}\mathbf{x}$ is computed exactly, or $\Pi_{i=1}^{l}\mathbf{S}_{i}\mathbf{x}\approx\mathbf{T}\mathbf{x}$ is obtained with a small error.
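A similarly minimal sketch of the successive procedure is given below; in contrast to the FIR filter above, a different shift matrix is applied at every round. The list `S_list` stands for the output of the design problem introduced next and is purely illustrative.

```python
import numpy as np

def successive_shifts(S_list, x):
    """Successive shift operators: after round i the nodes hold
    x^(i) = S_i ... S_2 S_1 x, which should approach T x by design."""
    for S in S_list:             # each round is one local exchange
        x = S @ x
    return x
```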

To compute $\mathbf{T}$ as efficiently as possible, i.e., with the smallest number of iterations, we design the sequence of feasible graph shift matrices that minimizes the number of local exchanges, i.e., $l$, by solving the following problem:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{l}}{\text{min}}\quad l &&(2\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,l &&(2\text{b})\\
&\phantom{\ \text{s.t.}\quad} \mathbf{T}=\Pi_{i=1}^{l}\mathbf{S}_{i} &&(2\text{c})
\end{aligned}$$

where constraint (2b) is added to the problem because the shift operators must satisfy the topological constraints of the graph connectivity.

Optimization problem (2) is intractable. To address this issue, an upper limit on $l$ is considered such that $l\in\{1,\cdots,L\}$. Then, to design shift operators that compute $\mathbf{T}$ via $\Pi_{i=1}^{l}\mathbf{S}_{i}$ as fast as possible, the error at each round, i.e., the error between $\mathbf{T}$ and $\Pi_{i=1}^{l}\mathbf{S}_{i}$, can be minimized, which is attained by solving the following optimization problem:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}}{\text{min}}\quad \sum_{l=1}^{L}{\left\|\mathbf{T}-\Pi_{i=1}^{l}\mathbf{S}_{i}\right\|}_{F}^{2} &&(3\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,L &&(3\text{b})
\end{aligned}$$

In optimization problem (3), we put the same emphasis on all rounds, which means that all of them are treated equally. However, if we put more emphasis on the later rounds, we expect to obtain a faster approach. With this strategy, we accept a larger error in the earlier rounds to obtain a smaller error in the later ones, which can be implemented by assigning different weights to the terms in cost function (3a). To this end, larger weights are assigned to the later rounds. Thus, we have:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}}{\text{min}}\quad \sum_{l=1}^{L}\omega_{l}{\left\|\mathbf{T}-\Pi_{i=1}^{l}\mathbf{S}_{i}\right\|}_{F}^{2} &&(4\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,L &&(4\text{b})
\end{aligned}$$

where $\boldsymbol{\omega}=[\omega_{1},\omega_{2},\cdots,\omega_{L}]^{\top}$ is the weight vector, whose entries are non-negative and increasing, and such that $\sum_{l=1}^{L}\omega_{l}=1$.

Optimization problem (4) is non-convex with respect to all its variables; however, the cost function (4a) is strongly convex with respect to each $\mathbf{S}_{i}$ separately, i.e., when the other blocks of variables are fixed. Therefore, we can apply the block coordinate descent (BCD) algorithm to solve optimization problem (4); convergence of the BCD algorithm is guaranteed for this problem [28]. In this approach, all blocks of variables are fixed except one of them, and the optimization problem is solved with respect to the free block. This procedure continues until each of the matrices $\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}$ has been taken as the block variable of the optimization problem. The BCD algorithm is repeated until a stopping criterion is satisfied.

We now derive the closed-form solution of each step of the BCD algorithm. When one of the shift matrices, say $\mathbf{S}_{j}$, is the variable of optimization problem (4) and the remaining shift matrices are fixed, the following optimization problem is solved:

$$J^{*}_{j}\overset{\Delta}{=}\underset{\mathbf{S}_{j}=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}}{\text{Inf}}\ J_{j}(\mathbf{S}_{j}) \qquad (5)$$

where $J_{j}(\mathbf{S}_{j})=\sum_{l=j}^{L}\omega_{l}{\left\|\mathbf{T}-\mathbf{S}_{l}\mathbf{S}_{l-1}\cdots\mathbf{S}_{j+1}\mathbf{S}_{j}\mathbf{S}_{j-1}\cdots\mathbf{S}_{1}\right\|}_{F}^{2}$, the parameterization $\mathbf{S}_{j}=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}$ satisfies constraint (4b), and $S_{n,n^{\prime}}^{(j)}$ is the entry that lies in the $n$-th row and the $n^{\prime}$-th column of $\mathbf{S}_{j}$. Thus, we have:

$$J^{*}_{j}=\underset{\{S^{(j)}_{n,n^{\prime}}\}_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}}{\text{Inf}}\ J_{j}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big) \qquad (6)$$

The argument of $J_{j}$ in (6) can be expressed in vector form by applying the vectorization operator as follows:

$$\mathrm{vec}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big)=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}(\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n}) \qquad (7)$$

where in (7) we use the fact that $\mathrm{vec}(\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top})=\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n}$. Moreover, we have:

$$\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}(\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n})=\underbrace{[\mathbf{e}_{n^{\prime}_{1}}\otimes\mathbf{e}_{n_{1}},\cdots,\mathbf{e}_{n^{\prime}_{E}}\otimes\mathbf{e}_{n_{E}}]}_{\mathbf{E}}\,\mathbf{s}^{(j)} \qquad (8)$$

where $\mathbf{s}^{(j)}=[S_{n_{1},n^{\prime}_{1}}^{(j)},\cdots,S_{n_{E},n^{\prime}_{E}}^{(j)}]^{\top}$. Therefore, optimization problem (6) can be represented as follows:

$$\underset{\{S^{(j)}_{n,n^{\prime}}\}_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}}{\text{inf}}J_{j}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big)=\underset{\mathbf{s}^{(j)}\in\mathbb{R}^{E}}{\text{inf}}J_{j}\big(\mathrm{vec}^{-1}(\mathbf{E}\mathbf{s}^{(j)})\big) \qquad (9)$$

Consequently, from (9), the $j$-th subproblem of optimization problem (4) can be re-written as follows:

$$\underset{\mathbf{s}^{(j)}}{\text{inf}}\ \sum_{l=j}^{L}\omega_{l}{\left\|\mathbf{T}-\mathbf{S}_{l:j+1}\,\mathrm{vec}^{-1}(\mathbf{E}\mathbf{s}^{(j)})\,\mathbf{S}_{j-1:1}\right\|}_{F}^{2} \qquad (10)$$

where $\mathbf{S}_{l:j}\overset{\Delta}{=}\mathbf{S}_{l}\mathbf{S}_{l-1}\cdots\mathbf{S}_{j}$.

Finally, the vectorized form of (10) equals:

$$\underset{\mathbf{s}^{(j)}}{\text{inf}}\ \sum_{l=j}^{L}\omega_{l}{\left\|\mathrm{vec}(\mathbf{T})-(\mathbf{S}_{j-1:1}^{\top}\otimes\mathbf{S}_{l:j+1})\mathbf{E}\mathbf{s}^{(j)}\right\|}_{2}^{2} \qquad (11)$$

The following algorithm represents the approach for solving (11). As stopping criterion, $|f(\mathbf{S}_{L:1}^{(i+1)})-f(\mathbf{S}_{L:1}^{(i)})|/|f(\mathbf{S}_{L:1}^{(i)})|<\varepsilon$ is used for a sufficiently small $\varepsilon$, where $f(\mathbf{S}_{L:1})$ equals cost function (3a) and $i$ denotes the $i$-th iteration of the BCD algorithm.

input: $\mathcal{E},L,\omega_{1},\cdots,\omega_{L},\varepsilon$.
initialize: $\mathbf{S}_{2},\mathbf{S}_{3},\cdots,\mathbf{S}_{L}$.
while stopping criterion not satisfied do
      for $j=1$ to $L$ do
            fix $\mathbf{S}_{m}$, $m=1,\cdots,L$, $m\neq{j}$, and obtain $\mathbf{s}^{(j)}$ via (11)
      end for
end while
output: $\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}$
Algorithm 1 Proposed solver
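For concreteness, the following NumPy sketch mirrors Algorithm 1: each block update stacks the weighted terms of (11) and solves the resulting least-squares problem for $\mathbf{s}^{(j)}$. The function name `bcd_successive_shifts`, the random initialization and the fixed number of sweeps (instead of the stopping criterion) are illustrative simplifications, not part of the original algorithm description.

```python
import numpy as np

def bcd_successive_shifts(T, adj, L, weights, n_sweeps=50, seed=0):
    """Illustrative BCD solver for (11).
    T       : (N, N) target transformation.
    adj     : (N, N) 0/1 support mask; adj[n, n'] = 1 if (S)_{n,n'} may be nonzero.
    L       : number of shift matrices (local exchange rounds).
    weights : length-L non-negative weights (the omega_l)."""
    N = T.shape[0]
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(adj)
    E = np.zeros((N * N, rows.size))
    E[cols * N + rows, np.arange(rows.size)] = 1.0     # column of E is e_{n'} (x) e_n
    S = [adj * rng.standard_normal((N, N)) for _ in range(L)]   # random feasible start
    vecT = T.ravel(order="F")                          # column-major vec(T)
    for _ in range(n_sweeps):
        for j in range(L):                             # block update of S[j]
            pre = np.eye(N)                            # S_{j-1:1}
            for i in range(j):
                pre = S[i] @ pre
            A_blocks, b_blocks = [], []
            post = np.eye(N)                           # S_{l:j+1}, grown as l increases
            for l in range(j, L):
                if l > j:
                    post = S[l] @ post
                w = np.sqrt(weights[l])
                A_blocks.append(w * (np.kron(pre.T, post) @ E))
                b_blocks.append(w * vecT)
            s_j, *_ = np.linalg.lstsq(np.vstack(A_blocks),
                                      np.concatenate(b_blocks), rcond=None)
            S[j] = (E @ s_j).reshape(N, N, order="F")
    return S
```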

IV Edge fluctuation

IV-A Edge Fluctuation Model

In the previous section, we have assumed that all edges are fixed; however, in practical scenarios, edges fluctuate for different reasons, such as random links or node failures in WSN, which leads to a random graph topology. In this subsection, the effect of edge fluctuations on the proposed (successive) method is studied. We consider that edge fluctuations occur randomly and independently across edges. Note that in this section, to analyze the effect of edge fluctuations, we assume that an estimate of the link activation probabilities is available. However, knowing the link activation probabilities is not required for our proposed method to deal with the effect of edge fluctuations. Moreover, we assume that the spectral norm of all graph shift operators is upper-bounded, i.e., $\|\mathbf{S}_{i}\|_{2}\leq\rho<\infty$ for all $i=1,\cdots,L$. This assumption is practical because it implies that the graph of interest has finite edge weights [24].

When an edge is missed in a certain iteration of the successive method, the destination node of the missing edge loses the information of its neighbor, which generates a deviation. The shift operator corresponding to the perturbed graph can be modeled as $\hat{\mathbf{S}}=\mathbf{S}+\tilde{\mathbf{S}}$, where $\tilde{\mathbf{S}}$ is created by the edge fluctuations, and its non-zero entries equal the corresponding entries of $\mathbf{S}$ with a negative sign. Thus, the signal output of the first iteration of the successive method can be expressed as

$$\mathbf{y}^{(1)}=(\mathbf{S}_{1}+\tilde{\mathbf{S}}_{1})\mathbf{x}=\mathbf{S}_{1}\mathbf{x}+\mathbf{z}_{1} \qquad (12)$$

where $\mathbf{z}_{1}\overset{\Delta}{=}\tilde{\mathbf{S}}_{1}\mathbf{x}$. We assume that $\mathbf{z}_{1}$ is a realization of a random process with mean $\mathbf{m}_{z_{1}}$ and covariance matrix $\mathbf{\Sigma}_{z_{1}}$. Note that $\tilde{\mathbf{S}}_{1}$ can be modeled by using the link activation probabilities, i.e., $\mathbb{E}[\tilde{\mathbf{S}}_{1}]=(\mathbf{1}-\mathbf{P}_{\text{ac}})\circ(-\mathbf{S}_{1})$, where $\mathbf{P}_{\text{ac}}$ is the link activation probability matrix. Thus, since $\mathbf{x}$ is fixed, we have $\mathbf{m}_{z_{1}}=\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{x}$, where $\mathbf{m}_{\tilde{\mathbf{S}}_{1}}=(\mathbf{P}_{\text{ac}}-\mathbf{1})\circ\mathbf{S}_{1}$ is the mean of $\tilde{\mathbf{S}}_{1}$. In addition, for the covariance of $\mathbf{z}_{1}$, we have

$$\mathbf{\Sigma}_{z_{1}}=\mathbb{E}[\mathbf{z}_{1}\mathbf{z}_{1}^{\top}]-\mathbb{E}[\mathbf{z}_{1}]\mathbb{E}[\mathbf{z}_{1}^{\top}]=\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}]-\mathbf{m}_{\mathbf{z}_{1}}\mathbf{m}_{\mathbf{z}_{1}}^{\top} \qquad (13)$$

The $(i,j)$-th element of $\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}]$ equals $\mathbb{E}[\tilde{\mathbf{s}}_{1_{i}}^{\top}\mathbf{X}\tilde{\mathbf{s}}_{1_{j}}]$, where $\mathbf{X}=\mathbf{x}\mathbf{x}^{\top}$ and $\tilde{\mathbf{S}}_{1}=[\tilde{\mathbf{s}}_{1_{1}}|\tilde{\mathbf{s}}_{1_{2}}|\cdots|\tilde{\mathbf{s}}_{1_{N}}]^{\top}$. From the cyclic property of the trace, and the fact that the trace is a linear operator, we have

$$(\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}])_{i,j}=\mathbb{E}[\tilde{\mathbf{s}}_{1_{i}}^{\top}\mathbf{X}\tilde{\mathbf{s}}_{1_{j}}]=\mathrm{tr}(\mathbf{X}\,\mathbb{E}[\tilde{\mathbf{s}}_{1_{j}}\tilde{\mathbf{s}}_{1_{i}}^{\top}])=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}}) \qquad (14)$$

where $\mathbf{C}_{1_{ji}}=\mathbb{E}[\tilde{\mathbf{s}}_{1_{j}}\tilde{\mathbf{s}}_{1_{i}}^{\top}]$ is the cross-correlation between $\tilde{\mathbf{s}}_{1_{j}}$ and $\tilde{\mathbf{s}}_{1_{i}}$. Consequently:

$$(\mathbf{\Sigma}_{z_{1}})_{i,j}=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}})-(\mathbf{M}_{\mathbf{z}_{1}})_{i,j}\quad\forall\,i,j=1,\cdots,N \qquad (15)$$

where $\mathbf{M}_{\mathbf{z}_{1}}=\mathbf{m}_{\mathbf{z}_{1}}\mathbf{m}_{\mathbf{z}_{1}}^{\top}$. In the next iteration, we have:

$$\mathbf{y}^{(2)}=(\mathbf{S}_{2}+\tilde{\mathbf{S}}_{2})\mathbf{y}^{(1)}=\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}+\mathbf{S}_{2}\mathbf{z}_{1}+\mathbf{z}_{2} \qquad (16)$$

where $\mathbf{z}_{2}\overset{\Delta}{=}\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}+\tilde{\mathbf{S}}_{2}\mathbf{z}_{1}$, which implies that $\mathbf{z}_{2}$ is correlated with $\mathbf{z}_{1}$. Indeed, $\mathbf{m}_{\mathbf{z}_{2}}=\mathbf{m}_{\tilde{\mathbf{S}}_{2}}(\mathbf{S}_{1}+\mathbf{m}_{\tilde{\mathbf{S}}_{1}})\mathbf{x}$ and

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[(\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}+\tilde{\mathbf{S}}_{2}\mathbf{z}_{1})(\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\mathbf{z}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top})]-\mathbf{m}_{\mathbf{z}_{2}}\mathbf{m}_{\mathbf{z}_{2}}^{\top} \qquad (17)$$

From $\mathbf{z}_{1}\overset{\Delta}{=}\tilde{\mathbf{S}}_{1}\mathbf{x}$ and (17), we have that

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]-\mathbf{M}_{\mathbf{z}_{2}} \qquad (18)$$

where $\mathbf{M}_{\mathbf{z}_{2}}=\mathbf{m}_{\mathbf{z}_{2}}\mathbf{m}_{\mathbf{z}_{2}}^{\top}$. By the linearity of the expectation operator, and since the fluctuations of different iterations are independent, we have:

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{X}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{X}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{X}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{X}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]-\mathbf{M}_{\mathbf{z}_{2}} \qquad (19)$$

Similar to (14) and (15), for each entry of $\mathbf{\Sigma}_{\mathbf{z}_{2}}$, we have

$$(\mathbf{\Sigma}_{\mathbf{z}_{2}})_{i,j}=\mathrm{tr}\big((\mathbf{S}_{1}\mathbf{X}\mathbf{S}_{1}^{\top}+\mathbf{S}_{1}\mathbf{X}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}^{\top}+\boldsymbol{\chi}_{1}+\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{X}\mathbf{S}_{1}^{\top})\mathbf{C}_{2_{ji}}\big)-(\mathbf{M}_{\mathbf{z}_{2}})_{i,j}\quad\forall\,i,j=1,\cdots,N \qquad (20)$$

where $\boldsymbol{\chi}_{1}=\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{X}\tilde{\mathbf{S}}_{1}^{\top}]$ and $\mathbf{C}_{2_{ji}}$ is the cross-correlation between $\tilde{\mathbf{s}}_{2_{j}}$ and $\tilde{\mathbf{s}}_{2_{i}}$. By using an approach similar to (14), we have that $(\boldsymbol{\chi}_{1})_{i,j}=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}})$, $\forall\,i,j=1,\cdots,N$.

After $L$ iterations, the output of the successive method under the edge fluctuation model equals:

$$\mathbf{y}=\mathbf{H}\mathbf{x}+\sum_{i=1}^{L-1}\Big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\Big)\mathbf{z}_{i}+\mathbf{z}_{L} \qquad (21)$$

where $\mathbf{H}=\Pi_{i=1}^{L}\mathbf{S}_{i}$. The first- and second-order statistics of $\mathbf{z}_{i}$, $i=3,\cdots,L$, can be obtained by an approach similar to the one used for $\mathbf{z}_{1}$ and $\mathbf{z}_{2}$.
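As an illustration of this model, the sketch below generates realizations of (21) by dropping each off-diagonal entry of every $\mathbf{S}_{i}$ independently with probability $1-p_{\text{act}}$ and compares them with the noiseless output $\mathbf{H}\mathbf{x}$; using a single scalar activation probability `p_act` is an illustrative simplification of the matrix $\mathbf{P}_{\text{ac}}$, and keeping the diagonal intact is an additional assumption.

```python
import numpy as np

def perturbed_output(S_list, x, p_act, rng):
    """One realization of (21): every off-diagonal entry of S_i is kept with
    probability p_act and set to zero otherwise (RES-style edge fluctuation);
    the diagonal entries (a node's own value) are assumed reliable."""
    y = x.copy()
    for S in S_list:
        keep = rng.random(S.shape) < p_act
        np.fill_diagonal(keep, True)
        y = (S * keep) @ y
    return y

# Empirical MSE of Omega = y - Hx over many realizations (to compare with (23)):
# rng = np.random.default_rng(0)
# H = np.linalg.multi_dot(S_list[::-1])          # H = S_L ... S_2 S_1
# mse = np.mean([np.sum((perturbed_output(S_list, x, 0.9, rng) - H @ x) ** 2)
#                for _ in range(2000)])
```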

Next, we show that, under the edge fluctuation model, the mean square error (MSE) of the output of the successive approach after $L$ iterations, expressed as follows, is upper-bounded.

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\Big\|\mathbf{z}_{L}+\sum_{i=1}^{L-1}\Big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\Big)\mathbf{z}_{i}\Big\|_{2}^{2}\Big] \qquad (22)$$

where $\boldsymbol{\Omega}=\mathbf{y}-\mathbf{H}\mathbf{x}$.

Proposition 1: The MSE of the output of the successive approach in (21) under the edge fluctuation model is upper-bounded by:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathbb{E}[\mathrm{tr}(\boldsymbol{\Psi})]+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}\big)+\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{L}}+\mathbf{m}_{z_{L}}\mathbf{m}_{z_{L}}^{\top}\big) \qquad (23)$$

where $\boldsymbol{\Psi}=\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\sum_{k=1,k\neq{i}}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1})^{\top}+\mathbf{z}_{L}\sum_{k=1}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{j=k}^{L-1}\mathbf{S}_{j+1})^{\top}+\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\mathbf{z}_{L}^{\top}$.

Proof.

Please see Appendix A. ∎

From Proposition 1, the MSE of the output of the successive approach under the edge fluctuation model is upper-bounded in terms of $\rho$ and other parameters, namely $\boldsymbol{\Psi}$, $\mathbf{m}_{z_{i}}$ and $\mathbf{\Sigma}_{z_{i}}$, $i=1,\cdots,L$, which depend on the statistics of $\tilde{\mathbf{S}}_{i}$, $i=1,\cdots,L$, and on the shift matrices.

IV-B Proposed method: the missing values estimator

In this section, to deal with the effect of edge fluctuations, we focus on estimating the missing values via a kernel-based estimator. The main task can be expressed as follows: given $\mathbf{x},\mathcal{G},\mathbf{H}$, the goal is to estimate the values that are missing due to edge fluctuations. To this end, an online kernel-based method is proposed such that a node whose neighbor's information is missing estimates the missing value based on its own available information.

Let us consider that in the $i$-th iteration of the successive approach, the $n$-th node misses the information from the $n^{\prime}$-th node, which is given by

$$x_{n^{\prime}}^{(i-1)}=\sum_{n^{\prime\prime}\in\mathcal{N}_{n^{\prime}}}(\mathbf{S}_{i-1})_{n^{\prime},n^{\prime\prime}}x_{n^{\prime\prime}}^{(i-2)}+(\mathbf{S}_{i-1})_{n^{\prime},n^{\prime}}x_{n^{\prime}}^{(i-2)} \qquad (24)$$

From (24), we can conclude that $x_{n^{\prime}}^{(i-2)}$ and $x_{n}^{(i-2)}$ are the most relevant pieces of information available to the $n$-th node to estimate the missing value. The rest of the $n$-th node's information can also be informative, since the $n$-th and $n^{\prime}$-th nodes can have one-hop or two-hop common neighbors. Moreover, since the network is connected, all nodes are connected to each other within a number of hops. Consequently, in our method, the $n$-th node utilizes $\mathbf{u}_{n}=[x_{n^{\prime}}^{(i-2)},x_{n}^{(i-2)},\mathbf{c}_{n}^{(i-2)}]^{\top}\in\mathbb{R}^{(|\mathcal{N}_{n}|+1)\times{1}}$ to estimate the missing value (the information of the $n^{\prime}$-th node in the $(i-1)$-th iteration), where $\mathbf{c}_{n}$ contains the values of the neighbours of the $n$-th node except the $n^{\prime}$-th one.

To estimate the missing values, in supervised learning an estimator function is sought via training samples. To this end, flexible non-parametric models are needed for good results. A general class of models is based on functions of the form [29]

$$f(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\mathcal{F}(\mathbf{q},\mathbf{q}_{k}) \qquad (25)$$

where $\mathcal{F}$ is a non-linear function, $\alpha_{1},\cdots,\alpha_{K}\in\mathbb{R}$ are coefficients, $\mathbf{q}_{1},\cdots,\mathbf{q}_{K}\in\mathbb{R}^{d}$ are the training samples, and $K$ is the number of samples. Kernel methods are a well-known example of an approach using functions of the form (25) [30].

In kernel-based learning, it is assumed that the estimator function belongs to a reproducing kernel Hilbert space (RKHS) expressed as [30]

$$\mathcal{H}:=\{f:f(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\kappa(\mathbf{q},\mathbf{q}_{k}),\ \alpha_{k}\in\mathbb{R}\} \qquad (26)$$

where $\kappa:\mathbb{R}^{N}\times\mathbb{R}^{N}\to\mathbb{R}$ is a pre-selected symmetric and positive definite function. Given $\mathbf{y}$ and the sample matrix $\mathbf{Q}=[\mathbf{q}_{1},\cdots,\mathbf{q}_{K}]$, the kernel-based estimate is usually obtained by solving the following problem:

$$\hat{f}=\underset{f\in\mathcal{H}}{\text{arg\,min}}\Big\{\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}(f(\mathbf{q}_{k})-y_{k})+\lambda\|f\|^{2}_{\mathcal{H}}\Big\} \qquad (27)$$

where $\mathcal{L}$ is a pre-selected loss function; for regression, the least-squares loss is utilized. Moreover, $\|f\|^{2}_{\mathcal{H}}=\sum_{m=1}^{K}\sum_{m^{\prime}=1}^{K}\alpha_{m}\alpha_{m^{\prime}}\kappa(\mathbf{q}_{m},\mathbf{q}_{m^{\prime}})$ denotes the RKHS norm, used as a regularizer to deal with overfitting, and $\lambda$ is a regularization parameter.

Since $\mathcal{H}$ is infinite-dimensional, $\hat{f}$ cannot be obtained directly by solving (27). However, based on the representer theorem [31], (27) admits a solution of the form

$$\hat{f}(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\kappa(\mathbf{q},\mathbf{q}_{k}):=\boldsymbol{\alpha}^{\top}\mathbf{t}(\mathbf{q}) \qquad (28)$$

where $\mathbf{t}(\mathbf{q}):=[\kappa(\mathbf{q},\mathbf{q}_{1}),\cdots,\kappa(\mathbf{q},\mathbf{q}_{K})]^{\top}$. By substituting (28) into (27), $\boldsymbol{\alpha}$ is obtained by solving the following problem:

$$\underset{\boldsymbol{\alpha}\in\mathbb{R}^{K}}{\text{min}}\Big\{\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}(\boldsymbol{\alpha}^{\top}\mathbf{t}(\mathbf{q}_{k})-y_{k})+\lambda\boldsymbol{\alpha}^{\top}\boldsymbol{\Gamma}\boldsymbol{\alpha}\Big\} \qquad (29)$$

where $\boldsymbol{\Gamma}\in\mathbb{R}^{K\times{K}}$ is the kernel matrix whose $(i,i^{\prime})$-th entry is $\kappa(\mathbf{q}_{i},\mathbf{q}_{i^{\prime}})$.

In our proposed method, each node employs a function estimator for each of its neighbours to estimate the corresponding missing value. This can be done by using estimators of the form (28), such that the $i$-th node employs the following estimator for each of its neighbours:

$$f_{ij}(\mathbf{u}_{i})=\sum_{m=1}^{K}\alpha_{m}^{(ij)}\kappa(\mathbf{u}_{i},\mathbf{u}_{m}) \qquad (30)$$

where $j\in\mathcal{N}_{i}$ and $\{\mathbf{u}_{m}\}_{m=1}^{K}$ are the training samples. The estimators are trained by solving optimization problem (29), which for kernel regression admits the following closed-form solution:

$$\hat{\boldsymbol{\alpha}}=(\boldsymbol{\Gamma}+\lambda{K}\mathbf{I})^{-1}\mathbf{y} \qquad (31)$$

From (31), to obtain $\hat{\boldsymbol{\alpha}}$, each node, for each of its neighbors, has to invert a $K\times{K}$ matrix, which is computationally heavy, especially when the number of samples is high. Indeed, whenever an edge fluctuation occurs, each node has to invert a $K\times{K}$ matrix, whose computational complexity is $\mathcal{O}(K^{3})$. Also, the nodes have to store a large number of scalars; for instance, the $i$-th node should store $K\times(|\mathcal{N}_{i}|+1)$ scalars.
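The following sketch illustrates the estimator (28) together with the closed-form training step (31) for a Gaussian kernel; the kernel choice, the bandwidth `sigma` and the regularization parameter `lam` are illustrative assumptions rather than values prescribed by this paper.

```python
import numpy as np

def gaussian_gram(Q, sigma=1.0):
    """Kernel matrix Gamma of (29); Q is (K, d), one training sample per row."""
    d2 = np.sum((Q[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(Q, y, lam=1e-2, sigma=1.0):
    """Closed form (31): alpha = (Gamma + lam * K * I)^{-1} y."""
    K = Q.shape[0]
    return np.linalg.solve(gaussian_gram(Q, sigma) + lam * K * np.eye(K), y)

def kernel_ridge_predict(alpha, Q, q, sigma=1.0):
    """Estimate (28): f_hat(q) = sum_k alpha_k * kappa(q, q_k)."""
    t = np.exp(-np.sum((Q - q) ** 2, axis=1) / (2.0 * sigma ** 2))
    return alpha @ t
```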

To tackle this issue, we employ the random feature approach [32], which provides an approximation of $\kappa(\mathbf{u},\mathbf{u}_{m})$ for shift-invariant kernels, i.e., kernels with $\kappa(\mathbf{u}_{n},\mathbf{u}_{m})=\kappa(\mathbf{u}_{n}-\mathbf{u}_{m})$. In this paper, we employ shift-invariant and bounded kernels; examples of such kernels are the Gaussian, Laplacian and Cauchy kernels [32]. The Fourier transform of a shift-invariant kernel, denoted by $\Theta(\mathbf{w})$, exists in closed form, and we can write:

$$\kappa(\mathbf{u}_{n},\mathbf{u}_{m})=\int\Theta(\mathbf{w})e^{j\mathbf{w}^{\top}(\mathbf{u}_{n}-\mathbf{u}_{m})}d\mathbf{w} \qquad (32)$$

From the definition of the expected value, (32) equals $\mathbb{E}_{\mathbf{w}}[e^{j\mathbf{w}^{\top}(\mathbf{u}_{n}-\mathbf{u}_{m})}]$. The main idea of the random feature approach is that, by drawing a sufficient number of samples $\{\mathbf{w}_{i}\}_{i=1}^{D}$ from $\Theta(\mathbf{w})$, $\kappa(\mathbf{u}_{n},\mathbf{u}_{m})$ can be approximated by

$$\hat{\kappa}(\mathbf{u}_{n},\mathbf{u}_{m})=\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{n})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{m}) \qquad (33)$$

where $\mathbf{W}:=[\mathbf{w}_{1},\cdots,\mathbf{w}_{D}]$ and

$$\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u})=\frac{1}{\sqrt{D}}[\sin(\mathbf{w}_{1}^{\top}\mathbf{u}),\cdots,\sin(\mathbf{w}_{D}^{\top}\mathbf{u}),\cos(\mathbf{w}_{1}^{\top}\mathbf{u}),\cdots,\cos(\mathbf{w}_{D}^{\top}\mathbf{u})]^{\top} \qquad (34)$$

From (32), and taking the expectation of both sides of (33) with respect to $\mathbf{W}$, we can conclude that $\hat{\kappa}(\cdot)$ is unbiased. Moreover, the variance of $\hat{\kappa}(\cdot)$ is proportional to $1/D$, which implies that it tends to zero as the number of random features tends to infinity [33].
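A minimal sketch of the feature map (34) is given below for the Gaussian kernel, for which $\Theta(\mathbf{w})$ is a zero-mean Gaussian density with covariance $\sigma^{-2}\mathbf{I}$; the bandwidth `sigma` and the number of features `D` are illustrative parameters.

```python
import numpy as np

def random_fourier_features(U, D, sigma=1.0, seed=0):
    """Feature map Delta_W(u) of (34) for a Gaussian kernel of bandwidth sigma.
    U is (K, d); the D spectral samples w_i are drawn from Theta(w) = N(0, sigma^{-2} I).
    Returns a (K, 2D) matrix whose rows are Delta_W(u_k)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(U.shape[1], D))  # columns are w_1, ..., w_D
    proj = U @ W
    return np.hstack([np.sin(proj), np.cos(proj)]) / np.sqrt(D)

# kappa(u_n, u_m) is then approximated as in (33):
# Z = random_fourier_features(U, D=200)
# K_hat = Z @ Z.T     # approximates the Gaussian kernel matrix of the rows of U
```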

By substituting (33) into (30), we have the following estimator:

$$\hat{f}_{ij}(\mathbf{u}_{i})=\sum_{m=1}^{K}\alpha_{m}^{(ij)}\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{m})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}):=(\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (35)$$

From (35), $\boldsymbol{\beta}^{(ij)}$ is obtained by solving the following problem [33]:

$$\underset{\boldsymbol{\beta}^{(ij)}\in\mathbb{R}^{2D}}{\text{min}}\ \frac{1}{K}\sum_{m=1}^{K}\mathcal{C}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)}),y_{m}\big),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (36)$$

where $\mathbf{u}_{i}^{(m)}$ denotes the $m$-th training sample. Also, we have:

$$\mathcal{C}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)}),y_{m}\big)=\mathcal{L}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)})-y_{m}\big)+\lambda\|\boldsymbol{\beta}^{(ij)}\|_{2}^{2} \qquad (37)$$

where $\|\boldsymbol{\beta}^{(ij)}\|_{2}^{2}:=\sum_{n}\sum_{k}\alpha_{n}^{(ij)}\alpha_{k}^{(ij)}\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{n})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{k})\simeq\|f\|_{\mathcal{H}}^{2}$. The random feature approach can be employed in an online manner, where the nodes estimate the missing values on the fly. In this approach, in each iteration of the successive approach, denoted by $l$, each node solves optimization problem (36) via online gradient descent [34] to update its parameters, which is expressed as follows:

$$\boldsymbol{\beta}^{(ij)}_{l+1}=\boldsymbol{\beta}^{(ij)}_{l}-\eta_{l}\nabla\mathcal{C}\big((\boldsymbol{\beta}^{(ij)}_{l})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}),x_{j}^{(l-1)}\big),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (38)$$

where $\eta_{l}$ is the step-size of the gradient descent. In (38), $y_{m}=x_{j}^{(l-1)}$ is used. Then, if the $i$-th node misses the information of the $j$-th node ($j\in\mathcal{N}_{i}$), the $i$-th node estimates the missing value via (35). After $L$ iterations, the complexity of (38) is $\mathcal{O}(LD)$.
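A minimal sketch of one update of (38) with the least-squares loss is shown below; the function name `ogd_update` and its arguments are illustrative, and the gradient is written for the regularized squared-error cost of (37).

```python
import numpy as np

def ogd_update(beta, u_i, y_target, W, eta, lam):
    """One online gradient step of (38) for a single (node i, neighbour j) pair.
    beta has length 2D; W holds the D spectral samples w_i as columns."""
    D = W.shape[1]
    proj = u_i @ W
    delta = np.concatenate([np.sin(proj), np.cos(proj)]) / np.sqrt(D)  # Delta_W(u_i)
    err = beta @ delta - y_target                # prediction error on the new sample
    grad = 2.0 * err * delta + 2.0 * lam * beta  # gradient of the LS version of (37)
    return beta - eta * grad

# When the exchange from neighbour j fails, node i predicts the missing value
# as in (35):  x_hat = beta @ delta
```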

Note that, for the scenario where edge fluctuations happen in the first iteration, the estimators are trained offline just to find $\boldsymbol{\beta}^{(ij)}_{1}$, $\forall\,j\in\mathcal{N}_{i}$. The offline-trained parameters are then used for initialization. For the remaining iterations, the parameters of the estimators are updated online via (38). Finally, the approach proposed in this section can also be used with other types of graph filters, such as FIR [2] or edge-variant graph filters [21].

V Conclusion

This paper proposes a new approach to compute linear transformations as fast as possible by leveraging a set of successive, different shift operators. To compute the desired transformation in a small number of iterations, a new optimization problem is proposed to design the shift operators. An exhaustive simulation study demonstrates that the proposed approach can effectively compute the desired transformation markedly faster than existing algorithms. Furthermore, this work studies the effect of edge fluctuations, which is overlooked by existing works and is a common issue in wireless sensor networks due to random links or node failures. To deal with that issue, an online learning method is proposed which enables nodes to estimate the values missing due to the edge fluctuations.

Appendix A Proof of Proposition 1

Proof.

From (22) and the fact that $\|\mathbf{a}\|_{2}^{2}=\mathrm{tr}(\mathbf{a}\mathbf{a}^{\top})$, we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\mathrm{tr}\Big(\Big(\mathbf{z}_{L}+\sum_{i=1}^{L-1}\big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\big)\mathbf{z}_{i}\Big)\Big(\mathbf{z}^{\top}_{L}+\sum_{k=1}^{L-1}\mathbf{z}^{\top}_{k}\big(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1}\big)^{\top}\Big)\Big)\Big] \qquad (39)$$

where $\boldsymbol{\Omega}=\mathbf{y}-\mathbf{H}\mathbf{x}$. By expanding (39), we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\mathrm{tr}\Big(\boldsymbol{\Psi}+\sum_{i=1}^{L-1}\big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\big)\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\big(\Pi_{m=i}^{L-1}\mathbf{S}_{m+1}\big)^{\top}+\mathbf{z}_{L}\mathbf{z}_{L}^{\top}\Big)\Big] \qquad (40)$$

where $\boldsymbol{\Psi}=\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\sum_{k=1,k\neq{i}}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1})^{\top}+\mathbf{z}_{L}\sum_{k=1}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{j=k}^{L-1}\mathbf{S}_{j+1})^{\top}+\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\mathbf{z}_{L}^{\top}$.

By using the cyclic property of the trace and the fact that trace is a linear operator, we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathrm{tr}\Big(\mathbb{E}[\mathbf{z}_{L}\mathbf{z}^{\top}_{L}]+\mathbb{E}[\boldsymbol{\Psi}]+\sum_{k=1}^{L-1}\boldsymbol{\Upsilon}_{k}^{\top}\boldsymbol{\Upsilon}_{k}\mathbb{E}[\mathbf{z}_{k}\mathbf{z}_{k}^{\top}]\Big) \qquad (41)$$

where $\boldsymbol{\Upsilon}_{i}=\Pi_{m=i}^{L-1}\mathbf{S}_{m+1}$. Finally, by applying the inequality $\mathrm{tr}(\mathbf{A}\mathbf{B})\leq\|\mathbf{A}\|_{2}\,\mathrm{tr}(\mathbf{B})$ (which holds for any square matrix $\mathbf{A}$ and positive semi-definite matrix $\mathbf{B}$ [35]) and the following properties of the spectral norm, $\|\mathbf{A}+\mathbf{B}\|_{2}\leq\|\mathbf{A}\|_{2}+\|\mathbf{B}\|_{2}$ and $\|\mathbf{A}\mathbf{B}\|_{2}\leq\|\mathbf{A}\|_{2}\|\mathbf{B}\|_{2}$, we have

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathrm{tr}(\mathbb{E}[\boldsymbol{\Psi}])+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}(\mathbb{E}[\mathbf{z}_{j-1}\mathbf{z}_{j-1}^{\top}])+\mathrm{tr}(\mathbb{E}[\mathbf{z}_{L}\mathbf{z}_{L}^{\top}]) \qquad (42)$$

From $\mathbb{E}[\mathbf{z}_{j-1}\mathbf{z}_{j-1}^{\top}]=\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}$ and (42), we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathbb{E}[\mathrm{tr}(\boldsymbol{\Psi})]+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}\big)+\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{L}}+\mathbf{m}_{z_{L}}\mathbf{m}_{z_{L}}^{\top}\big) \qquad (43)$$

References

  • [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, pp. 83–98, May 2013.
  • [2] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs:frequency analysis,” IEEE Trans. on Signal Processing, vol. 61, pp. 1644–1656, Apr 2013.
  • [3] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete signal processing on graphs: Sampling theory,” IEEE Transactions on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015.
  • [4] S. Safavi and U. A. Khan, “Revisiting finite-time distributed algorithms via successive nulling of eigenvalues,” vol. 22, pp. 54–57, Jan. 2015.
  • [5] T. Weerasinghe, D. Romero, C. Asensio-Marco, and B. Beferull-Lozano, “Fast decentralized subspace projection via graph filtering,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Calgary,Canada), pp. 4639–4643, Apr. 2018.
  • [6] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Reconstruction of graph signals through percolation from seeding nodes,” IEEE Transactions on Signal Processing, vol. 64, no. 16, pp. 4363–4378, 2016.
  • [7] S. Barbarossa, G. Scutari, and T. Battisti, “Distributed signal subspace projection algorithms with maximum convergence rate for sensor networks with topological constraints,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Taipei,Taiwan), pp. 2893–2896, Apr. 2009.
  • [8] P. Di Lorenzo, S. Barbarossa, and S. Sardellitti, “Distributed signal processing and optimization based on in-network subspace projections,” IEEE Transactions on Signal Processing, vol. 68, pp. 2061–2076, 2020.
  • [9] C. A.-M. F. Camaro-Nogues, D. Alonso-Roman and B. Beferull-Lozano, “Reducing the observation error in a wsn through a consensus-based subspace projection,” in WCNC, (Shanghai,China), pp. 3643–3648, Apr. 2013.
  • [10] A. Y. Kibangou, “Graph laplacian based matrix design for finite-time distributed average consensus,” in 2012 American Control Conference (ACC), pp. 1901–1906, 2012.
  • [11] A. Sandryhaila, S. Kar, and J. M. F. Moura, “Finite-time distributed consensus through graph filters,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Florence, Italy), pp. 1080–1084, May 2014.
  • [12] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex-frequency analysis on graphs,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 260–291, 2016.
  • [13] S. K. Narang and A. Ortega, “Perfect reconstruction two-channel wavelet filter banks for graph structured data,” IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2786–2799, 2012.
  • [14] B. Girault, Signal Processing on Graphs - Contributions to an Emerging Field. PhD thesis, École Normale Supérieure de Lyon, 2015.
  • [15] D. Romero, M. Ma, and G. B. Giannakis, “Kernel-based reconstruction of graph signals,” IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 764–778, 2017.
  • [16] M. Onuki, S. Ono, M. Yamagishi, and Y. Tanaka, “Graph signal denoising via trilateral filter on graph spectral domain,” IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 137–148, 2016.
  • [17] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Sig. Process., vol. 65, pp. 4117–4131, May 2017.
  • [18] S. Mollaebrahim, C. Asensio-Marco, D. Romero, and B. Beferull-Lozano, “Decentralized subspace projection in large networks,” in Global Conference on Signal and Information Processing (GlobalSIP), (Anaheim,USA), pp. 788–792, Feb. 2018.
  • [19] M. Zamalloa and B. Krishnamachari, “An analysis of unreliability and asymmetry in low-power wireless links,” ACM Trans. Sen. Netw., vol. 3, p. 1–34, June 2007.
  • [20] L. Sang, A. Arora, and H. Zhang, “On link asymmetry and one-way estimation in wireless sensor networks,” ACM Trans. Sen. Netw., vol. 6, p. 12–25, Mar. 2010.
  • [21] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Transactions on Signal Processing, vol. 67, pp. 2320–2333, May. 2019.
  • [22] E. Isufi and G. Leus, “Distributed sparsified graph filters for denoising and diffusion tasks,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5865–5869, 2017.
  • [23] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Filtering random graph processes over random time-varying graphs,” IEEE Transactions on Signal Processing, vol. 65, no. 16, pp. 4406–4421, 2017.
  • [24] F. Gama, E. Isufi, A. Ribeiro, and G. Leus, “Controllability of bandlimited graph processes over random time varying graphs,” IEEE Transactions on Signal Processing, vol. 67, no. 24, pp. 6440–6454, 2019.
  • [25] L. B. Saad and B. Beferull-Lozano, “Accurate graph filtering in wireless sensor networks,” IEEE Internet of Things Journal, pp. 1–1, 2020.
  • [26] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Trans. on Signal and Information Processing over Networks,, vol. 3, p. 467–483, Sep 2017.
  • [27] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Trans. on Signal Processing, vol. 62, pp. 3042–3054, Jun 2014.
  • [28] Y. Xu and W. Yin, “A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion,” SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1758–1789, 2013.
  • [29] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.
  • [30] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
  • [31] B. Schölkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in Proceedings of the 14th Annual Conference on Computational Learning Theory, p. 416–426, 2001.
  • [32] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems 20, pp. 1177–1184, 2008.
  • [33] Y. Shen, T. Chen, and G. B. Giannakis, “Random feature-based online multi-kernel learning in environments with unknown dynamics,” J. Mach. Learn. Res., vol. 20, p. 773–808, Jan. 2019.
  • [34] E. Hazan, “Introduction to online convex optimization,” Found. Trends Optim., vol. 2, p. 157–325, Aug. 2016.
  • [35] Sheng-De Wang, Te-Son Kuo, and Chen-Fa Hsu, “Trace bounds on the solution of the algebraic matrix riccati and lyapunov equation,” IEEE Transactions on Automatic Control, vol. 31, no. 7, pp. 654–656, 1986.