
Fast Decentralized Linear Functions Over
Edge Fluctuating Graphs

Siavash Mollaebrahim and Baltasar Beferull-Lozano. S. Mollaebrahim and B. Beferull-Lozano are with the Intelligent Signal Processing and Wireless Networks (WISENET) Center, University of Agder, Norway. e-mail: {siavash.mollaebrahim, baltasar.beferull}@uia.no
Abstract

Implementing linear transformations is a key task in the decentralized signal processing framework, which performs learning tasks on data sets distributed over multi-node networks. Such networks can be represented by graphs. Recently, several decentralized methods have been proposed to compute linear transformations by leveraging the notion of graph shift operator, which captures the local structure of the graph. However, existing approaches have drawbacks, such as considering only special instances of linear transformations, or restricting the family of transformations by assuming that a shift matrix is given whose eigenvectors span the subspace of interest. In contrast, this paper develops a framework for computing a wide class of linear transformations in a decentralized fashion by relying on the notion of graph shift operator. The main goal of the proposed method is to compute the desired linear transformation in a small number of iterations. To this end, a set of successive graph shift operators is employed, and a new optimization problem is proposed whose goal is to compute the desired transformation as fast as possible. In addition, the topology of such networks, especially wireless sensor networks, often changes randomly because of node failures or random links. In this paper, the effect of edge fluctuations on the performance of the proposed method is studied. To deal with the negative effect of edge fluctuations, an online kernel-based method is proposed which enables nodes to estimate the missing values from the information at hand. The proposed method can also be employed to sparsify the network graph or reduce the number of local exchanges between nodes, which saves sensor power in wireless sensor networks.

Index Terms:
graph signal processing, decentralized linear transformations, graph filters.

I Introduction

Processing data sets defined over non-regular domains is the main application of the graph signal processing (GSP) field. Such data sets are characterized by graphs, and traditional signal processing notions such as frequency analysis and sampling have been extended to process signals defined over graphs (graph signals), i.e., data associated with the nodes of the network [1, 2, 3]. Graph signals arise in different fields such as social networks and wireless sensor networks (WSN).

Recently, a number of decentralized methods have been proposed to compute some special instances of linear transformations, a common task in decentralized signal processing. Projecting signals onto a low-dimensional subspace [4, 5] or interpolating bandlimited signals [6] are examples of such linear transformations. For instance, the authors in [7] propose a decentralized approach to implement the subspace projection task, which encompasses a number of learning tasks in WSN, such as estimation and denoising [8]. In that approach [7], each node at every iteration linearly combines its iterate with those of its neighbors. The coefficients of those linear combinations are then obtained by optimizing a criterion that quantifies asymptotic convergence. However, this algorithm can only accommodate a limited set of topologies [9], its convergence is asymptotic, and it generally requires a relatively large number of local information exchanges. Those issues are addressed in [10, 11, 4], which propose decentralized GSP-based approaches for the average consensus task, a special case of subspace projection.

Moreover, some decentralized GSP-based approaches employ graph filters (GFs) to compute linear transformations. GFs are expressed as polynomials of so-called graph shift matrices, which capture the local structure of the graph [2]. GFs admit decentralized implementations, meaning that the nodes of the network can implement the graph filter by exchanging only local information with their neighbors. Graph filters have been used for different applications such as graph signal analysis [12, 2], reconstruction [13, 14, 6, 15], and denoising [16, 17, 5, 18].

In [17], GFs are designed to compute pre-specified linear transformations. Nevertheless, the GF design in [17] only applies to rank-1 projections or to cases where the projection shares its eigenvectors with a given graph shift matrix, such as the Laplacian or adjacency matrix, which limits the linear functions that can be implemented. Moreover, that method considers the design of GFs only for symmetric topologies. On the other hand, due to interference and background noise, WSNs are often characterized by asymmetric packet losses, leading to network topologies that can be viewed as asymmetric graphs [19, 20]. To approximate linear transformations, a decentralized approach based on so-called edge-variant (EV) GFs has been proposed in [21]. This method uses a different graph shift matrix in each iteration of the graph filtering, which enables nodes to weigh the signals of their neighbors with different values. Furthermore, in [5] a decentralized subspace projection method via GFs for symmetric topologies has been proposed by optimizing a criterion that yields convergence to the subspace projection in a nearly minimal number of iterations.

In this paper, we propose an approach to compute linear transformations as fast as possible, i.e., in a small number of local exchanges. Our approach employs a sequence of different shift operators to compute the desired transformation matrix. In our method, the difference between the product of the shift operators and the transformation matrix at each round is minimized. To decrease the required number of rounds, which leads to fast convergence, a weighted-sum method is proposed, where larger weights are assigned to the last rounds to force the error of those rounds to be smaller.

In addition, edge fluctuation is a critical issue in decentralized GSP-based approaches and negatively affects their performance. For instance, in WSN, a number of network edges are lost when edge fluctuations occur due to node failures. In this case, some nodes miss a portion of their neighbors’ information, which creates a deviation in the nodes’ information-exchange procedure. The effect of graph perturbations on finite impulse response (FIR) graph filters has been investigated in [22, 23, 24, 25]. Those works analyze the effect of graph perturbation based on the random edge sampling (RES) graph model [23], where the activation probability of graph edges is assumed to be known.

In summary, our contributions are as follows:

  • In contrast with [26, 5], where only symmetric topologies have been considered, the proposed method designs shift operators to compute linear transformations over different topologies, including both symmetric and asymmetric graphs. Compared to [21], the proposed method needs a noticeably smaller number of local exchanges, leading to faster convergence, and it computes the desired linear transformation exactly in scenarios where the method in [21] only approximates it.

  • Furthermore, this work considers the issue of edge fluctuations in the problem of computing linear transformations, which is not addressed by the existing works [26, 5, 21]. In this paper, the effect of edge fluctuations on the proposed approach is studied. Then, to deal with the negative effect of edge fluctuations, an efficient online method is proposed, where each node estimates the missing values (due to the fluctuation of the edges) from its at-hand information, by employing an online kernel-based learning approach. In the proposed method, each node estimates the missing values just by applying some simple local calculations.

  • Finally, the proposed estimator enables us to randomly remove a number of edges at each iteration to sparsify the graph, which saves power at the sensor nodes in WSN and increases their lifetime, since they are usually powered by batteries. By using our method, the sensor nodes spend less power to relay their information to their neighbours, since the number of graph edges used in each iteration is decreased. Indeed, for each node in each iteration, we can randomly remove some edges, and then, by applying the proposed estimator, nodes can estimate online their neighbours’ information corresponding to the removed edges. This advantage can be further enhanced if the proposed estimators are trained during a certain number of iterations, after which the local exchanges between nodes stop and the nodes instead update their information based on an estimate of their neighbours’ information. With this approach, node power is saved since the number of local exchanges is decreased.

Notation: Bold uppercase (lowercase) letters denote matrices (column vectors), while $(\cdot)^{\top}$ stands for matrix transposition. $(\mathbf{a})_{n}$ and $(\mathbf{A})_{n,n^{\prime}}$ denote the $n$-th entry of $\mathbf{a}$ and the $(n,n^{\prime})$-th entry of $\mathbf{A}$, respectively. Also, $\mathbf{1}$ is the all-ones vector and $\mathbf{e}_{n}$ is the basis vector whose entries are all zero except the $n$-th one, which equals one. The indicator function and the Hadamard product are denoted by $\mathbbm{1}$ and $\circ$, respectively. Moreover, $\Pi$ and $\mathrm{vec}$ represent the product of a collection of terms and the vectorization operation. Finally, the trace of $\mathbf{A}$ and the Frobenius norm are denoted by $\mathrm{tr}(\mathbf{A})$ and $\|\cdot\|_{F}$, respectively.

II Preliminaries

II-A System Model

Consider $N$ networked sensor nodes which are able to exchange messages with the neighbour nodes to which they are connected. The network is modeled as a directed connected graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, where the vertex (node) set $\mathcal{V}:=\{1,2,\cdots,N\}$ corresponds to the network nodes, and $\mathcal{E}$ represents the set of edges. The $n^{\prime}$-th vertex $v_{n^{\prime}}$ is connected to $v_{n}$ if there is a directed edge from $v_{n^{\prime}}$ to $v_{n}$, i.e., $(v_{n^{\prime}},v_{n})\in\mathcal{E}$, but this does not imply that $v_{n}$ is connected to $v_{n^{\prime}}$ unless $(v_{n},v_{n^{\prime}})\in\mathcal{E}$. The in-neighborhood of the $n$-th node is defined as the set of nodes connected to it, denoted $\mathcal{N}_{n}=\{v_{n^{\prime}}\,|\,(v_{n^{\prime}},v_{n})\in\mathcal{E}\}$. In the GSP context, the vector $\mathbf{x}=[x_{1},x_{2},\cdots,x_{N}]^{\top}$ is referred to as a graph signal, where the $i$-th component of $\mathbf{x}$ represents the signal value at the $i$-th node of $\mathcal{G}$.

II-B Decentralized linear transformations via graph filters

In this section, we briefly review the implementation of linear transformations via GFs. Finite impulse response (FIR) graph filtering [2] involves two steps. In the first step, a sequence of graph signals $\mathbf{x}^{(l)}=[x_{1}^{(l)},x_{2}^{(l)},\cdots,x_{N}^{(l)}]^{\top}$, $l=0,\cdots,L$, is obtained via $L$ local exchanges between nodes, where $\mathbf{x}^{(0)}=\mathbf{x}$. In particular, the graph signal at the $l$-th iteration, i.e., $\mathbf{x}^{(l)}$, is computed from the graph signal of the previous iteration, i.e., $\mathbf{x}^{(l-1)}$, as $x_{n}^{(l)}=\sum_{n^{\prime}\in\mathcal{N}_{n}}s_{n,n^{\prime}}x_{n^{\prime}}^{(l-1)}$, where $s_{n,n^{\prime}}$ is the coefficient used for the aggregation of information between nodes $n$ and $n^{\prime}$. Thus, in matrix form, we have $\mathbf{x}^{(l)}=\mathbf{S}\mathbf{x}^{(l-1)}$, where $(\mathbf{S})_{n,n^{\prime}}=0$ if $(v_{n^{\prime}},v_{n})\notin\mathcal{E}$.

In the GSP context, $\mathbf{S}$ is called the graph shift operator [2, 27]. The graph shift operator is a decentralized operator because each node updates its information only via information from its neighbours. Examples of the graph shift operator are the adjacency and Laplacian matrices of $\mathcal{G}$ [17]. In the second step, the outputs of all iterations computed in the first step are linearly aggregated as follows

$$\mathbf{y}=\mathbf{H}\mathbf{x}\overset{\Delta}{=}\sum_{l=0}^{L-1}c_{l}\mathbf{S}^{l}\mathbf{x} \qquad (1)$$

where $\{c_{l}\}_{l=0}^{L-1}$ are the filter coefficients and $L$ is the order of the filter [2]. The graph filter in (1) has been used in [17] to implement a pre-specified transformation matrix $\mathbf{T}$. However, the design in [17] achieves $\sum_{l=0}^{L-1}c_{l}\mathbf{S}^{l}=\mathbf{T}$ only for symmetric topologies, and only for rank-1 projections or for scenarios where $\mathbf{T}$ and the given shift matrix, such as the Laplacian or the adjacency matrix, are simultaneously diagonalizable, i.e., they share the same eigenvectors.
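To make the two-step procedure concrete, here is a minimal NumPy sketch of the FIR graph filter (1), computed through repeated local exchanges; the shift matrix `S`, the signal `x` and the coefficients `c` are illustrative placeholders rather than quantities defined in this paper.

```python
import numpy as np

def fir_graph_filter(S, x, c):
    """Apply the FIR graph filter (1): y = sum_l c_l S^l x.
    Each multiplication by S is one round of local exchanges, since
    (S x)_n only combines the values of the in-neighbours of node n."""
    y = c[0] * x                 # l = 0 term (S^0 = I)
    x_l = x.copy()
    for c_l in c[1:]:
        x_l = S @ x_l            # one local exchange: x^(l) = S x^(l-1)
        y = y + c_l * x_l
    return y
```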

The concept of FIR graph filters is generalized in [21] by introducing the edge-variant (EV) graph filters where a different set of weights is used in each iteration. The edge weighting matrices share the same support with the adjacency matrix of the network graph. In [21], the edge-variant graph filter is used to approximate linear transformations.

III Proposed method: Successive shift operators

In this section, by using the notion of graph shift operator, we propose a new approach to implement $\mathbf{T}\mathbf{x}$ in a decentralized manner as fast as possible. To this end, a sequence of different shift operators is applied so that $\mathbf{S}_{l}\cdots\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}=\mathbf{T}\mathbf{x}$, where for all $\mathbf{S}_{i}$, $i=1,\cdots,l$, we have $(\mathbf{S}_{i})_{n,n^{\prime}}=0$ if $(v_{n^{\prime}},v_{n})\not\in\mathcal{E}$.

As stated before, the proposed method leads to a decentralized approach. In fact, after the first round of information exchange among nodes, the nodes compute $\mathbf{x}^{(1)}:=\mathbf{S}_{1}\mathbf{x}$. Then, at the second round, $\mathbf{x}^{(2)}:=\mathbf{S}_{2}\mathbf{x}^{(1)}=\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}$. The procedure is repeated for $l$ iterations. By properly designing $\{\mathbf{S}_{i}\}_{i=1}^{l}$, $\mathbf{T}\mathbf{x}$ is computed exactly, or $\Pi_{i=1}^{l}\mathbf{S}_{i}\mathbf{x}\approx\mathbf{T}\mathbf{x}$ is obtained with a small error.
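A similarly minimal sketch of the successive procedure is given below; in contrast to the FIR filter above, a different shift matrix is applied at every round. The list `S_list` stands for the output of the design problem introduced next and is purely illustrative.

```python
import numpy as np

def successive_shifts(S_list, x):
    """Successive shift operators: after round i the nodes hold
    x^(i) = S_i ... S_2 S_1 x, which should approach T x by design."""
    for S in S_list:             # each round is one local exchange
        x = S @ x
    return x
```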

To compute $\mathbf{T}$ as efficiently as possible, i.e., with the smallest number of iterations, we design the sequence of feasible graph shift matrices that minimizes the number of local exchanges, i.e., $l$, by solving the following problem:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{l}}{\text{min}}\quad l &&(2\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,l &&(2\text{b})\\
&\phantom{\ \text{s.t.}\quad} \mathbf{T}=\Pi_{i=1}^{l}\mathbf{S}_{i} &&(2\text{c})
\end{aligned}$$

where constraint (2b) is added to the problem because the shift operators must satisfy the topological constraints of the graph connectivity.

Optimization problem (2) is intractable. To address this issue, an upper limit on $l$ is considered such that $l\in\{1,\cdots,L\}$. Then, to design shift operators that compute $\mathbf{T}$ via $\Pi_{i=1}^{l}\mathbf{S}_{i}$ as fast as possible, the error at each round, i.e., the error between $\mathbf{T}$ and $\Pi_{i=1}^{l}\mathbf{S}_{i}$, can be minimized, which is attained by solving the following optimization problem:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}}{\text{min}}\quad \sum_{l=1}^{L}{\left\|\mathbf{T}-\Pi_{i=1}^{l}\mathbf{S}_{i}\right\|}_{F}^{2} &&(3\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,L &&(3\text{b})
\end{aligned}$$

In optimization problem (3), we put the same emphasis on all rounds, which means that all of them are treated equally. However, if we put more emphasis on the later rounds, we expect to obtain a faster approach. With this strategy, we accept a larger error in the earlier rounds to obtain a smaller error in the later ones, which can be implemented by assigning different weights to the terms in cost function (3a). To this end, larger weights are assigned to the later rounds. Thus, we have:

$$\begin{aligned}
&\underset{\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}}{\text{min}}\quad \sum_{l=1}^{L}\omega_{l}{\left\|\mathbf{T}-\Pi_{i=1}^{l}\mathbf{S}_{i}\right\|}_{F}^{2} &&(4\text{a})\\
&\ \text{s.t.}\quad (\mathbf{S}_{i})_{n,n^{\prime}}=0\ \ \text{if}\ (v_{n^{\prime}},v_{n})\not\in\mathcal{E},\quad n,n^{\prime}=1,\ldots,N,\ i=1,\ldots,L &&(4\text{b})
\end{aligned}$$

where $\boldsymbol{\omega}=[\omega_{1},\omega_{2},\cdots,\omega_{L}]^{\top}$ is the weight vector, whose entries are non-negative and increasing, and such that $\sum_{l=1}^{L}\omega_{l}=1$.

Optimization problem (4) is non-convex with respect to all its variables; however, the cost function (4a) is strongly convex with respect to each $\mathbf{S}_{i}$ separately, i.e., when the other blocks of variables are fixed. Therefore, we can apply the block coordinate descent (BCD) algorithm to solve optimization problem (4); convergence of the BCD algorithm is guaranteed for this problem [28]. In this approach, all blocks of variables are fixed except one of them, and the optimization problem is solved with respect to the free block. This procedure continues until each of the matrices $\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}$ has been taken as the block variable of the optimization problem. The BCD algorithm is repeated until a stopping criterion is satisfied.

We now derive the closed-form solution of each step of the BCD algorithm. When one of the shift matrices, say $\mathbf{S}_{j}$, is the variable of optimization problem (4) and the remaining shift matrices are fixed, the following optimization problem is solved:

$$J^{*}_{j}\overset{\Delta}{=}\underset{\mathbf{S}_{j}=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}}{\text{Inf}}\ J_{j}(\mathbf{S}_{j}) \qquad (5)$$

where $J_{j}(\mathbf{S}_{j})=\sum_{l=j}^{L}\omega_{l}{\left\|\mathbf{T}-\mathbf{S}_{l}\mathbf{S}_{l-1}\cdots\mathbf{S}_{j+1}\mathbf{S}_{j}\mathbf{S}_{j-1}\cdots\mathbf{S}_{1}\right\|}_{F}^{2}$, the parameterization $\mathbf{S}_{j}=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}$ satisfies constraint (4b), and $S_{n,n^{\prime}}^{(j)}$ is the entry that lies in the $n$-th row and the $n^{\prime}$-th column of $\mathbf{S}_{j}$. Thus, we have:

$$J^{*}_{j}=\underset{\{S^{(j)}_{n,n^{\prime}}\}_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}}{\text{Inf}}\ J_{j}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big) \qquad (6)$$

The argument of $J_{j}$ in (6) can be expressed in vector form by applying the vectorization operator as follows:

$$\mathrm{vec}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big)=\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}(\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n}) \qquad (7)$$

where in (7) we use the fact that $\mathrm{vec}(\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top})=\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n}$. Moreover, we have:

$$\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}(\mathbf{e}_{n^{\prime}}\otimes\mathbf{e}_{n})=\underbrace{[\mathbf{e}_{n^{\prime}_{1}}\otimes\mathbf{e}_{n_{1}},\cdots,\mathbf{e}_{n^{\prime}_{E}}\otimes\mathbf{e}_{n_{E}}]}_{\mathbf{E}}\,\mathbf{s}^{(j)} \qquad (8)$$

where $\mathbf{s}^{(j)}=[S_{n_{1},n^{\prime}_{1}}^{(j)},\cdots,S_{n_{E},n^{\prime}_{E}}^{(j)}]^{\top}$. Therefore, optimization problem (6) can be represented as follows:

$$\underset{\{S^{(j)}_{n,n^{\prime}}\}_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}}{\text{inf}}J_{j}\Big(\sum_{(v_{n^{\prime}},v_{n})\in\mathcal{E}}S_{n,n^{\prime}}^{(j)}\mathbf{e}_{n}\mathbf{e}_{n^{\prime}}^{\top}\Big)=\underset{\mathbf{s}^{(j)}\in\mathbb{R}^{E}}{\text{inf}}J_{j}\big(\mathrm{vec}^{-1}(\mathbf{E}\mathbf{s}^{(j)})\big) \qquad (9)$$

Consequently, from (9), the $j$-th subproblem of optimization problem (4) can be re-written as follows:

$$\underset{\mathbf{s}^{(j)}}{\text{inf}}\ \sum_{l=j}^{L}\omega_{l}{\left\|\mathbf{T}-\mathbf{S}_{l:j+1}\,\mathrm{vec}^{-1}(\mathbf{E}\mathbf{s}^{(j)})\,\mathbf{S}_{j-1:1}\right\|}_{F}^{2} \qquad (10)$$

where $\mathbf{S}_{l:j}\overset{\Delta}{=}\mathbf{S}_{l}\mathbf{S}_{l-1}\cdots\mathbf{S}_{j}$.

Finally, the vectorized form of (10) equals:

$$\underset{\mathbf{s}^{(j)}}{\text{inf}}\ \sum_{l=j}^{L}\omega_{l}{\left\|\mathrm{vec}(\mathbf{T})-(\mathbf{S}_{j-1:1}^{\top}\otimes\mathbf{S}_{l:j+1})\mathbf{E}\mathbf{s}^{(j)}\right\|}_{2}^{2} \qquad (11)$$

The following algorithm represents the approach for solving (11). As stopping criterion, $|f(\mathbf{S}_{L:1}^{(i+1)})-f(\mathbf{S}_{L:1}^{(i)})|/|f(\mathbf{S}_{L:1}^{(i)})|<\varepsilon$ is used for a sufficiently small $\varepsilon$, where $f(\mathbf{S}_{L:1})$ equals cost function (3a) and $i$ denotes the $i$-th iteration of the BCD algorithm.

input: $\mathcal{E},L,\omega_{1},\cdots,\omega_{L},\varepsilon$.
initialize: $\mathbf{S}_{2},\mathbf{S}_{3},\cdots,\mathbf{S}_{L}$.
while stopping criterion not satisfied do
      for $j=1$ to $L$ do
            fix $\mathbf{S}_{m}$, $m=1,\cdots,L$, $m\neq{j}$, and obtain $\mathbf{s}^{(j)}$ via (11)
      end for
end while
output: $\mathbf{S}_{1},\mathbf{S}_{2},\cdots,\mathbf{S}_{L}$
Algorithm 1 Proposed solver
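For concreteness, the following NumPy sketch mirrors Algorithm 1: each block update stacks the weighted terms of (11) and solves the resulting least-squares problem for $\mathbf{s}^{(j)}$. The function name `bcd_successive_shifts`, the random initialization and the fixed number of sweeps (instead of the stopping criterion) are illustrative simplifications, not part of the original algorithm description.

```python
import numpy as np

def bcd_successive_shifts(T, adj, L, weights, n_sweeps=50, seed=0):
    """Illustrative BCD solver for (11).
    T       : (N, N) target transformation.
    adj     : (N, N) 0/1 support mask; adj[n, n'] = 1 if (S)_{n,n'} may be nonzero.
    L       : number of shift matrices (local exchange rounds).
    weights : length-L non-negative weights (the omega_l)."""
    N = T.shape[0]
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(adj)
    E = np.zeros((N * N, rows.size))
    E[cols * N + rows, np.arange(rows.size)] = 1.0     # column of E is e_{n'} (x) e_n
    S = [adj * rng.standard_normal((N, N)) for _ in range(L)]   # random feasible start
    vecT = T.ravel(order="F")                          # column-major vec(T)
    for _ in range(n_sweeps):
        for j in range(L):                             # block update of S[j]
            pre = np.eye(N)                            # S_{j-1:1}
            for i in range(j):
                pre = S[i] @ pre
            A_blocks, b_blocks = [], []
            post = np.eye(N)                           # S_{l:j+1}, grown as l increases
            for l in range(j, L):
                if l > j:
                    post = S[l] @ post
                w = np.sqrt(weights[l])
                A_blocks.append(w * (np.kron(pre.T, post) @ E))
                b_blocks.append(w * vecT)
            s_j, *_ = np.linalg.lstsq(np.vstack(A_blocks),
                                      np.concatenate(b_blocks), rcond=None)
            S[j] = (E @ s_j).reshape(N, N, order="F")
    return S
```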

IV Edge fluctuation

IV-A Edge Fluctuation Model

In the previous section, we have assumed that all edges are fixed; however, in practical scenarios, edges fluctuate for different reasons, such as random links or node failures in WSN, which leads to a random graph topology. In this subsection, the effect of edge fluctuations on the proposed (successive) method is studied. We consider that edge fluctuations occur randomly and independently across edges. Note that in this section, to analyze the effect of edge fluctuations, we assume that an estimate of the link activation probabilities is available. However, knowing the link activation probabilities is not required for our proposed method to deal with the effect of edge fluctuations. Moreover, we assume that the spectral norm of all graph shift operators is upper-bounded, i.e., $\|\mathbf{S}_{i}\|_{2}\leq\rho<\infty$ for all $i=1,\cdots,L$. This assumption is practical because it implies that the graph of interest has finite edge weights [24].

When an edge is missed in a certain iteration of the successive method, the destination node of the missing edge loses the information of its neighbor, which generates a deviation. The shift operator corresponding to the perturbed graph can be modeled as $\hat{\mathbf{S}}=\mathbf{S}+\tilde{\mathbf{S}}$, where $\tilde{\mathbf{S}}$ is created by the edge fluctuations, and its non-zero entries equal the corresponding entries of $\mathbf{S}$ with a negative sign. Thus, the signal output of the first iteration of the successive method can be expressed as

$$\mathbf{y}^{(1)}=(\mathbf{S}_{1}+\tilde{\mathbf{S}}_{1})\mathbf{x}=\mathbf{S}_{1}\mathbf{x}+\mathbf{z}_{1} \qquad (12)$$

where $\mathbf{z}_{1}\overset{\Delta}{=}\tilde{\mathbf{S}}_{1}\mathbf{x}$. We assume that $\mathbf{z}_{1}$ is a realization of a random process with mean $\mathbf{m}_{z_{1}}$ and covariance matrix $\mathbf{\Sigma}_{z_{1}}$. Note that $\tilde{\mathbf{S}}_{1}$ can be modeled by using the link activation probabilities, i.e., $\mathbb{E}[\tilde{\mathbf{S}}_{1}]=(\mathbf{1}-\mathbf{P}_{\text{ac}})\circ(-\mathbf{S}_{1})$, where $\mathbf{P}_{\text{ac}}$ is the link activation probability matrix. Thus, since $\mathbf{x}$ is fixed, we have $\mathbf{m}_{z_{1}}=\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{x}$, where $\mathbf{m}_{\tilde{\mathbf{S}}_{1}}=(\mathbf{P}_{\text{ac}}-\mathbf{1})\circ\mathbf{S}_{1}$ is the mean of $\tilde{\mathbf{S}}_{1}$. In addition, for the covariance of $\mathbf{z}_{1}$, we have

$$\mathbf{\Sigma}_{z_{1}}=\mathbb{E}[\mathbf{z}_{1}\mathbf{z}_{1}^{\top}]-\mathbb{E}[\mathbf{z}_{1}]\mathbb{E}[\mathbf{z}_{1}^{\top}]=\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}]-\mathbf{m}_{\mathbf{z}_{1}}\mathbf{m}_{\mathbf{z}_{1}}^{\top} \qquad (13)$$

The $(i,j)$-th element of $\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}]$ equals $\mathbb{E}[\tilde{\mathbf{s}}_{1_{i}}^{\top}\mathbf{X}\tilde{\mathbf{s}}_{1_{j}}]$, where $\mathbf{X}=\mathbf{x}\mathbf{x}^{\top}$ and $\tilde{\mathbf{S}}_{1}=[\tilde{\mathbf{s}}_{1_{1}}|\tilde{\mathbf{s}}_{1_{2}}|\cdots|\tilde{\mathbf{s}}_{1_{N}}]^{\top}$. From the cyclic property of the trace, and the fact that the trace is a linear operator, we have

$$(\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}])_{i,j}=\mathbb{E}[\tilde{\mathbf{s}}_{1_{i}}^{\top}\mathbf{X}\tilde{\mathbf{s}}_{1_{j}}]=\mathrm{tr}(\mathbf{X}\,\mathbb{E}[\tilde{\mathbf{s}}_{1_{j}}\tilde{\mathbf{s}}_{1_{i}}^{\top}])=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}}) \qquad (14)$$

where $\mathbf{C}_{1_{ji}}=\mathbb{E}[\tilde{\mathbf{s}}_{1_{j}}\tilde{\mathbf{s}}_{1_{i}}^{\top}]$ is the cross-correlation between $\tilde{\mathbf{s}}_{1_{j}}$ and $\tilde{\mathbf{s}}_{1_{i}}$. Consequently:

$$(\mathbf{\Sigma}_{z_{1}})_{i,j}=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}})-(\mathbf{M}_{\mathbf{z}_{1}})_{i,j}\quad\forall\,i,j=1,\cdots,N \qquad (15)$$

where $\mathbf{M}_{\mathbf{z}_{1}}=\mathbf{m}_{\mathbf{z}_{1}}\mathbf{m}_{\mathbf{z}_{1}}^{\top}$. In the next iteration, we have:

$$\mathbf{y}^{(2)}=(\mathbf{S}_{2}+\tilde{\mathbf{S}}_{2})\mathbf{y}^{(1)}=\mathbf{S}_{2}\mathbf{S}_{1}\mathbf{x}+\mathbf{S}_{2}\mathbf{z}_{1}+\mathbf{z}_{2} \qquad (16)$$

where $\mathbf{z}_{2}\overset{\Delta}{=}\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}+\tilde{\mathbf{S}}_{2}\mathbf{z}_{1}$, which implies that $\mathbf{z}_{2}$ is correlated with $\mathbf{z}_{1}$. Indeed, $\mathbf{m}_{\mathbf{z}_{2}}=\mathbf{m}_{\tilde{\mathbf{S}}_{2}}(\mathbf{S}_{1}+\mathbf{m}_{\tilde{\mathbf{S}}_{1}})\mathbf{x}$ and

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[(\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}+\tilde{\mathbf{S}}_{2}\mathbf{z}_{1})(\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\mathbf{z}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top})]-\mathbf{m}_{\mathbf{z}_{2}}\mathbf{m}_{\mathbf{z}_{2}}^{\top} \qquad (17)$$

From $\mathbf{z}_{1}\overset{\Delta}{=}\tilde{\mathbf{S}}_{1}\mathbf{x}$ and (17), we have that

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}+\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{x}\mathbf{x}^{\top}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]-\mathbf{M}_{\mathbf{z}_{2}} \qquad (18)$$

where $\mathbf{M}_{\mathbf{z}_{2}}=\mathbf{m}_{\mathbf{z}_{2}}\mathbf{m}_{\mathbf{z}_{2}}^{\top}$. By the linearity of the expectation operator, and since the fluctuations of different iterations are independent, we have:

$$\mathbf{\Sigma}_{\mathbf{z}_{2}}=\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{X}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{S}_{1}\mathbf{X}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\tilde{\mathbf{S}}_{1}\mathbf{X}\tilde{\mathbf{S}}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]+\mathbb{E}[\tilde{\mathbf{S}}_{2}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{X}\mathbf{S}_{1}^{\top}\tilde{\mathbf{S}}_{2}^{\top}]-\mathbf{M}_{\mathbf{z}_{2}} \qquad (19)$$

Similar to (14) and (15), for each entry of $\mathbf{\Sigma}_{\mathbf{z}_{2}}$, we have

$$(\mathbf{\Sigma}_{\mathbf{z}_{2}})_{i,j}=\mathrm{tr}\big((\mathbf{S}_{1}\mathbf{X}\mathbf{S}_{1}^{\top}+\mathbf{S}_{1}\mathbf{X}\mathbf{m}_{\tilde{\mathbf{S}}_{1}}^{\top}+\boldsymbol{\chi}_{1}+\mathbf{m}_{\tilde{\mathbf{S}}_{1}}\mathbf{X}\mathbf{S}_{1}^{\top})\mathbf{C}_{2_{ji}}\big)-(\mathbf{M}_{\mathbf{z}_{2}})_{i,j}\quad\forall\,i,j=1,\cdots,N \qquad (20)$$

where $\boldsymbol{\chi}_{1}=\mathbb{E}[\tilde{\mathbf{S}}_{1}\mathbf{X}\tilde{\mathbf{S}}_{1}^{\top}]$ and $\mathbf{C}_{2_{ji}}$ is the cross-correlation between $\tilde{\mathbf{s}}_{2_{j}}$ and $\tilde{\mathbf{s}}_{2_{i}}$. By using an approach similar to (14), we have that $(\boldsymbol{\chi}_{1})_{i,j}=\mathrm{tr}(\mathbf{X}\mathbf{C}_{1_{ji}})$, $\forall\,i,j=1,\cdots,N$.

After $L$ iterations, the output of the successive method under the edge fluctuation model equals:

$$\mathbf{y}=\mathbf{H}\mathbf{x}+\sum_{i=1}^{L-1}\Big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\Big)\mathbf{z}_{i}+\mathbf{z}_{L} \qquad (21)$$

where $\mathbf{H}=\Pi_{i=1}^{L}\mathbf{S}_{i}$. The first- and second-order statistics of $\mathbf{z}_{i}$, $i=3,\cdots,L$, can be obtained by an approach similar to the one used for $\mathbf{z}_{1}$ and $\mathbf{z}_{2}$.
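As an illustration of this model, the sketch below generates realizations of (21) by dropping each off-diagonal entry of every $\mathbf{S}_{i}$ independently with probability $1-p_{\text{act}}$ and compares them with the noiseless output $\mathbf{H}\mathbf{x}$; using a single scalar activation probability `p_act` is an illustrative simplification of the matrix $\mathbf{P}_{\text{ac}}$, and keeping the diagonal intact is an additional assumption.

```python
import numpy as np

def perturbed_output(S_list, x, p_act, rng):
    """One realization of (21): every off-diagonal entry of S_i is kept with
    probability p_act and set to zero otherwise (RES-style edge fluctuation);
    the diagonal entries (a node's own value) are assumed reliable."""
    y = x.copy()
    for S in S_list:
        keep = rng.random(S.shape) < p_act
        np.fill_diagonal(keep, True)
        y = (S * keep) @ y
    return y

# Empirical MSE of Omega = y - Hx over many realizations (to compare with (23)):
# rng = np.random.default_rng(0)
# H = np.linalg.multi_dot(S_list[::-1])          # H = S_L ... S_2 S_1
# mse = np.mean([np.sum((perturbed_output(S_list, x, 0.9, rng) - H @ x) ** 2)
#                for _ in range(2000)])
```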

Next, we show that, under the edge fluctuation model, the mean square error (MSE) of the output of the successive approach after $L$ iterations, expressed as follows, is upper-bounded.

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\Big\|\mathbf{z}_{L}+\sum_{i=1}^{L-1}\Big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\Big)\mathbf{z}_{i}\Big\|_{2}^{2}\Big] \qquad (22)$$

where $\boldsymbol{\Omega}=\mathbf{y}-\mathbf{H}\mathbf{x}$.

Proposition 1: The MSE of the output of the successive approach in (21) under the edge fluctuation model is upper-bounded by:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathbb{E}[\mathrm{tr}(\boldsymbol{\Psi})]+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}\big)+\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{L}}+\mathbf{m}_{z_{L}}\mathbf{m}_{z_{L}}^{\top}\big) \qquad (23)$$

where $\boldsymbol{\Psi}=\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\sum_{k=1,k\neq{i}}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1})^{\top}+\mathbf{z}_{L}\sum_{k=1}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{j=k}^{L-1}\mathbf{S}_{j+1})^{\top}+\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\mathbf{z}_{L}^{\top}$.

Proof.

Please see Appendix A. ∎

From Proposition 1, the MSE of the output of the successive approach under the edge fluctuation model is upper-bounded in terms of $\rho$ and other parameters, namely $\boldsymbol{\Psi}$, $\mathbf{m}_{z_{i}}$ and $\mathbf{\Sigma}_{z_{i}}$, $i=1,\cdots,L$, which depend on the statistics of $\tilde{\mathbf{S}}_{i}$, $i=1,\cdots,L$, and on the shift matrices.

IV-B Proposed method: the missing values estimator

In this section, to deal with the effect of edge fluctuations, we focus on estimating the missing values via a kernel-based estimator. The main task can be expressed as follows: given $\mathbf{x},\mathcal{G},\mathbf{H}$, the goal is to estimate the values that are missing due to edge fluctuations. To this end, an online kernel-based method is proposed such that a node whose neighbor's information is missing estimates the missing value based on its own available information.

Let us consider that in the $i$-th iteration of the successive approach, the $n$-th node misses the information from the $n^{\prime}$-th node, which is given by

$$x_{n^{\prime}}^{(i-1)}=\sum_{n^{\prime\prime}\in\mathcal{N}_{n^{\prime}}}(\mathbf{S}_{i-1})_{n^{\prime},n^{\prime\prime}}x_{n^{\prime\prime}}^{(i-2)}+(\mathbf{S}_{i-1})_{n^{\prime},n^{\prime}}x_{n^{\prime}}^{(i-2)} \qquad (24)$$

From (24), we can conclude that $x_{n^{\prime}}^{(i-2)}$ and $x_{n}^{(i-2)}$ are the most relevant pieces of information available to the $n$-th node to estimate the missing value. The rest of the $n$-th node's information can also be informative, since the $n$-th and $n^{\prime}$-th nodes can have one-hop or two-hop common neighbors. Moreover, since the network is connected, all nodes are connected to each other within a number of hops. Consequently, in our method, the $n$-th node utilizes $\mathbf{u}_{n}=[x_{n^{\prime}}^{(i-2)},x_{n}^{(i-2)},\mathbf{c}_{n}^{(i-2)}]^{\top}\in\mathbb{R}^{(|\mathcal{N}_{n}|+1)\times{1}}$ to estimate the missing value (the information of the $n^{\prime}$-th node in the $(i-1)$-th iteration), where $\mathbf{c}_{n}$ contains the values of the neighbours of the $n$-th node except the $n^{\prime}$-th one.

To estimate the missing values, in supervised learning an estimator function is sought via training samples. To this end, flexible non-parametric models are needed for good results. A general class of models is based on functions of the form [29]

$$f(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\mathcal{F}(\mathbf{q},\mathbf{q}_{k}) \qquad (25)$$

where $\mathcal{F}$ is a non-linear function, $\alpha_{1},\cdots,\alpha_{K}\in\mathbb{R}$ are coefficients, $\mathbf{q}_{1},\cdots,\mathbf{q}_{K}\in\mathbb{R}^{d}$ are the training samples, and $K$ is the number of samples. Kernel methods are a well-known example of an approach using functions of the form (25) [30].

In kernel-based learning, it is assumed that the estimator function belongs to a reproducing kernel Hilbert space (RKHS) expressed as [30]

$$\mathcal{H}:=\{f:f(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\kappa(\mathbf{q},\mathbf{q}_{k}),\ \alpha_{k}\in\mathbb{R}\} \qquad (26)$$

where $\kappa:\mathbb{R}^{N}\times\mathbb{R}^{N}\to\mathbb{R}$ is a pre-selected symmetric and positive definite function. Given $\mathbf{y}$ and the sample matrix $\mathbf{Q}=[\mathbf{q}_{1},\cdots,\mathbf{q}_{K}]$, the kernel-based estimate is usually obtained by solving the following problem:

$$\hat{f}=\underset{f\in\mathcal{H}}{\text{arg\,min}}\Big\{\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}(f(\mathbf{q}_{k})-y_{k})+\lambda\|f\|^{2}_{\mathcal{H}}\Big\} \qquad (27)$$

where $\mathcal{L}$ is a pre-selected loss function; for regression, the least-squares loss is utilized. Moreover, $\|f\|^{2}_{\mathcal{H}}=\sum_{m=1}^{K}\sum_{m^{\prime}=1}^{K}\alpha_{m}\alpha_{m^{\prime}}\kappa(\mathbf{q}_{m},\mathbf{q}_{m^{\prime}})$ denotes the RKHS norm, used as a regularizer to deal with overfitting, and $\lambda$ is a regularization parameter.

Since $\mathcal{H}$ is infinite-dimensional, $\hat{f}$ cannot be obtained directly by solving (27). However, based on the representer theorem [31], (27) admits a solution of the form

$$\hat{f}(\mathbf{q})=\sum_{k=1}^{K}\alpha_{k}\kappa(\mathbf{q},\mathbf{q}_{k}):=\boldsymbol{\alpha}^{\top}\mathbf{t}(\mathbf{q}) \qquad (28)$$

where $\mathbf{t}(\mathbf{q}):=[\kappa(\mathbf{q},\mathbf{q}_{1}),\cdots,\kappa(\mathbf{q},\mathbf{q}_{K})]^{\top}$. By substituting (28) into (27), $\boldsymbol{\alpha}$ is obtained by solving the following problem:

$$\underset{\boldsymbol{\alpha}\in\mathbb{R}^{K}}{\text{min}}\Big\{\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}(\boldsymbol{\alpha}^{\top}\mathbf{t}(\mathbf{q}_{k})-y_{k})+\lambda\boldsymbol{\alpha}^{\top}\boldsymbol{\Gamma}\boldsymbol{\alpha}\Big\} \qquad (29)$$

where $\boldsymbol{\Gamma}\in\mathbb{R}^{K\times{K}}$ is the kernel matrix whose $(i,i^{\prime})$-th entry is $\kappa(\mathbf{q}_{i},\mathbf{q}_{i^{\prime}})$.

In our proposed method, each node employs a function estimator for each of its neighbours to estimate the corresponding missing value. This can be done by using estimators of the form (28), such that the $i$-th node employs the following estimator for each of its neighbours:

$$f_{ij}(\mathbf{u}_{i})=\sum_{m=1}^{K}\alpha_{m}^{(ij)}\kappa(\mathbf{u}_{i},\mathbf{u}_{m}) \qquad (30)$$

where $j\in\mathcal{N}_{i}$ and $\{\mathbf{u}_{m}\}_{m=1}^{K}$ are the training samples. The estimators are trained by solving optimization problem (29), which for kernel regression admits the following closed-form solution:

$$\hat{\boldsymbol{\alpha}}=(\boldsymbol{\Gamma}+\lambda{K}\mathbf{I})^{-1}\mathbf{y} \qquad (31)$$

From (31), to obtain $\hat{\boldsymbol{\alpha}}$, each node, for each of its neighbors, has to invert a $K\times{K}$ matrix, which is computationally heavy, especially when the number of samples is high. Indeed, whenever an edge fluctuation occurs, each node has to invert a $K\times{K}$ matrix, whose computational complexity is $\mathcal{O}(K^{3})$. Also, the nodes have to store a large number of scalars; for instance, the $i$-th node should store $K\times(|\mathcal{N}_{i}|+1)$ scalars.
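The following sketch illustrates the estimator (28) together with the closed-form training step (31) for a Gaussian kernel; the kernel choice, the bandwidth `sigma` and the regularization parameter `lam` are illustrative assumptions rather than values prescribed by this paper.

```python
import numpy as np

def gaussian_gram(Q, sigma=1.0):
    """Kernel matrix Gamma of (29); Q is (K, d), one training sample per row."""
    d2 = np.sum((Q[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(Q, y, lam=1e-2, sigma=1.0):
    """Closed form (31): alpha = (Gamma + lam * K * I)^{-1} y."""
    K = Q.shape[0]
    return np.linalg.solve(gaussian_gram(Q, sigma) + lam * K * np.eye(K), y)

def kernel_ridge_predict(alpha, Q, q, sigma=1.0):
    """Estimate (28): f_hat(q) = sum_k alpha_k * kappa(q, q_k)."""
    t = np.exp(-np.sum((Q - q) ** 2, axis=1) / (2.0 * sigma ** 2))
    return alpha @ t
```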

To tackle this issue, we employ the random feature approach [32], which provides an approximation of $\kappa(\mathbf{u},\mathbf{u}_{m})$ for shift-invariant kernels, i.e., kernels with $\kappa(\mathbf{u}_{n},\mathbf{u}_{m})=\kappa(\mathbf{u}_{n}-\mathbf{u}_{m})$. In this paper, we employ shift-invariant and bounded kernels; examples of such kernels are the Gaussian, Laplacian and Cauchy kernels [32]. The Fourier transform of a shift-invariant kernel, denoted by $\Theta(\mathbf{w})$, exists in closed form, and we can write:

$$\kappa(\mathbf{u}_{n},\mathbf{u}_{m})=\int\Theta(\mathbf{w})e^{j\mathbf{w}^{\top}(\mathbf{u}_{n}-\mathbf{u}_{m})}d\mathbf{w} \qquad (32)$$

From the definition of the expected value, (32) equals $\mathbb{E}_{\mathbf{w}}[e^{j\mathbf{w}^{\top}(\mathbf{u}_{n}-\mathbf{u}_{m})}]$. The main idea of the random feature approach is that, by drawing a sufficient number of samples $\{\mathbf{w}_{i}\}_{i=1}^{D}$ from $\Theta(\mathbf{w})$, $\kappa(\mathbf{u}_{n},\mathbf{u}_{m})$ can be approximated by

$$\hat{\kappa}(\mathbf{u}_{n},\mathbf{u}_{m})=\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{n})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{m}) \qquad (33)$$

where $\mathbf{W}:=[\mathbf{w}_{1},\cdots,\mathbf{w}_{D}]$ and

$$\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u})=\frac{1}{\sqrt{D}}[\sin(\mathbf{w}_{1}^{\top}\mathbf{u}),\cdots,\sin(\mathbf{w}_{D}^{\top}\mathbf{u}),\cos(\mathbf{w}_{1}^{\top}\mathbf{u}),\cdots,\cos(\mathbf{w}_{D}^{\top}\mathbf{u})]^{\top} \qquad (34)$$

From (32), and taking the expectation of both sides of (33) with respect to $\mathbf{W}$, we can conclude that $\hat{\kappa}(\cdot)$ is unbiased. Moreover, the variance of $\hat{\kappa}(\cdot)$ is proportional to $1/D$, which implies that it tends to zero as the number of random features tends to infinity [33].
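A minimal sketch of the feature map (34) is given below for the Gaussian kernel, for which $\Theta(\mathbf{w})$ is a zero-mean Gaussian density with covariance $\sigma^{-2}\mathbf{I}$; the bandwidth `sigma` and the number of features `D` are illustrative parameters.

```python
import numpy as np

def random_fourier_features(U, D, sigma=1.0, seed=0):
    """Feature map Delta_W(u) of (34) for a Gaussian kernel of bandwidth sigma.
    U is (K, d); the D spectral samples w_i are drawn from Theta(w) = N(0, sigma^{-2} I).
    Returns a (K, 2D) matrix whose rows are Delta_W(u_k)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(U.shape[1], D))  # columns are w_1, ..., w_D
    proj = U @ W
    return np.hstack([np.sin(proj), np.cos(proj)]) / np.sqrt(D)

# kappa(u_n, u_m) is then approximated as in (33):
# Z = random_fourier_features(U, D=200)
# K_hat = Z @ Z.T     # approximates the Gaussian kernel matrix of the rows of U
```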

By substituting (33) into (30), we have the following estimator:

$$\hat{f}_{ij}(\mathbf{u}_{i})=\sum_{m=1}^{K}\alpha_{m}^{(ij)}\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{m})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}):=(\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (35)$$

From (35), $\boldsymbol{\beta}^{(ij)}$ is obtained by solving the following problem [33]:

$$\underset{\boldsymbol{\beta}^{(ij)}\in\mathbb{R}^{2D}}{\text{min}}\ \frac{1}{K}\sum_{m=1}^{K}\mathcal{C}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)}),y_{m}\big),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (36)$$

where $\mathbf{u}_{i}^{(m)}$ denotes the $m$-th training sample. Also, we have:

$$\mathcal{C}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)}),y_{m}\big)=\mathcal{L}\big((\boldsymbol{\beta}^{(ij)})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}^{(m)})-y_{m}\big)+\lambda\|\boldsymbol{\beta}^{(ij)}\|_{2}^{2} \qquad (37)$$

where $\|\boldsymbol{\beta}^{(ij)}\|_{2}^{2}:=\sum_{n}\sum_{k}\alpha_{n}^{(ij)}\alpha_{k}^{(ij)}\boldsymbol{\Delta}_{\mathbf{W}}^{\top}(\mathbf{u}_{n})\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{k})\simeq\|f\|_{\mathcal{H}}^{2}$. The random feature approach can be employed in an online manner, where the nodes estimate the missing values on the fly. In this approach, in each iteration of the successive approach, denoted by $l$, each node solves optimization problem (36) via online gradient descent [34] to update its parameters, which is expressed as follows:

$$\boldsymbol{\beta}^{(ij)}_{l+1}=\boldsymbol{\beta}^{(ij)}_{l}-\eta_{l}\nabla\mathcal{C}\big((\boldsymbol{\beta}^{(ij)}_{l})^{\top}\boldsymbol{\Delta}_{\mathbf{W}}(\mathbf{u}_{i}),x_{j}^{(l-1)}\big),\quad\forall\,i=1,\cdots,N,\ \forall\,j\in\mathcal{N}_{i} \qquad (38)$$

where $\eta_{l}$ is the step-size of the gradient descent. In (38), $y_{m}=x_{j}^{(l-1)}$ is used. Then, if the $i$-th node misses the information of the $j$-th node ($j\in\mathcal{N}_{i}$), the $i$-th node estimates the missing value via (35). After $L$ iterations, the complexity of (38) is $\mathcal{O}(LD)$.
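A minimal sketch of one update of (38) with the least-squares loss is shown below; the function name `ogd_update` and its arguments are illustrative, and the gradient is written for the regularized squared-error cost of (37).

```python
import numpy as np

def ogd_update(beta, u_i, y_target, W, eta, lam):
    """One online gradient step of (38) for a single (node i, neighbour j) pair.
    beta has length 2D; W holds the D spectral samples w_i as columns."""
    D = W.shape[1]
    proj = u_i @ W
    delta = np.concatenate([np.sin(proj), np.cos(proj)]) / np.sqrt(D)  # Delta_W(u_i)
    err = beta @ delta - y_target                # prediction error on the new sample
    grad = 2.0 * err * delta + 2.0 * lam * beta  # gradient of the LS version of (37)
    return beta - eta * grad

# When the exchange from neighbour j fails, node i predicts the missing value
# as in (35):  x_hat = beta @ delta
```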

Note that, for the scenario where edge fluctuations happen in the first iteration, the estimators are trained offline just to find $\boldsymbol{\beta}^{(ij)}_{1}$, $\forall\,j\in\mathcal{N}_{i}$. The offline-trained parameters are then used for initialization. For the remaining iterations, the parameters of the estimators are updated online via (38). Finally, the approach proposed in this section can also be used with other types of graph filters, such as FIR [2] or edge-variant graph filters [21].

V Conclusion

This paper proposes a new approach to compute linear transformations as fast as possible by leveraging a set of successive, different shift operators. To compute the desired transformation in a small number of iterations, a new optimization problem is proposed to design the shift operators. An exhaustive simulation study demonstrates that the proposed approach can effectively compute the desired transformation markedly faster than existing algorithms. Furthermore, this work studies the effect of edge fluctuations, which is overlooked by existing works and is a common issue in wireless sensor networks due to random links or node failures. To deal with that issue, an online learning method is proposed which enables nodes to estimate the values missing due to the edge fluctuations.

Appendix A Proof of Proposition 1

Proof.

From (22) and the fact that $\|\mathbf{a}\|_{2}^{2}=\mathrm{tr}(\mathbf{a}\mathbf{a}^{\top})$, we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\mathrm{tr}\Big(\Big(\mathbf{z}_{L}+\sum_{i=1}^{L-1}\big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\big)\mathbf{z}_{i}\Big)\Big(\mathbf{z}^{\top}_{L}+\sum_{k=1}^{L-1}\mathbf{z}^{\top}_{k}\big(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1}\big)^{\top}\Big)\Big)\Big] \qquad (39)$$

where $\boldsymbol{\Omega}=\mathbf{y}-\mathbf{H}\mathbf{x}$. By expanding (39), we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathbb{E}\Big[\mathrm{tr}\Big(\boldsymbol{\Psi}+\sum_{i=1}^{L-1}\big(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1}\big)\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\big(\Pi_{m=i}^{L-1}\mathbf{S}_{m+1}\big)^{\top}+\mathbf{z}_{L}\mathbf{z}_{L}^{\top}\Big)\Big] \qquad (40)$$

where $\boldsymbol{\Psi}=\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\sum_{k=1,k\neq{i}}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{m=k}^{L-1}\mathbf{S}_{m+1})^{\top}+\mathbf{z}_{L}\sum_{k=1}^{L-1}\mathbf{z}_{k}^{\top}(\Pi_{j=k}^{L-1}\mathbf{S}_{j+1})^{\top}+\sum_{i=1}^{L-1}(\Pi_{j=i}^{L-1}\mathbf{S}_{j+1})\mathbf{z}_{i}\mathbf{z}_{L}^{\top}$.

By using the cyclic property of the trace and the fact that trace is a linear operator, we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]=\mathrm{tr}\Big(\mathbb{E}[\mathbf{z}_{L}\mathbf{z}^{\top}_{L}]+\mathbb{E}[\boldsymbol{\Psi}]+\sum_{k=1}^{L-1}\boldsymbol{\Upsilon}_{k}^{\top}\boldsymbol{\Upsilon}_{k}\mathbb{E}[\mathbf{z}_{k}\mathbf{z}_{k}^{\top}]\Big) \qquad (41)$$

where $\boldsymbol{\Upsilon}_{i}=\Pi_{m=i}^{L-1}\mathbf{S}_{m+1}$. Finally, by applying the inequality $\mathrm{tr}(\mathbf{A}\mathbf{B})\leq\|\mathbf{A}\|_{2}\,\mathrm{tr}(\mathbf{B})$ (which holds for any square matrix $\mathbf{A}$ and positive semi-definite matrix $\mathbf{B}$ [35]) and the following properties of the spectral norm, $\|\mathbf{A}+\mathbf{B}\|_{2}\leq\|\mathbf{A}\|_{2}+\|\mathbf{B}\|_{2}$ and $\|\mathbf{A}\mathbf{B}\|_{2}\leq\|\mathbf{A}\|_{2}\|\mathbf{B}\|_{2}$, we have

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathrm{tr}(\mathbb{E}[\boldsymbol{\Psi}])+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}(\mathbb{E}[\mathbf{z}_{j-1}\mathbf{z}_{j-1}^{\top}])+\mathrm{tr}(\mathbb{E}[\mathbf{z}_{L}\mathbf{z}_{L}^{\top}]) \qquad (42)$$

From $\mathbb{E}[\mathbf{z}_{j-1}\mathbf{z}_{j-1}^{\top}]=\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}$ and (42), we have:

$$\mathbb{E}[\|\boldsymbol{\Omega}\|_{2}^{2}]\leq\mathbb{E}[\mathrm{tr}(\boldsymbol{\Psi})]+2\sum_{j=2}^{L}\rho^{L-j+1}\,\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{j-1}}+\mathbf{m}_{z_{j-1}}\mathbf{m}_{z_{j-1}}^{\top}\big)+\mathrm{tr}\big(\boldsymbol{\Sigma}_{z_{L}}+\mathbf{m}_{z_{L}}\mathbf{m}_{z_{L}}^{\top}\big) \qquad (43)$$

References

  • [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, pp. 83–98, May 2013.
  • [2] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs:frequency analysis,” IEEE Trans. on Signal Processing, vol. 61, pp. 1644–1656, Apr 2013.
  • [3] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete signal processing on graphs: Sampling theory,” IEEE Transactions on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015.
  • [4] S. Safavi and U. A. Khan, “Revisiting finite-time distributed algorithms via successive nulling of eigenvalues,” vol. 22, pp. 54–57, Jan. 2015.
  • [5] T. Weerasinghe, D. Romero, C. Asensio-Marco, and B. Beferull-Lozano, “Fast decentralized subspace projection via graph filtering,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Calgary,Canada), pp. 4639–4643, Apr. 2018.
  • [6] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Reconstruction of graph signals through percolation from seeding nodes,” IEEE Transactions on Signal Processing, vol. 64, no. 16, pp. 4363–4378, 2016.
  • [7] S. Barbarossa, G. Scutari, and T. Battisti, “Distributed signal subspace projection algorithms with maximum convergence rate for sensor networks with topological constraints,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Taipei,Taiwan), pp. 2893–2896, Apr. 2009.
  • [8] P. Di Lorenzo, S. Barbarossa, and S. Sardellitti, “Distributed signal processing and optimization based on in-network subspace projections,” IEEE Transactions on Signal Processing, vol. 68, pp. 2061–2076, 2020.
  • [9] C. A.-M. F. Camaro-Nogues, D. Alonso-Roman and B. Beferull-Lozano, “Reducing the observation error in a wsn through a consensus-based subspace projection,” in WCNC, (Shanghai,China), pp. 3643–3648, Apr. 2013.
  • [10] A. Y. Kibangou, “Graph laplacian based matrix design for finite-time distributed average consensus,” in 2012 American Control Conference (ACC), pp. 1901–1906, 2012.
  • [11] A. Sandryhaila, S. Kar, and J. M. F. Moura, “Finite-time distributed consensus through graph filters,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Florence, Italy), pp. 1080–1084, May 2014.
  • [12] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex-frequency analysis on graphs,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 260–291, 2016.
  • [13] S. K. Narang and A. Ortega, “Perfect reconstruction two-channel wavelet filter banks for graph structured data,” IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2786–2799, 2012.
  • [14] B. Girault, Signal Processing on Graphs - Contributions to an Emerging Field. PhD thesis, École Normale Supérieure de Lyon, 2015.
  • [15] D. Romero, M. Ma, and G. B. Giannakis, “Kernel-based reconstruction of graph signals,” IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 764–778, 2017.
  • [16] M. Onuki, S. Ono, M. Yamagishi, and Y. Tanaka, “Graph signal denoising via trilateral filter on graph spectral domain,” IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 137–148, 2016.
  • [17] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Sig. Process., vol. 65, pp. 4117–4131, May 2017.
  • [18] S. Mollaebrahim, C. Asensio-Marco, D. Romero, and B. Beferull-Lozano, “Decentralized subspace projection in large networks,” in Global Conference on Signal and Information Processing (GlobalSIP), (Anaheim,USA), pp. 788–792, Feb. 2018.
  • [19] M. Zamalloa and B. Krishnamachari, “An analysis of unreliability and asymmetry in low-power wireless links,” ACM Trans. Sen. Netw., vol. 3, p. 1–34, June 2007.
  • [20] L. Sang, A. Arora, and H. Zhang, “On link asymmetry and one-way estimation in wireless sensor networks,” ACM Trans. Sen. Netw., vol. 6, p. 12–25, Mar. 2010.
  • [21] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Transactions on Signal Processing, vol. 67, pp. 2320–2333, May. 2019.
  • [22] E. Isufi and G. Leus, “Distributed sparsified graph filters for denoising and diffusion tasks,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5865–5869, 2017.
  • [23] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Filtering random graph processes over random time-varying graphs,” IEEE Transactions on Signal Processing, vol. 65, no. 16, pp. 4406–4421, 2017.
  • [24] F. Gama, E. Isufi, A. Ribeiro, and G. Leus, “Controllability of bandlimited graph processes over random time varying graphs,” IEEE Transactions on Signal Processing, vol. 67, no. 24, pp. 6440–6454, 2019.
  • [25] L. B. Saad and B. Beferull-Lozano, “Accurate graph filtering in wireless sensor networks,” IEEE Internet of Things Journal, pp. 1–1, 2020.
  • [26] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Trans. on Signal and Information Processing over Networks,, vol. 3, p. 467–483, Sep 2017.
  • [27] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Trans. on Signal Processing, vol. 62, pp. 3042–3054, Jun 2014.
  • [28] Y. Xu and W. Yin, “A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion,” SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1758–1789, 2013.
  • [29] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.
  • [30] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
  • [31] B. Schölkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in Proceedings of the 14th Annual Conference on Computational Learning Theory, p. 416–426, 2001.
  • [32] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems 20, pp. 1177–1184, 2008.
  • [33] Y. Shen, T. Chen, and G. B. Giannakis, “Random feature-based online multi-kernel learning in environments with unknown dynamics,” J. Mach. Learn. Res., vol. 20, p. 773–808, Jan. 2019.
  • [34] E. Hazan, “Introduction to online convex optimization,” Found. Trends Optim., vol. 2, p. 157–325, Aug. 2016.
  • [35] Sheng-De Wang, Te-Son Kuo, and Chen-Fa Hsu, “Trace bounds on the solution of the algebraic matrix riccati and lyapunov equation,” IEEE Transactions on Automatic Control, vol. 31, no. 7, pp. 654–656, 1986.