
On Local Distributions in Graph Signal Processing

T. Mitchell Roddenberry, Fernando Gama,
Richard G. Baraniuk, Santiago Segarra
All authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA. SS and TMR are supported by NSF CCF-2008555.
(February 2022)
Abstract

Graph filtering is the cornerstone operation in graph signal processing (GSP). Thus, understanding it is key in developing potent GSP methods. Graph filters are local and distributed linear operations, whose output depends only on the local neighborhood of each node. Moreover, a graph filter’s output can be computed separately at each node by carrying out repeated exchanges with immediate neighbors. Graph filters can be compactly written as polynomials of a graph shift operator (typically, a sparse matrix description of the graph). This has led to relating the properties of the filters with the spectral properties of the corresponding matrix – which encodes global structure of the graph. In this work, we propose a framework that relies solely on the local distribution of the neighborhoods of a graph. The crux of this approach is to describe graphs and graph signals in terms of a measurable space of rooted balls. Leveraging this, we are able to seamlessly compare graphs of different sizes and coming from different models, yielding results on the convergence of spectral densities, transferability of filters across arbitrary graphs, and continuity of graph signal properties with respect to the distribution of local substructures.

1 Introduction

Graph filters are a fundamental building block in graph signal processing, where cascaded applications of a graph shift operator model diffusion on the nodes of a graph [1, 2]. The analogy between filtering in discrete time and filtering on graphs has led to a fruitful research direction, with applications including power systems [3], robotics [4], neuroscience [5], and recommender systems [6]. Due to their typical implementation as low-degree matrix polynomials, graph filters are local operators: the output of a graph filter at a given node depends only on the connectivity structure and signal values in the node's local neighborhood. This highlights an invariance property of graph filters, typically summarized as permutation equivariance. However, the equivariance of graph filters goes well beyond insensitivity to permutations of nodes. If the same filter is applied to two different graphs, and two nodes within those graphs have identical neighborhoods, then the graph filter output at those nodes will be identical as well [7]. Indeed, graph filters in their usual implementation are equivariant to local substructures, which have been shown to be of primary importance in real-world networks [8], often leading to useful properties such as scale invariance and robustness [9].

The importance of local substructures is an emerging theme in the literature. This view has been considered before in the graph wavelet literature [10, 11, 12], where wavelet atoms are constructed by applying graph filters to impulse functions on each node in the graph, yielding a dictionary of atoms with known spectral properties that also exhibit (approximate) spatial localization. As a representational tool for graphs predating the development of graph signal processing, graph kernel methods have put forth the idea of graphs as bags of motifs [13], such as in the work on graphlets [14]. Works such as [15, 16] have considered the extension of graph kernels to graphs with continuous labels, typically via a neighborhood aggregation or discretization approach. In [17], equivariance to local structures is proposed as a more useful invariant for graph neural networks than permutation equivariance. For instance, [18] considers neural networks for graph classification that act on small subgraphs over an entire graph, and [19] considers how to design graph neural networks that can recognize structures more expressive than those of the Weisfeiler-Lehman test. The treatment of graphs as distributions of ego-networks in [20] was used to devise a novel loss function for training graph neural networks based on the maximization of mutual information. Additionally, local structures at each node need not be deterministic: [21] considers random walk features and defines the notion of an estimable parameter under such a model.

In this work, we develop machinery for reasoning about basic notions in graph signal processing, with the primacy of local substructures in mind. After reviewing basic definitions for graph signal processing in Section 2, we make the following theoretical contributions:

  1. We introduce rooted graphs and rooted graph filters as vehicles for describing localized graph filters. In doing so, we develop a measure-theoretic view of graph signal processing that considers distributions of rooted balls and signals supported on them (Section 3).

  2. Within the proposed framework, we illustrate how convergence of Fourier spectra of graph signals can be understood in terms of weak convergence of measures (Section 4).

  3. We apply the proposed framework to yield a principled understanding of the transferability of graph summaries via integral probability metrics (Section 5).

  4. To highlight the flexibility of the proposed approach, we extend the relationship between distributions of local graph structures and the Fourier spectra of graph signals to weighted graphs (Section 6).

  5. We identify graphings and signals supported on them as the appropriate limiting objects in the proposed framework, and develop basic notions of graphing signal processing. We prove that the proposed framework applies directly to graphing signal processing in a natural way, yielding a suitable spectral theory (Section 7).

2 Graph Signal Processing

Consider an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. Since the graph is undirected, it holds that if $(u,v)\in\mathcal{E}$ for $u,v\in\mathcal{V}$, then $(v,u)\in\mathcal{E}$. We further assume the graph to have no self-loops, i.e., $(v,v)\notin\mathcal{E}$ for all $v\in\mathcal{V}$, and to be unweighted. An extension to weighted graphs can be found in Section 6.

The neighborhood $\mathcal{N}(\mathcal{V}^{\prime})$ of a given collection of nodes $\mathcal{V}^{\prime}\subseteq\mathcal{V}$ is defined as

\mathcal{N}(\mathcal{V}^{\prime})=\{u\in\mathcal{V}:(u,v)\in\mathcal{E}\text{ for some }v\in\mathcal{V}^{\prime}\}\cup\mathcal{V}^{\prime}.    (1)

That is, the neighborhood of a collection of nodes contains the set $\mathcal{V}^{\prime}$ itself as well as those nodes immediately connected to it. The $k$-hop neighborhood $\mathcal{N}^{k}(\mathcal{V}^{\prime})$ can then be conveniently defined in a recursive manner as $\mathcal{N}^{k}(\mathcal{V}^{\prime})=\mathcal{N}(\mathcal{N}^{k-1}(\mathcal{V}^{\prime}))$ for integers $k\geq 1$, with $\mathcal{N}^{0}(\mathcal{V}^{\prime})=\mathcal{V}^{\prime}$. The $k$-hop neighborhood of a singleton $\mathcal{V}^{\prime}=\{v\}$ is denoted by $\mathcal{N}^{k}(v)$, and the degree of $v$ can be easily computed as $\deg(v)=|\mathcal{N}(v)|-1$.
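The recursive definition above translates directly into code. The following is a minimal sketch (the dict-of-sets adjacency representation and function names are illustrative, not from the paper):

```python
# Recursive k-hop neighborhood, following (1): N(V') is V' together with
# every node adjacent to some node in V', and N^k = N applied k times.

def neighborhood(adj, nodes):
    """One-hop neighborhood N(V') of a set of nodes."""
    out = set(nodes)
    for v in nodes:
        out |= adj[v]
    return out

def k_hop(adj, nodes, k):
    """k-hop neighborhood N^k(V'), with N^0(V') = V'."""
    out = set(nodes)
    for _ in range(k):
        out = neighborhood(adj, out)
    return out

# Path graph 0-1-2-3-4
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(k_hop(adj, {2}, 1))                 # {1, 2, 3}
print(k_hop(adj, {2}, 2))                 # {0, 1, 2, 3, 4}
# deg(v) = |N(v)| - 1, since N(v) contains v itself
print(len(neighborhood(adj, {2})) - 1)    # 2
```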

Graph signals can be associated with a graph structure and are defined as maps $\mathbf{x}:\mathcal{V}\to\mathbb{R}$ between the node set $\mathcal{V}$ and the real numbers $\mathbb{R}$. That is, a graph signal simply attaches a single real number $[\mathbf{x}]_{v}\in\mathbb{R}$ to each node $v\in\mathcal{V}$ of the graph. For a given graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we can define the space of graph signals as $\mathbb{X}(\mathcal{G})=\{\mathbf{x}:\mathcal{V}\to\mathbb{R}\}$.

A graph shift operator (GSO) $\mathbf{S}$ can also be associated with the graph structure $\mathcal{G}$ as a means of relating graph signals explicitly to the underlying graph support. More precisely, the GSO is defined as a linear operator between graph signals $\mathbf{S}:\mathbb{X}(\mathcal{G})\to\mathbb{X}(\mathcal{G})$ such that the output graph signal $\mathbf{y}=\mathbf{S}\mathbf{x}$ is computed as

[\mathbf{y}]_{v}=\sum_{u\in\mathcal{N}(v)}[\mathbf{S}]_{vu}[\mathbf{x}]_{u}.    (2)

The operation (2) shifts the signal around the graph and is analogous to the time shift in discrete-time signal processing. Note that the GSO can be completely characterized by specifying the adjacency rule that assigns the values $[\mathbf{S}]_{vu}$ for every $(u,v)\in\mathcal{E}$. Examples of GSOs that have found widespread use in the literature include the adjacency matrix, the Laplacian matrix, the Markov transition matrix, and their normalized counterparts. While it is technically possible to design adjacency rules that lead to arbitrary values of $[\mathbf{S}]_{vu}$, we only consider rules that are determined exclusively by the combinatorial structure of the graph.
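The shift operation (2) can be sketched as follows, taking the adjacency matrix as GSO so that $[\mathbf{S}]_{vu}=1$ for every edge (a hedged illustration; the dict-based representation is an assumption of this sketch, not the paper's notation):

```python
# Graph shift (2) with the adjacency matrix as GSO: [y]_v = sum_{u ~ v} [x]_u.

def shift(adj, x):
    """Apply the adjacency-matrix GSO to a signal x (dict node -> value)."""
    return {v: sum(x[u] for u in adj[v]) for v in adj}

# Triangle {0,1,2} with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(shift(adj, x))  # {0: 5.0, 1: 4.0, 2: 7.0, 3: 3.0}
```

Each node's output is computed from its one-hop neighbors only, which is what makes the operation local and distributable.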

A graph filter can then be built from the GSO. More specifically, given a collection of $(K+1)$ scalar coefficients $\{h_{k}\}_{k=0}^{K}$, a $K$-tap graph filter $\mathcal{H}(\mathbf{S}):\mathbb{X}(\mathcal{G})\to\mathbb{X}(\mathcal{G})$ is defined as the linear mapping between two graph signals $\mathbf{y}=\mathcal{H}(\mathbf{S})\mathbf{x}$ given by

\mathcal{H}(\mathbf{S})=\sum_{k=0}^{K}h_{k}\mathbf{S}^{k},    (3)

where $\mathbf{S}^{k}$ denotes $k$ repeated applications of the GSO (see (2)) to the input signal $\mathbf{x}$. The operator $\mathbf{S}^{0}$ is understood to be the identity map on $\mathbb{X}(\mathcal{G})$.

Graph filters are linear and local operators. They are linear in the input graph signal $\mathbf{x}$. They are local in that they only require information up to the $K$-hop neighborhood. More specifically, the output of the graph filter at each node, $[\mathbf{y}]_{v}$, can be computed by $K$ repeated exchanges of information with the one-hop neighbors $\mathcal{N}(v)$, thus collecting the signal values contained in up to the $K$-hop neighborhood $\mathcal{N}^{K}(v)$. Each node can compute its output separately from the rest. The key observation is that a node need not know the global structure of the whole graph $\mathcal{G}$, but only its local neighborhood. Thus, graph filters can be analyzed and understood entirely by looking at this local neighborhood structure.
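The filter (3) can be computed exactly as described: $K$ rounds of one-hop exchanges, accumulating each term $h_{k}\mathbf{S}^{k}\mathbf{x}$ as it is produced. A minimal sketch, again assuming an adjacency-matrix GSO on a dict-of-sets graph:

```python
# K-tap graph filter (3): y = sum_k h_k S^k x, computed by repeated
# one-hop exchanges rather than by forming the matrix polynomial.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    """Apply H(S) = sum_k h[k] S^k using only local exchanges."""
    y = {v: h[0] * x[v] for v in adj}   # k = 0 term: identity
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)             # one more hop of information
        for v in adj:
            y[v] += hk * sx[v]
    return y

adj = {0: {1}, 1: {0, 2}, 2: {1}}       # path 0-1-2
x = {0: 1.0, 1: 0.0, 2: 0.0}            # impulse at node 0
h = [0.5, 0.25, 0.125]                  # K = 2 taps
print(graph_filter(adj, x, h))          # {0: 0.625, 1: 0.25, 2: 0.125}
```

Note that node 2 receives a nonzero value only through the $\mathbf{S}^{2}$ term, matching the claim that a $K$-tap filter gathers information from exactly the $K$-hop neighborhood.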

3 A Local Framework for GSP

Figure 1: The space $\Omega_{K}$ of rooted $K$-balls and signals, constructed via the disjoint union of signal spaces enumerated by rooted $K$-balls. (Left) A graph on nodes $\{a,b,\ldots,h\}$, with graph signal denoted by the node coloring. (Right) The space $\Omega_{1}$, and the distribution $\mu$ of points obtained via the pushforward of $\Sigma_{1}$. Observe that nodes $\{a,d,g,h\}$ all have isomorphic rooted $1$-balls, and thus all correspond to points in $\mathbb{X}(\bar{\mathcal{G}}_{1})\cong\mathbb{R}^{2}$ (depicted here in random positions within Euclidean space). Similarly, nodes $\{b,e\}$ are mapped to points in $\mathbb{X}(\bar{\mathcal{G}}_{2})\cong\mathbb{R}^{3}$, and $\{c,f\}$ are mapped to points in $\mathbb{X}(\bar{\mathcal{G}}_{4})\cong\mathbb{R}^{4}$. Since the given graph is triangle-free, no points are mapped to $\mathbb{X}(\bar{\mathcal{G}}_{3})\cong\mathbb{R}^{3}$.

The local nature of graph filters calls for a local framework to analyze their effects. Toward this end, define a rooted $K$-ball $\bar{\mathcal{G}}_{K}(r)=(\bar{\mathcal{V}}_{K},\bar{\mathcal{E}}_{K},r)$ as a graph with a root $r\in\bar{\mathcal{V}}_{K}$ such that all nodes are at most $K$ hops away from the root, i.e., $\bar{\mathcal{V}}_{K}=\mathcal{N}^{K}(r)$. Note that, since the rooted $K$-ball $\bar{\mathcal{G}}_{K}(r)$ is a graph, the notions of graph shift operator and graph signal are immediately well defined. We denote these by $\bar{\mathbf{S}}_{K}(r)$ and $\bar{\mathbf{x}}_{K}(r)$, respectively. For ease of exposition, the specification of the root node will be dropped unless necessary to avoid confusion.

Of particular interest is the case of rooted $K$-balls obtained as the induced subgraph of the $K$-hop neighborhood of a node. More specifically, given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, a rooted $K$-ball $\bar{\mathcal{G}}_{K}$ can be constructed by selecting a node $r\in\mathcal{V}$ and setting $\bar{\mathcal{V}}_{K}\equiv\mathcal{N}^{K}(r)$ and $\bar{\mathcal{E}}_{K}\equiv\mathcal{E}\cap(\mathcal{N}^{K}(r)\times\mathcal{N}^{K}(r))$. In this context, the graph signal $\bar{\mathbf{x}}_{K}$ corresponding to the rooted $K$-ball can be tied to the graph signal $\mathbf{x}$ defined on $\mathcal{G}$ by setting $[\bar{\mathbf{x}}_{K}]_{v}=[\mathbf{x}]_{v}$ for all $v\in\mathcal{N}^{K}(r)$.

We can now formalize the notion that a graph filter only relies on local information.

Proposition 1.

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be a graph with graph signal $\mathbf{x}$ and GSO $\mathbf{S}$. For a node $r\in\mathcal{V}$, denote the corresponding rooted $K$-ball by $\bar{\mathcal{G}}_{K}$, with graph signal $\bar{\mathbf{x}}_{K}$ and GSO $\bar{\mathbf{S}}_{K}$, respectively. Then, for any $K$-tap filter $\mathcal{H}$, the following equality holds:

[\mathcal{H}(\mathbf{S})\mathbf{x}]_{r}=[\mathcal{H}(\bar{\mathbf{S}}_{K})\bar{\mathbf{x}}_{K}]_{r}.    (4)

We note that for (4) to hold, the adjacency rule used to construct $\mathbf{S}$ and $\bar{\mathbf{S}}_{K}$ must be the same, and it is assumed to depend only on the combinatorial structure of the graphs; see Section 2.

Proposition 1 formalizes the well-known fact that a $K$-tap graph filter evaluated at a node depends only on the $K$-hop subgraph centered at that node (i.e., the ego network of the node [22]). In this sense, it is not necessary to understand how a graph filter behaves on the global structure of a graph. Rather, one only needs to understand the behavior of a graph filter locally, in particular on the rooted $K$-balls of the graph. This naturally leads to a weaker form of the property of permutation equivariance, discussed next.
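Proposition 1 is easy to check numerically. The sketch below (with illustrative helpers `shift`, `graph_filter`, and `rooted_ball`, all assumptions of this example) filters a signal on a full path graph and on the induced 2-hop ball around a root, and compares the outputs at the root:

```python
# Numerical check of Proposition 1: a K-tap filter evaluated at node r
# depends only on the induced K-hop ball around r.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    y = {v: h[0] * x[v] for v in adj}
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)
        for v in adj:
            y[v] += hk * sx[v]
    return y

def rooted_ball(adj, r, K):
    """Induced subgraph on the K-hop neighborhood of r."""
    ball = {r}
    for _ in range(K):
        ball |= {u for v in ball for u in adj[v]}
    return {v: adj[v] & ball for v in ball}

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # path graph
x = {v: float(v) for v in adj}
h = [1.0, -0.5, 0.25]                                    # K = 2 taps
K, r = 2, 1

ball = rooted_ball(adj, r, K)                            # nodes {0,1,2,3}
full = graph_filter(adj, x, h)[r]
local = graph_filter(ball, {v: x[v] for v in ball}, h)[r]
assert full == local                                     # (4) holds at the root
```

Node 4 lies outside the 2-hop ball of node 1, so discarding it (and its signal value) leaves the filter output at the root unchanged.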

For two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$, we say that a map $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$ is a $K$-morphism if it preserves the structure of rooted $K$-balls for all nodes in $\mathcal{V}$. The structure of two rooted $K$-balls $\bar{\mathcal{G}}_{K}=(\bar{\mathcal{V}}_{K},\bar{\mathcal{E}}_{K},r)$ and $\bar{\mathcal{G}}^{\prime}_{K}=(\bar{\mathcal{V}}^{\prime}_{K},\bar{\mathcal{E}}^{\prime}_{K},r^{\prime})$ with signals $\bar{\mathbf{x}}_{K}$ and $\bar{\mathbf{x}}^{\prime}_{K}$, respectively, is preserved if there exists an isomorphism $\psi:\bar{\mathcal{V}}_{K}\to\bar{\mathcal{V}}^{\prime}_{K}$ such that $\psi(r)=r^{\prime}$; $(\psi(u),\psi(v))\in\bar{\mathcal{E}}^{\prime}_{K}$ if and only if $(u,v)\in\bar{\mathcal{E}}_{K}$; and $[\bar{\mathbf{x}}_{K}]_{v}=[\bar{\mathbf{x}}^{\prime}_{K}]_{\psi(v)}$ for all $v\in\bar{\mathcal{V}}_{K}$.
Then, a $K$-morphism $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$ between two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$ with associated signals $\mathbf{x}$ and $\mathbf{x}^{\prime}$, respectively, satisfies that the rooted $K$-balls $(\bar{\mathcal{G}}_{K}(v),\bar{\mathbf{x}}_{K}(v))$ and $(\bar{\mathcal{G}}^{\prime}_{K}(\phi(v)),\bar{\mathbf{x}}^{\prime}_{K}(\phi(v)))$ are isomorphic for all nodes $v\in\mathcal{V}$. Note that $K$-morphisms are not necessarily invertible, nor are they necessarily surjective.

Graph filters are invariant under $K$-morphisms. This follows from Proposition 1, where we see that the same filter applied to two different graphs yields the same result at nodes whose local structures coincide.

Corollary 1.

Consider two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$ with corresponding signals $\mathbf{x}$ and $\mathbf{x}^{\prime}$. Denote by $\bar{\mathcal{G}}_{K}(v)$ the rooted $K$-ball with root $v\in\mathcal{V}$, and by $\bar{\mathbf{S}}_{K}(v)$ and $\bar{\mathbf{x}}_{K}(v)$ the corresponding GSO and graph signal, respectively; define $\bar{\mathbf{S}}^{\prime}_{K}$ and $\bar{\mathbf{x}}^{\prime}_{K}$ analogously on $\mathcal{G}^{\prime}$. If there exists a $K$-morphism $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$, then for any $K$-tap filter $\mathcal{H}$ it holds that

\Big[\mathcal{H}\big(\bar{\mathbf{S}}_{K}(v)\big)\bar{\mathbf{x}}_{K}(v)\Big]_{v}=\Big[\mathcal{H}\big(\bar{\mathbf{S}}^{\prime}_{K}(\phi(v))\big)\bar{\mathbf{x}}^{\prime}_{K}(\phi(v))\Big]_{\phi(v)}    (5)

for all $v\in\mathcal{V}$.

Note that Corollary 1 is a generalization of the permutation equivariance property that graph filters have [7]. More generally, Corollary 1 opens up ways to compare the performance of a fixed graph filter across two different graphs with different associated graph signals. It would be expected then, that if two graphs have similar distributions of rooted KK-balls with associated signal, then a fixed graph filter acting on one should yield similar results on the other. We formalize these notions in the ensuing discussion.
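As a concrete illustration of Corollary 1, consider cycles of different lengths carrying a constant signal: for $K$ small relative to the cycle lengths, every node of either cycle has the same rooted $K$-ball with the same signal, so any map between the node sets is a $K$-morphism and a $K$-tap filter outputs a single common value everywhere. A sketch (helper names are assumptions of this example):

```python
# Corollary 1 on two cycles: identical rooted 2-balls with a constant
# signal imply identical filter outputs, despite different graph sizes.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    y = {v: h[0] * x[v] for v in adj}
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)
        for v in adj:
            y[v] += hk * sx[v]
    return y

def cycle(n):
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

h = [1.0, 2.0, -1.0]  # an arbitrary 2-tap filter
out6 = graph_filter(cycle(6), {v: 1.0 for v in range(6)}, h)
out9 = graph_filter(cycle(9), {v: 1.0 for v in range(9)}, h)
vals = set(out6.values()) | set(out9.values())
print(vals)  # {1.0}: one value across both graphs
```

The single shared output value reflects exactly the invariance in (5): the filter cannot tell the two cycles apart, because their distributions of rooted 2-balls with signal coincide.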

3.1 Distributions of Rooted Balls

Proposition 1 states that the output of a $K$-tap graph filter at each node depends only on the rooted $K$-ball at that node and the corresponding graph signal values. Therefore, to characterize the effect of graph filtering, it suffices to characterize the distribution of rooted $K$-balls of a graph. Toward this end, we will construct a sample space, a $\sigma$-field, and a corresponding measure to describe this distribution.

Denote by $\bar{\mathcal{G}}_{k}$ a rooted $k$-ball and by $\mathbb{X}(\bar{\mathcal{G}}_{k})$ its corresponding signal space. Consider now the space constructed by the disjoint union of all signal spaces on all possible rooted $k$-balls (up to isomorphism) with $k\leq K$, given by

\Omega_{K}=\coprod_{\bar{\mathcal{G}}_{k}:\,k\leq K}\mathbb{X}(\bar{\mathcal{G}}_{k}).    (6)

(In this context, each signal space $\mathbb{X}(\bar{\mathcal{G}}_{k})$ is considered modulo the action of the automorphism group of $\bar{\mathcal{G}}_{k}$; this is clarified in Appendix A.)

The construction of the space $\Omega_{K}$ is illustrated in Fig. 1. Essentially, one considers a different space $\mathbb{X}(\bar{\mathcal{G}})$ (typically, $|\bar{\mathcal{V}}|$-dimensional vectors $\bar{\mathbf{x}}$) for each possible rooted $K$-ball $\bar{\mathcal{G}}$, and then puts them all into a common space where each original space is isolated. An element $\omega\in\Omega_{K}$ is described by both the rooted $K$-ball indexing a component space and a graph signal in that space, i.e., $\omega=(\bar{\mathcal{G}},\bar{\mathbf{x}})$ with $\bar{\mathbf{x}}\in\mathbb{X}(\bar{\mathcal{G}})$. Making the identification $\mathbb{X}(\bar{\mathcal{G}})\equiv\mathbb{R}^{|\bar{\mathcal{V}}|}$, the point $\omega$ can be thought of as a $|\bar{\mathcal{V}}|$-dimensional vector associated with the rooted $K$-ball $\bar{\mathcal{G}}$.
Note that, while it may happen that $\mathbb{X}(\bar{\mathcal{G}})\cong\mathbb{X}(\bar{\mathcal{G}}^{\prime})$ because $|\bar{\mathcal{V}}|=|\bar{\mathcal{V}}^{\prime}|$, these are different component spaces in the disjoint union $\Omega_{K}$, as they are indexed by the different rooted balls $\bar{\mathcal{G}}$ and $\bar{\mathcal{G}}^{\prime}$, respectively.

To define the $\sigma$-field $\sigma(\Omega_{K})$ associated with $\Omega_{K}$, note that the disjoint union of graph signal spaces (i.e., Euclidean spaces) has a natural topology defined over it, namely the disjoint union topology. Thus, the Borel $\sigma$-field $\sigma(\Omega_{K})$ generated from the topology of $\Omega_{K}$ is a proper $\sigma$-field. Intuitively, measurable sets in $\Omega_{K}$ are constructed by selecting some rooted $K$-ball $\bar{\mathcal{G}}$ and a Borel set in the space of graph signals $\mathbb{X}(\bar{\mathcal{G}})$, then considering the corresponding set in $\Omega_{K}$.

A measure $\mu$ can be properly defined now that the $\sigma$-field associated with $\Omega_{K}$ has been constructed. Let $\Sigma_{K}:\mathcal{V}\to\Omega_{K}$ be the map that takes a node $v\in\mathcal{V}$ and returns the corresponding rooted $K$-ball and signal, i.e.,

\Sigma_{K}(v)=\big(\bar{\mathcal{G}}_{K}(v),\bar{\mathbf{x}}_{K}(v)\big).    (7)

Note that $\Sigma_{K}(v)\in\Omega_{K}$. Then, for a Borel set $\mathcal{A}\in\sigma(\Omega_{K})$, the measure $\mu:\sigma(\Omega_{K})\to[0,1]$ is defined as

\mu(\mathcal{A})=\frac{1}{|\mathcal{V}|}\big|\big\{v\in\mathcal{V}:\Sigma_{K}(v)\in\mathcal{A}\big\}\big|.    (8)

Recall that each element in $\Omega_{K}$ can be thought of as a rooted $K$-ball with an associated signal. Formally, $\mu$ is the pushforward of the uniform measure on the nodes of $\mathcal{G}$ under the sampling map $\Sigma_{K}$, which can be denoted as $\mu=(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x})$ [23]. We observe that the uniform measure in (8) can be replaced by any other measure on the nodes of the graph.
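The empirical measure (8) is simple to compute for concrete measurable sets. The sketch below (a hypothetical illustration; `mu` and the event predicates are not the paper's notation) evaluates $\mu(\mathcal{A})$ for sets $\mathcal{A}$ of the form "rooted $1$-balls whose root has a given degree" on a star graph:

```python
# Empirical measure (8): mu(A) is the fraction of nodes whose rooted ball
# (with signal) lands in the measurable set A. Here A is described by a
# predicate on the node, standing in for a Borel set in Omega_K.

def mu(adj, event):
    """mu(A) = |{v : Sigma_K(v) in A}| / |V| for an event predicate."""
    return sum(1 for v in adj if event(v)) / len(adj)

# Star graph: center 0 joined to leaves 1..4
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

def root_degree_is(d):
    return lambda v: len(adj[v]) == d

print(mu(adj, root_degree_is(1)))  # 0.8: four of five balls are leaf-rooted
print(mu(adj, root_degree_is(4)))  # 0.2: one ball is rooted at the center
```

Replacing the counting in `mu` with any other node weighting implements the remark that the uniform measure is not essential to the construction.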

The space $\Omega_{K}$ serves as a natural domain in which to characterize $K$-tap graph filters. Indeed, Proposition 1 indicates that the behavior of a $K$-tap graph filter can be fully characterized by its behavior over $\Omega_{K}$. That is to say, any given $K$-tap graph filter $\mathcal{H}$ has a corresponding map $\bar{\mathcal{H}}:\Omega_{K}\to\mathbb{R}$ such that for any graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with GSO $\mathbf{S}$ and signal $\mathbf{x}$, (4) can be rewritten as

\bar{\mathcal{H}}(\Sigma_{K}(r))=[\mathcal{H}(\mathbf{S})\mathbf{x}]_{r}.    (9)

To use the terminology of quotient spaces [24], (9) indicates that a $K$-tap graph filter passes, or descends, to a unique map on $\Omega_{K}$ via the map $\Sigma_{K}$. Thus, in the context of Proposition 1 and graph filtering, it is sufficient to characterize local operations over $\Omega_{K}$, rather than with respect to the global structure of a given graph $\mathcal{G}$.

Now that the local framework based on rooted balls has been introduced, we proceed to show how it can be leveraged to obtain theoretical results that provide insight into the inner workings of graph signal processing. In Section 4 we discuss how to understand power spectrum densities, while in Section 5 we discuss transferability of graph filters across graphs of different size.

Remark 1 (Distributions of edge sets).

In [25], the authors consider a pushforward probability measure into the Fourier domain when there is an underlying probability distribution of edges in a fixed graph. This allows for the modeling of graph signals under uncertainty in the edge set. Such a distribution of edges can be incorporated into the distribution $\mu$ via the pushforward map $(\Sigma_{K})_{*}$, allowing for a description of the distribution of rooted balls in a graph under a probability distribution on the edges.

Remark 2 (Rooted subtrees).

Many authors have considered graph processing methods that are equivariant to local subtree structures [19], such as architectures whose computation resembles the 1-Weisfeiler-Lehman test [13]. In the proposed framework, these subtree structures are strictly "less expressive" than rooted ball structures, in the sense that the rooted subtree of depth $K$ centered at a given node can be determined completely by the rooted $K$-ball centered at that node. However, the rooted $K$-ball cannot be determined by the rooted subtree of depth $K$, as evidenced by the inability of the 1-Weisfeiler-Lehman test to distinguish certain structures [26]. With this in mind, the proposed framework could certainly be applied with the space $\Omega_{K}$ constructed from rooted subtrees rather than rooted balls. However, given that rooted balls are more expressive than rooted subtrees [19], we state all subsequent results using the rooted ball construction.
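The expressiveness gap in the remark above can be demonstrated on the classic pair of a 6-cycle versus two disjoint triangles: both graphs are 2-regular, so 1-WL color refinement never separates them, yet their rooted 1-balls differ (a triangle's ball contains an edge between the root's neighbors). A self-contained sketch (the refinement routine is a simplified illustration, not a production WL implementation):

```python
# 1-WL color refinement vs. rooted 1-balls on C_6 and two triangles.

def wl_colors(adj, rounds=3):
    """Sorted multiset of 1-WL colors after a few refinement rounds."""
    color = {v: len(adj[v]) for v in adj}  # initialize with degrees
    for _ in range(rounds):
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: relabel[sig[v]] for v in adj}
    return sorted(color.values())

def triangles_at_root(adj):
    """Per node: number of edges among its neighbors (a rooted 1-ball statistic)."""
    return sorted(sum(1 for u in adj[v] for w in adj[v]
                      if u < w and w in adj[u])
                  for v in adj)

c6 = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

print(wl_colors(c6) == wl_colors(two_triangles))   # True: 1-WL is fooled
print(triangles_at_root(c6))                       # [0, 0, 0, 0, 0, 0]
print(triangles_at_root(two_triangles))            # [1, 1, 1, 1, 1, 1]
```

Since the two graphs are 2-regular, every WL round produces identical color multisets; the rooted-ball statistic separates them immediately.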

4 Power Spectral Density

One of the key ideas in graph signal processing is that of the graph Fourier transform of a graph signal [2]. For a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ on $n$ nodes, consider the GSO $\mathbf{S}$ to be the graph Laplacian, which is a positive semidefinite Hermitian matrix. Thus, it admits the eigendecomposition

\mathbf{S}=\sum_{j=1}^{n}\lambda_{j}\mathbf{u}_{j}\mathbf{u}_{j}^{\mathsf{T}},    (10)

for eigenvalues 0=\lambda_{1}\leq\ldots\leq\lambda_{n} with corresponding orthonormal eigenvectors \mathbf{u}_{1},\ldots,\mathbf{u}_{n}. One can check that for any of these eigenvectors, \langle\mathbf{u}_{j},\mathbf{S}\mathbf{u}_{j}\rangle=\lambda_{j}. The graph Laplacian induces a quadratic form on a graph signal that measures a useful notion of smoothness [2]. For a given graph signal \mathbf{x} on \mathcal{G}, the quadratic form can be shown to equal

\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle=\frac{1}{2}\sum_{v\in\mathcal{V}}\sum_{u\in\mathcal{N}(v)}\left([{\mathbf{x}}]_{v}-[{\mathbf{x}}]_{u}\right)^{2}. (11)

That is, the quadratic form of the graph Laplacian measures the sum of the squared differences of the graph signal at neighboring pairs of nodes. A signal is called smooth if the quadratic form takes a small value relative to its norm. For instance, the eigenvectors of 𝐒\mathbf{S} with small corresponding eigenvalues are smooth signals.
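As a concrete check of the identity above, the following sketch (an illustrative example, not taken from the paper) builds the combinatorial Laplacian of a small cycle graph and compares the quadratic form against the double sum over neighbor pairs; since the double sum ranges over ordered pairs, each edge is counted twice, hence the factor 1/2.

```python
import numpy as np

# Small example graph: a 4-cycle, given by its adjacency list.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
n = len(neighbors)

# Combinatorial Laplacian S = D - A.
A = np.zeros((n, n))
for v, nbrs in neighbors.items():
    for u in nbrs:
        A[v, u] = 1.0
S = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.5, -0.5, 0.25])

# Quadratic form <x, Sx>.
quad = x @ S @ x

# Sum of squared differences over ordered neighbor pairs counts each
# edge twice, hence the factor 1/2 below.
double_sum = sum((x[v] - x[u]) ** 2 for v in neighbors for u in neighbors[v])

assert np.isclose(quad, 0.5 * double_sum)
```

Signals that vary slowly across edges make both sides small, matching the notion of smoothness discussed next.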

We note that this notion of smoothness can be extended to normalized versions of the graph Laplacian in a similar fashion. Furthermore, a notion of smoothness can also be defined using the adjacency matrix [27]. In any case, for ease of exposition and conceptual simplicity, we assume that 𝐒\mathbf{S} is the graph Laplacian throughout this section, but we remark that the results derived herein are also valid for any other GSO built from adjacency rules that rely on the combinatorial structure of the graph.

Since the eigenvectors 𝐮1,,𝐮n\mathbf{u}_{1},\ldots,\mathbf{u}_{n} form an orthonormal basis for n\mathbb{R}^{n}, any graph signal 𝐱\mathbf{x} admits a unique representation as a linear combination of the eigenvectors of 𝐒\mathbf{S}. That is, for any graph signal 𝐱\mathbf{x}, there exists a set of coefficients {x~j}j=1n\{\tilde{x}_{j}\}_{j=1}^{n} such that the following holds:

𝐱\displaystyle\mathbf{x} =j=1nx~j𝐮j\displaystyle=\sum_{j=1}^{n}\tilde{x}_{j}\mathbf{u}_{j} (12)
𝐱,𝐒𝐱\displaystyle\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle =j=1nλjx~j2.\displaystyle=\sum_{j=1}^{n}\lambda_{j}\tilde{x}_{j}^{2}. (13)

Thus, the representation of a graph signal as a weighted sum of the eigenvectors can be used to conveniently compute the quadratic form in terms of the spectrum. In an analogy to complex exponential functions being the eigenfunctions of the Laplace operator in discrete-time signal processing, we call the representation of the graph signal 𝐱\mathbf{x} by the coefficients x~j\tilde{x}_{j} the graph Fourier transform (GFT).
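The decomposition (12) and the spectral form of the quadratic form (13) can be verified numerically. The sketch below (an assumed example on a path graph; the graph and signal are arbitrary choices) computes the GFT coefficients via the orthonormal eigenbasis of the Laplacian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Laplacian of a path graph on 5 nodes (arbitrary example).
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
S = np.diag(A.sum(axis=1)) - A

# Eigendecomposition: S = U diag(lam) U^T with orthonormal columns of U.
lam, U = np.linalg.eigh(S)

x = rng.normal(size=n)

# GFT coefficients and the inverse transform, cf. eq. (12).
x_tilde = U.T @ x
assert np.allclose(U @ x_tilde, x)

# Quadratic form computed in the spectral domain, cf. eq. (13).
assert np.isclose(x @ S @ x, np.sum(lam * x_tilde ** 2))
```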

When coupled with their corresponding eigenvalues \lambda_{j}, the coefficients \tilde{x}_{j} can be used to describe the distribution of energy in a graph signal, in a way concordant with the notion of smoothness described by the Laplacian quadratic form (11). To capture this, we define the normalized power spectral distribution of a graph signal. Given a graph signal \mathbf{x} with Fourier coefficients \tilde{x}_{j} and corresponding Laplacian eigenvalues 0=\lambda_{1}\leq\ldots\leq\lambda_{n}, define

P_{\mathbf{x}}:\mathbb{R}\to\mathbb{R}^{+},\qquad\lambda\mapsto\frac{1}{n}\sum_{j:\lambda_{j}\leq\lambda}\tilde{x}_{j}^{2}. (14)

One can easily see that P𝐱P_{\mathbf{x}} is a monotone, right-continuous function, with P𝐱(λ)=0P_{\mathbf{x}}(\lambda)=0 for λ<0\lambda<0, and at most nn discontinuities (one at each eigenvalue λj\lambda_{j}). Moreover, there is a convenient expression for the moments of P𝐱P_{\mathbf{x}}:

𝐦K(𝐱)λKdP𝐱(λ)=1n𝐱,𝐒K𝐱.\mathbf{m}_{K}(\mathbf{x})\coloneqq\int_{\mathbb{R}}\lambda^{K}\mathrm{d}P_{\mathbf{x}}(\lambda)=\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle. (15)

That is to say, the moments of the normalized power spectral distribution are given by scaled quadratic forms of powers of the GSO. Observe that the quadratic form (11) can be expressed as 𝐱,𝐒𝐱=n𝐦1(𝐱)\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle=n\cdot\mathbf{m}_{1}(\mathbf{x}). Indeed, the moments of the graph Fourier transform are an essential quantity in the design of low-order graph filters [28]. This points to a very useful idea: the powers of the GSO are the most primitive types of graph filters, so the proposed local framework provides a path for understanding these moments in terms of functions on the space of distributions of rooted balls.
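The moment identity (15) is straightforward to check numerically. The following sketch (an illustrative example on a random graph) compares the spectral expression \sum_{j}\lambda_{j}^{K}\tilde{x}_{j}^{2}/n against the quadratic form of the GSO power for several values of K.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Random symmetric adjacency matrix, then the Laplacian.
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1)
A = A + A.T
S = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(S)
x = rng.normal(size=n)
x_tilde = U.T @ x

for K in range(4):
    # Moment via the spectral distribution: (1/n) sum_j lam_j^K x_tilde_j^2.
    m_spec = np.sum(lam ** K * x_tilde ** 2) / n
    # Moment via the quadratic form: (1/n) <x, S^K x>, cf. eq. (15).
    m_quad = x @ np.linalg.matrix_power(S, K) @ x / n
    assert np.isclose(m_spec, m_quad)
```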

To see this, let us examine (15). Notice that

1n𝐱,𝐒K𝐱=1nv𝒱[𝐱]v[𝐒K𝐱]v.\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle=\frac{1}{n}\sum_{v\in\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}. (16)

As before, let \bar{\mathbf{S}}_{K}(v) be the GSO of the K-ball rooted at node v. By Proposition 1, the above equation can be written as

\frac{1}{n}\sum_{v\in\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}=\frac{1}{n}\sum_{v\in\mathcal{V}}[\bar{\mathbf{x}}_{K}(v)]_{v}[\bar{\mathbf{S}}_{K}^{K}(v)\,\bar{\mathbf{x}}_{K}(v)]_{v}. (17)

That is, the moments of the normalized power spectral distribution of a graph signal can be written as an average of operations that resemble local filtering. In the proposed local framework, this has the following interpretation.

Proposition 2.

Let 𝒢\mathcal{G} be a graph and 𝐱\mathbf{x} be a graph signal. For a given K0K\geq 0, define the map

\bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}, (18)
((\mathcal{V},\mathcal{E},r),\bar{\mathbf{y}})\mapsto[\bar{\mathbf{y}}]_{r}[\bar{\mathbf{S}}_{K}^{K}\bar{\mathbf{y}}]_{r}, (19)

where \bar{\mathbf{S}}_{K} denotes the Laplacian of the rooted K-ball.

Letting μ\mu be the probability distribution on ΩK\Omega_{K} determined by (𝒢,𝐱)(\mathcal{G},\mathbf{x}), the KthK^{th} moment of the normalized power spectral distribution P𝐱P_{\mathbf{x}} is given by the equation

\mathbf{m}_{K}(\mathbf{x})=\int_{\Omega_{K}}\bar{\mathbf{m}}_{K}(\omega)\,\mathrm{d}\mu(\omega)=\mathbb{E}_{\mu}[\bar{\mathbf{m}}_{K}]. (20)
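Proposition 2 can be illustrated numerically. The sketch below assumes the rooted K-ball at v is the subgraph induced by all nodes within K hops of v, with its own combinatorial Laplacian; the helper `ball` and the variable names are our own. Since walks of length K from v never leave this ball, the local filtering term at each root matches the global one, and averaging recovers the K-th moment.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 8, 2

# Random graph (Erdos-Renyi-style draw; an arbitrary example).
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T
S = np.diag(A.sum(axis=1)) - A
x = rng.normal(size=n)

def ball(v, K):
    """Nodes within hop distance K of v (breadth-first search)."""
    dist = {v: 0}
    frontier = [v]
    for d in range(1, K + 1):
        nxt = []
        for u in frontier:
            for w in np.flatnonzero(A[u]):
                if w not in dist:
                    dist[w] = d
                    nxt.append(w)
        frontier = nxt
    return sorted(dist)

# Global moment (1/n) <x, S^K x> ...
m_global = x @ np.linalg.matrix_power(S, K) @ x / n

# ... equals the average of the local terms [x]_v [S_K^K(v) x_K(v)]_v
# computed entirely within each rooted K-ball.
total = 0.0
for v in range(n):
    nodes = ball(v, K)
    idx = {u: i for i, u in enumerate(nodes)}
    A_loc = A[np.ix_(nodes, nodes)]
    S_loc = np.diag(A_loc.sum(axis=1)) - A_loc  # Laplacian of the K-ball
    x_loc = x[nodes]
    total += x_loc[idx[v]] * (np.linalg.matrix_power(S_loc, K) @ x_loc)[idx[v]]
m_local = total / n

assert np.isclose(m_global, m_local)
```

The degrees truncated at the boundary of each ball never enter the computation, since diagonal Laplacian entries are only used at nodes within K-1 hops of the root.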

Having reduced the moments of the normalized power spectral distribution to an average over the distribution of rooted balls, we can now reason about the convergence of graph Fourier distributions in terms of weak convergence of measure for graphs with bounded degree D. Although the limit of dense graphs has been considered before via graphon models [29, 30], this approach cannot capture the behavior of bounded-degree graphs, as all infinite sequences of growing graphs of bounded degree converge to the zero graphon [31]. In the case of sparse graphs, however, understanding the descent of maps to the space \Omega_{K} is a feasible approach, supported by the following compactness property.

Lemma 1.

For given integers K0,D1K\geq 0,D\geq 1 and any collection of graphs in 𝔾D\mathbb{G}_{D} with uniformly bounded signals, there exists a compact subspace ΓΩK\Gamma\subset\Omega_{K} such that the support of the probability measures corresponding to the graphs/graph signals is contained in Γ\Gamma.

The proof of Lemma 1 can be found in Appendix C. The constraint of a graph having bounded degree, for instance coming from physical constraints in a real-world system, corresponds to a compact description in the space ΩK\Omega_{K}. From this, the following convergence result holds.

Theorem 1.

Let integers K0,D1K\geq 0,D\geq 1 be given. Let {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} be a sequence of graphs and graph signals such that each 𝒢j𝔾D\mathcal{G}_{j}\in\mathbb{G}_{D} and the collection of signals 𝐱j\mathbf{x}_{j} is uniformly bounded. Let μj\mu_{j} be the probability measure on ΩK\Omega_{K} associated with (𝒢j,𝐱j)(\mathcal{G}_{j},\mathbf{x}_{j}) for each jj, and PjP_{j} the corresponding normalized power spectral distribution function.

If the measures μj\mu_{j} converge weakly, then the KKth moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all K0K\geq 0, then the normalized power spectral distribution functions converge weakly.

The proof of Theorem 1 can be found in Appendix B. Theorem 1 casts the power spectral density of a graph signal in terms of the distribution of local structures in the graph. In particular, we treat the power spectral density as a function from graphs with graph signals to Borel measures on a compact subset of the real line, then prove that this function is continuous with respect to the distribution of rooted balls in the graph, where continuity is understood to be with respect to the weak topology for both the distribution in \Omega_{K} and the power distribution of the signal. Although the frequency domain representation of a graph signal is typically understood to be a representation in terms of the global graph structure, owing to the global support of the graph Laplacian’s eigenvectors, this result indicates that it is in essence still subject to fundamentally local phenomena, namely the rooted balls in the graph.

This characterization of the power spectral distribution in terms of the distribution of rooted balls also reflects the local nature of graph filtering. Indeed, as discussed in [28], the moments of the power spectral distribution are key in the understanding of graph filters. More broadly, when designing a graph filter, we often aim to construct a matched filter in the frequency domain, so that the response of the filter aligns with that of the signal, subject to some noise model (additive white Gaussian noise, for instance). Additionally, constraints such as a filter having K taps reduce the space of possible frequency responses to degree-K polynomials in the frequency domain, so that the moments become the basic values of interest. This is an immediate consequence of graph filters admitting a distributed implementation [28], so that the expression of a graph filter’s performance in terms of the power spectrum of the signal is actually a distillation of the distribution of rooted balls in \Omega_{K}.

We have shown how connecting the moments of the power spectral distribution to the space \Omega_{K} yields insights about how features of graphs and graph signals are dependent only on localized information. In particular, a function \bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R} was constructed whose expected value under the pushforward measure of a graph computes the moments of the power spectral distribution. This machinery can be immediately extended to other functions on graphs that act locally, allowing us to compare the behavior of a given function on two graphs in terms of their distributions of rooted balls.

5 Transferability

Let JJ be a function mapping a graph and its corresponding graph signal to a real number. This function is typically known as a graph summary and examples include the graph power spectral distribution discussed in the previous section, as well as the average node degree or node centrality values [32], the clustering coefficient [33], or the conductance [34]. We can also consider cost functions for a given task as graph summaries, since they ultimately take a graph and its signal as input, and output a real number [35, 36]. In this context, we want to study how the function JJ changes across two different graphs with different graph signals (𝒢1,𝐱1)(\mathcal{G}_{1},\mathbf{x}_{1}) and (𝒢2,𝐱2)(\mathcal{G}_{2},\mathbf{x}_{2}).

To study the transference of the graph summary J across different graphs and graph signals, let us assume that there exists a function \bar{J}:\Omega_{K}\to\mathbb{R} such that for any pair (\mathcal{G},\mathbf{x}) whose associated probability distribution on \Omega_{K} is denoted by \mu, we have J(\mathcal{G},\mathbf{x})=\mathbb{E}_{\mu}[\bar{J}]. This assumption is not too severe, essentially saying that the graph summary is an average over a function on the nodes, with the value at each node determined strictly by the structure of its local neighborhood. Then, consider the transfer equation

\left|J(\mathcal{G}_{1},\mathbf{x}_{1})-J(\mathcal{G}_{2},\mathbf{x}_{2})\right|=\left|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\right|, (21)

as a way of measuring the transferability of JJ, and where μ\mu and ν\nu are the distributions of rooted balls on the space ΩK\Omega_{K} of the graphs (𝒢1,𝐱1)(\mathcal{G}_{1},\mathbf{x}_{1}) and (𝒢2,𝐱2)(\mathcal{G}_{2},\mathbf{x}_{2}), respectively.

Understanding (21) quantitatively demands a stronger structure on \Omega_{K} than the disjoint union topology. To this end, we endow \Omega_{K} with the structure of a metric space, in a way that preserves the original topology. For an arbitrary constant C>0, define a metric d_{C} on \Omega_{K} as follows. For \omega_{1}=(\bar{\mathcal{G}}_{1},\bar{\mathbf{x}}_{1}) and \omega_{2}=(\bar{\mathcal{G}}_{2},\bar{\mathbf{x}}_{2})\in\Omega_{K}, put

d_{C}(\omega_{1},\omega_{2})=\begin{cases}\min\{\|\bar{\mathbf{x}}_{1}-\bar{\mathbf{x}}_{2}\|_{2},2C\}&\bar{\mathcal{G}}_{1}=\bar{\mathcal{G}}_{2}\\ C&\text{otherwise},\end{cases} (22)

where \|\cdot\|_{2} is inherited from the identification of \mathbb{X}(\bar{\mathcal{G}}) with Euclidean space (this metric is again understood modulo the automorphism group of the rooted ball, discussed in Appendix A). Note that \|\cdot\|_{2} is well defined since both \mathbf{x}_{1} and \mathbf{x}_{2} are vectors in the same |\bar{\mathcal{V}}|-dimensional space when \bar{\mathcal{G}}_{1}=\bar{\mathcal{G}}_{2}. One can check that d_{C} is indeed a metric on \Omega_{K} for any C>0. The metric d_{C} is constructed in a way that preserves the local metric structure of each signal space \mathbb{X}(\bar{\mathcal{G}}), while also endowing the whole space \Omega_{K} with a global metric structure, at the cost of some local distortion in order to maintain the triangle inequality, as dictated by the constant C. When C is chosen to be very large, the local metric structure on each rooted ball of the space is well preserved, but these rooted balls are otherwise kept “far apart” from each other. On the other hand, when C is very small, the rooted balls become “close,” but the local metric structure on each of them is distorted (i.e., signals belonging to the same rooted ball that are beyond a distance of 2C are indistinguishable from signals that belong to different rooted balls). Importantly, the metric topology on \Omega_{K} induced by d_{C} is equal to the disjoint union topology (6), so that all notions of continuity and weak convergence of measure are preserved.
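A minimal sketch of the metric (22) follows. For simplicity, this illustration represents a rooted ball by a hashable canonical form (adjacency tuple, root) and compares labeled graphs for exact equality, ignoring the automorphism quotient discussed above; the function name `d_C` and the toy graphs are our own choices.

```python
import numpy as np

def d_C(omega1, omega2, C):
    """Metric d_C on pairs (rooted ball, signal), cf. eq. (22).

    A rooted ball is represented by (adjacency tuple, root index).
    This sketch compares labeled graphs exactly, ignoring the
    isomorphism/automorphism quotient.
    """
    (G1, x1), (G2, x2) = omega1, omega2
    if G1 == G2:
        diff = np.linalg.norm(np.asarray(x1) - np.asarray(x2))
        return min(float(diff), 2 * C)
    return C

# Same ball, nearby signals: the local Euclidean structure is preserved.
G = (((0, 1), (1, 0)), 0)   # a single edge, rooted at node 0
assert d_C((G, [0.1, 0.2]), (G, [0.1, 0.25]), C=1.0) < 1.0
# Distinct balls sit at distance exactly C from each other.
H = (((0, 0), (0, 0)), 0)   # an isolated root with a second node, no edges
assert d_C((G, [0.1, 0.2]), (H, [0.0, 0.0]), C=1.0) == 1.0
```

Shrinking C pulls distinct balls together while capping within-ball distances at 2C, matching the trade-off described above.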

Given the additional metric structure on \Omega_{K}, we can quantitatively characterize the transfer equation (21) by comparing distributions of rooted balls on the metric space (\Omega_{K},d_{C}). To do so, we make the following assumption on the function \bar{J}.

Assumption 1.

For all rooted K-balls \bar{\mathcal{G}}\in\bar{\mathbb{G}}_{D}, \bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}}) is L-Lipschitz continuous with respect to the second argument \bar{\mathbf{x}} over the space of bounded signals \mathbb{X}(\bar{\mathcal{G}},[-1,1]).

This is not a difficult assumption to satisfy. For instance, it is sufficient for \bar{J} to be continuously differentiable on each signal space. In fact, the set of Lipschitz continuous functions on a compact space is dense in the set of continuous functions with respect to the uniform norm [37].

Theorem 2.

Let graphs \mathcal{G}_{1},\mathcal{G}_{2}\in\mathbb{G}_{D} with corresponding bounded signals \mathbf{x}_{1},\mathbf{x}_{2} be given. Denote their corresponding measures in \Omega_{K} by \mu and \nu, respectively. For a function \bar{J}:\Omega_{K}\to[0,1] that satisfies Assumption 1, it holds that

\big|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\big|\leq L\cdot W_{1}\!\left(\mu,\nu;\frac{1}{L}\right), (23)

where W1(μ,ν;C)W_{1}(\mu,\nu;C) denotes the 11-Wasserstein distance between μ\mu and ν\nu under the metric dCd_{C}.

The proof of Theorem 2 can be found in Appendix D. Theorem 2 bounds the transfer equation (21) in terms of the 11-Wasserstein distance between the two distributions of rooted balls. The appearance of the 11-Wasserstein distance in this setting is intuitive: the dual formulation of the 11-Wasserstein distance defines the distance between two distributions via an integral probability metric over 11-Lipschitz functions [38], yielding a transferability bound that holds for all Lipschitz functions. We illustrate the notion of the 11-Wasserstein distance between graphs in Fig. 2.

Figure 2: The 11-Wasserstein distance reflects structural similarity between graphs. (Top) Three graphs (a,b,c) with signals indicated by node coloring. (Bottom) Distributions of rooted 11-balls in each graph (a,b,c), represented by histograms. The xx-axis corresponds to the space Ω1\Omega_{1}, and the yy-axis shows the density of each rooted 11-ball in the graph. The 11-Wasserstein distance, then, essentially describes the distance between histograms, subject to a metric structure on Ω1\Omega_{1}. For instance, graph (a) is expected to have a large 11-Wasserstein distance from graphs (b,c), since there is minimal overlap between their histograms. On the other hand, graphs (b,c) have substantial overlap, so their 11-Wasserstein distance will be smaller.

5.1 Examples and Discussion

Theorem 2 indicates that smoother functions \bar{J} generally transfer better than those with a large Lipschitz constant. Some examples of graph summaries that fulfill the conditions of Theorem 2 follow.

Example: Power spectral distribution. For any K\geq 0, the spectral moment function \mathbf{m}_{K} defined in Section 4 descends to the expectation of the function \bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}. For any rooted K-ball \bar{\mathcal{G}}, \bar{\mathbf{m}}_{K}(\bar{\mathcal{G}},\cdot) is a polynomial of the graph signal in the second argument, and can thus be shown to satisfy Assumption 1. Thus, the difference in the spectral moments of two graphs can be bounded in terms of the Wasserstein distance of their respective distributions of rooted balls. Moreover, observe that for K\geq M\geq 0, the moment function \bar{\mathbf{m}}_{M} can be defined on \Omega_{K} in a way that preserves the identity (20), so that we can compare the similarity of spectral moments between graph signals by comparing the Lipschitz constants of the functions \bar{\mathbf{m}}_{K} and \bar{\mathbf{m}}_{M} on the same space \Omega_{K}. Indeed, one can see that \bar{\mathbf{m}}_{K} has a larger Lipschitz constant than \bar{\mathbf{m}}_{M}, so that the bound given by Theorem 2 is tighter for the lower moments of the power spectral distribution, contingent upon a relatively small 1-Wasserstein distance between the distributions of rooted balls.

Example: MSE of a graph filter. Let \mathcal{H} be a fixed KK-tap graph filter, and let σ2>0\sigma^{2}>0 be given. For any given graph 𝒢\mathcal{G} on nn nodes with associated signal 𝐱\mathbf{x} and shift operator 𝐒\mathbf{S}, put the function JJ as

J(𝒢,𝐱)=𝔼[1n𝐱(𝐒)(𝐱+𝜼)22],J(\mathcal{G},\mathbf{x})=\mathbb{E}\left[\frac{1}{n}\big{\|}\mathbf{x}-\mathcal{H}(\mathbf{S})(\mathbf{x}+\boldsymbol{\eta})\big{\|}_{2}^{2}\right], (24)

where the expectation is taken over the random vector \boldsymbol{\eta}\sim\mathcal{N}(0,\sigma^{2}\mathbf{I}). In words, J is the mean squared error when applying \mathcal{H} to (\mathcal{G},\mathbf{x}) under additive white Gaussian noise. Define \bar{J} on rooted K-balls \bar{\mathcal{G}}=(\mathcal{V},\mathcal{E},r) with shift operator \bar{\mathbf{S}} as

\bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}})=([\bar{\mathbf{x}}]_{r}-[\mathcal{H}(\bar{\mathbf{S}})\bar{\mathbf{x}}]_{r})^{2}+\sigma^{2}[(\mathcal{H}(\bar{\mathbf{S}}))^{2}]_{r}. (25)

If we denote the measure on ΩK\Omega_{K} associated with (𝒢,𝐱)(\mathcal{G},\mathbf{x}) by μ\mu, one can show that

J(\mathcal{G},\mathbf{x})=\mathbb{E}_{\mu}[\bar{J}]. (26)

Observe that \bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}}) is a polynomial of the values of \bar{\mathbf{x}} for each rooted K-ball \bar{\mathcal{G}}, so that it is Lipschitz continuous on each bounded signal space, thus satisfying Assumption 1. Moreover, the Lipschitz constant of \bar{J} over each signal space \mathbb{X}(\bar{\mathcal{G}}) can be shown to be proportional to \|(\mathbf{I}-\mathcal{H}(\bar{\mathbf{S}}))\boldsymbol{\delta}_{r}\|_{2}, where \boldsymbol{\delta}_{r} is the signal taking value 1 on the root node and 0 elsewhere. That is to say, the Lipschitz constant is directly influenced by the “spread” of the impulse response of the filter. Intuitively, this indicates that while filters with large coefficients in their higher-order terms may suit a particular signal well, they may transfer poorly across graphs. This aligns with existing results on the stability of graph filters [7, 39].
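The expectation in (24) admits a closed form: expanding the square gives \mathbb{E}\|\mathbf{x}-\mathcal{H}(\mathbf{S})(\mathbf{x}+\boldsymbol{\eta})\|^{2}=\|(\mathbf{I}-\mathcal{H}(\mathbf{S}))\mathbf{x}\|^{2}+\sigma^{2}\|\mathcal{H}(\mathbf{S})\|_{F}^{2}. The sketch below (an assumed example with an arbitrary 2-tap filter on a path graph) checks this closed form against a Monte Carlo estimate of (24).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 5, 0.3

# Path-graph Laplacian and an arbitrary 2-tap filter H(S) = h0 I + h1 S.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
S = np.diag(A.sum(axis=1)) - A
H = 0.8 * np.eye(n) - 0.1 * S

x = rng.normal(size=n)

# Closed form of (24):
#   E (1/n) ||x - H(x + eta)||^2
#     = (1/n) ( ||(I - H) x||^2 + sigma^2 ||H||_F^2 ).
J_closed = (np.linalg.norm((np.eye(n) - H) @ x) ** 2
            + sigma ** 2 * np.linalg.norm(H, 'fro') ** 2) / n

# Monte Carlo estimate of the same expectation.
T = 200_000
eta = sigma * rng.normal(size=(T, n))
errs = x - (x + eta) @ H.T
J_mc = np.mean(np.sum(errs ** 2, axis=1)) / n

assert abs(J_closed - J_mc) < 1e-2
```

The noise term \sigma^{2}\|\mathcal{H}(\mathbf{S})\|_{F}^{2} is the sum over nodes of the per-root noise contributions appearing in the local summary (25).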

Estimation of the transferability bound. The computation of the 1-Wasserstein distance in (23) may prove problematic for large graphs. However, given the definition of the 1-Wasserstein distance as an integral probability metric, it is not difficult to compute an empirical estimate of W_{1}(\mu,\nu;C). By [38, Theorem 6], if \{\omega_{j}^{\mu}\}_{j=1}^{m} and \{\omega_{j}^{\nu}\}_{j=1}^{n} are i.i.d. samples drawn from the respective distributions \mu and \nu, then the 1-Wasserstein distance between the empirical distributions can be computed via a linear program. To draw i.i.d. samples from these distributions, one can choose a node uniformly at random (with replacement) from the node set of the underlying graph and then take the rooted K-ball via the map \Sigma_{K}. By [38, Corollary 10], the compact support of the distributions \mu,\nu guarantees almost sure convergence of the empirical estimator to the true Wasserstein distance as m,n\to\infty.
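The linear program mentioned above is the standard optimal-transport formulation: minimize the transport cost \langle D, T\rangle over plans T with marginals p and q. A minimal sketch for discrete distributions on a finite metric space follows; the function name `wasserstein1` and the toy 3-point example are our own, with the discrete-metric cost matrix mimicking d_C between distinct rooted balls (C = 1).

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein1(p, q, D):
    """1-Wasserstein distance between discrete distributions p, q on a
    common finite metric space with cost matrix D, via the
    optimal-transport linear program."""
    m, n = len(p), len(q)
    # Decision variables: the transport plan T, flattened row-major.
    c = D.reshape(-1)
    # Row marginals of T sum to p; column marginals sum to q.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Toy example: 3 distinct rooted balls under the discrete metric (C = 1).
D = np.ones((3, 3)) - np.eye(3)
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
# Half the mass must move to a different ball, so W1 = 0.5.
assert np.isclose(wasserstein1(p, q, D), 0.5)
```

Replacing `D` with pairwise d_C distances between sampled rooted balls yields the empirical estimator described above.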

Tighter bounds. In Theorem 2, the difference in expectation between two distributions of rooted balls is characterized in terms of their Wasserstein distance, which is defined in terms of a metric imposed upon \Omega_{K}. Given that the metric d_{C} is defined for essentially arbitrary C, one can derive a tighter bound on the transfer equation, with less stringent assumptions on the function \bar{J}. First, note that for two graphs \mathcal{G}_{1},\mathcal{G}_{2}\in\mathbb{G}_{D} and bounded signals \mathbf{x}_{1},\mathbf{x}_{2}, the corresponding measures \mu,\nu have support in the compact subspace \Gamma\subset\Omega_{K}, where \Gamma is as described in Lemma 1. Because of this, any continuous function on \Gamma is bounded. For a function \bar{J}:\Omega_{K}\to\mathbb{R} satisfying Assumption 1, define the range of \bar{J} as

A=\sup_{\omega\in\Gamma}\bar{J}(\omega)-\inf_{\omega\in\Gamma}\bar{J}(\omega). (27)

In the same vein as Theorem 2, it holds that

\big|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\big|\leq\inf_{C\in(0,1]}\frac{L}{C}\cdot W_{1}\!\left(\mu,\nu;\frac{AC}{L}\right). (28)

This bound illustrates an interplay between the smoothness of \bar{J}, the range of \bar{J}, and the regime of similarity between the two graphs. If two graphs have similar distributions of rooted balls, then the smoothness of \bar{J} plays a stronger role in bounding transferability. If they have highly distinct distributions of rooted balls, the range of \bar{J}, as described by the value A, plays a stronger role in bounding transferability.

Quantifying weak convergence. It is well known that in cases of bounded support, the Wasserstein distance between probability measures metrizes the weak topology on the set of probability measures on that space, i.e., the topology induced by the definition of weak convergence [40]. Therefore, convergence in the 11-Wasserstein distance is equivalent to weak convergence of measures. For instance, this implies via Theorem 1 that the moments of the graph Fourier transform are continuous with respect to the 11-Wasserstein distance between distributions of rooted balls.

Extensions to graph neural networks. Although we have considered maps on graphs with real-valued signals, the analysis in this section is equally applicable when the nodes have other types of features such as categorical labeling. For example, one could consider JJ to be the cross-entropy loss for a fixed graph neural network with KK layers, where the “graph signal” is taken to be a set of discrete input features and a ground-truth label against which the loss is evaluated [41].

Wasserstein graph kernels. Wasserstein distances between graphs have been considered in the graph kernel literature [16], where node embeddings in Euclidean space are computed from local structures about each node, followed by a computation of the Wasserstein distance in Euclidean space. Given that the proposed computation for nodes with continuous attributes in this paper is smooth with respect to the node attributes, one can show that the Wasserstein distance between distributions in our case descends to theirs via a continuous map.

By endowing the space ΩK\Omega_{K} with an appropriate metric structure, we have applied the proposed framework to studying transferability of graph summaries that descend to maps on rooted KK-balls. In analyzing the relationship between the Lipschitz constant of the graph summary and the distribution of rooted balls, insights have been gleaned as to what drives commonalities between graphs in this regard. Although our focus in this section has been on fixed graph summaries, such as the loss of a fixed graph filter in Section 5.1, these ideas can be readily adapted to problems of designing graph summaries. For instance, the presence of the Lipschitz constant in Theorem 2 indicates that a graph filter or graph neural network will transfer better if it is designed to be smooth. This can be cast in the light of stability, where certain conditions on filter banks in graph neural networks can guarantee a response that is stable to structural perturbations [7, 39].

6 Extension to Weighted Graphs

In this section, we consider how ideas from the unweighted case (Sections 4 and 5) can be extended to weighted graphs. We define a weighted graph as a graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) coupled with a weight function w:0w:\mathcal{E}\to\mathbb{R}_{\geq 0}. For a given graph 𝒢\mathcal{G}, the set of all possible weight functions on it is denoted 𝕎(𝒢)={w:0}\mathbb{W}(\mathcal{G})=\{w:\mathcal{E}\to\mathbb{R}_{\geq 0}\}. We use the same notation for the set of weight functions on a rooted graph: 𝕎( 𝒢)\mathbb{W}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,).

We now expand our definition of a rooted KK-ball with a signal to include weight functions. Define

ΩKW= 𝒢k:kK𝕎( 𝒢)×𝕏( 𝒢).\Omega_{K}^{W}=\coprod_{\,\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathcal{G}$\kern-0.70004pt}}}\,_{k}:k\leq K}\mathbb{W}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)\times\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,). (29)

Similar to the definition of ΩK\Omega_{K} in (6), the space ΩKW\Omega_{K}^{W} consists of all rooted KK-balls with both edge weights and graph signals on them. As before, a weighted graph with a signal (𝒢,w,𝐱)(\mathcal{G},w,\mathbf{x}) yields elements of ΩKW\Omega_{K}^{W} via a suitably defined sampling map ΣK\Sigma_{K}, defined for nodes v𝒱v\in\mathcal{V}:

ΣK(v)=( 𝒢K(v), wK(v), 𝐱K(v)),\Sigma_{K}(v)=(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v),\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$w$\kern-1.00006pt}}}_{K}(v),\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,_{K}(v)), (30)

where  wK(v)\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$w$\kern-1.00006pt}}}_{K}(v) denotes the restriction of ww to the edges of  𝒢K(v)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v). As before, a given weighted graph with a signal defines a probability measure on ΩKW\Omega_{K}^{W} via the pushforward of the uniform measure by ΣK\Sigma_{K}, which we denote by μ=(ΣK)(𝒢,w,𝐱)\mu=(\Sigma_{K})_{*}(\mathcal{G},w,\mathbf{x}). For a weighted graph (𝒢,w)(\mathcal{G},w), define the weighted degree of a node v𝒱v\in\mathcal{V} as follows:

deg(v)=u𝒩(v)w(v,u).\mathrm{deg}(v)=\sum_{u\in\mathcal{N}(v)}w(v,u). (31)

For a real number D>0D>0, the set of weighted graphs whose nodes have weighted degree at most DD is denoted by 𝔾DW\mathbb{G}^{W}_{D}.
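The weighted sampling map (30) and the weighted degree (31) are straightforward to realize. The following sketch uses a dictionary-based representation and takes the rooted KK-ball to be the induced subgraph on nodes within distance KK of the root; these representation choices are ours, made for illustration:

```python
from collections import deque

def rooted_ball(adj, weights, signal, root, K):
    """Sampling map of (30): restrict the graph, the weight function, and
    the signal to the induced subgraph on nodes within distance K of root.

    adj maps nodes to neighbor sets, weights maps frozenset edges {u, v}
    to nonnegative reals, and signal maps nodes to reals."""
    dist = {root: 0}
    queue = deque([root])
    while queue:                        # breadth-first search to depth K
        v = queue.popleft()
        if dist[v] == K:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    ball = set(dist)
    ball_adj = {v: adj[v] & ball for v in ball}
    ball_w = {e: w for e, w in weights.items() if e <= ball}
    ball_x = {v: signal[v] for v in ball}
    return ball_adj, ball_w, ball_x, root

def weighted_degree(adj, weights, v):
    """Weighted degree of (31): the sum of weights over incident edges."""
    return sum(weights[frozenset((v, u))] for u in adj[v])

# Example: a weighted path 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
weights = {frozenset((0, 1)): 0.5, frozenset((1, 2)): 2.0,
           frozenset((2, 3)): 1.0}
signal = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
ball_adj, ball_w, ball_x, root = rooted_ball(adj, weights, signal, 0, 1)
```

Here the 1-ball about node 0 keeps only nodes {0, 1} and the single edge between them, discarding the weights of edges leaving the ball, exactly as the restriction in (30) prescribes.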

We now develop a local description of the graph Fourier transform for weighted graphs, much like in Section 4. For a graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) with weight function ww, define the weighted graph Laplacian as a shift operator such that, for any u,v𝒱u,v\in\mathcal{V},

[𝐒]uv={deg(v)u=vw(u,v)(u,v)0otherwise.[\mathbf{S}]_{uv}=\begin{cases}\mathrm{deg}(v)&u=v\\ -w(u,v)&(u,v)\in\mathcal{E}\\ 0&\text{otherwise}.\end{cases} (32)

The weighted Laplacian for a graph on nn nodes is positive semidefinite, and admits an eigendecomposition 𝐒=j=1nλj𝐮j𝐮j𝖳\mathbf{S}=\sum_{j=1}^{n}\lambda_{j}\mathbf{u}_{j}\mathbf{u}_{j}^{\mathsf{T}}, with a Fourier representation for graph signals defined in a way analogous to (12). For a given graph signal 𝐱𝕏(𝒢)\mathbf{x}\in\mathbb{X}(\mathcal{G}), there is a power spectral distribution associated with 𝐱\mathbf{x} via the weighted Laplacian, whose moments are given by (15). We characterize the properties of the power spectral distribution in terms of the distribution of rooted KK-balls in the following theorem.

Theorem 3.

Let an integer K0K\geq 0 and a real number D>0D>0 be given. Let {(𝒢j,wj,𝐱j)}j=1\{(\mathcal{G}_{j},w_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} be a sequence of graphs, weight functions, and graph signals such that each graph is contained in 𝔾DW\mathbb{G}^{W}_{D} and the signals 𝐱j\mathbf{x}_{j} are uniformly bounded. Let μj\mu_{j} be the probability measure on ΩKW\Omega_{K}^{W} associated with (𝒢j,wj,𝐱j)(\mathcal{G}_{j},w_{j},\mathbf{x}_{j}) for each jj, and PjP_{j} the corresponding normalized power spectral distribution function.

If the measures μj\mu_{j} converge weakly, then the KthK^{th} moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all K0K\geq 0, then the normalized power spectral distribution functions converge weakly.

The proof of Theorem 3 can be found in Appendix E. Theorem 3 establishes that the power spectral distribution of a graph signal on a weighted graph is continuous with respect to the weak topology of distributions of rooted KK-balls, under boundedness assumptions. Unlike the regime of Theorem 1, the convergence of the power spectral distribution for weighted graphs here is not dependent on the combinatorial constraint of having bounded node degrees. Rather, any node may have an arbitrarily large number of neighbors, as long as the total influence remains bounded. This is a realistic assumption for many systems, where an agent may be allowed to exert influence on a large collection of other agents, but the total influence is bounded by some physical constraint, such as power consumption.
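The quantities appearing in Theorem 3 are easy to compute for small weighted graphs. The sketch below builds the weighted Laplacian of (32) and evaluates the normalized moments, assuming (consistently with the graphing formula (37) under the uniform node measure) that the KK-th moment is x^T S^K x / n; it then cross-checks the result against the eigendecomposition:

```python
import numpy as np

def weighted_laplacian(n, weighted_edges):
    """Weighted graph Laplacian of (32): diagonal of weighted degrees,
    minus the matrix of edge weights."""
    S = np.zeros((n, n))
    for (u, v), w in weighted_edges.items():
        S[u, v] -= w
        S[v, u] -= w
        S[u, u] += w
        S[v, v] += w
    return S

def psd_moments(S, x, K_max):
    """Normalized spectral moments m_K(x) = x^T S^K x / n for K = 0..K_max,
    computed by repeated application of S (no eigendecomposition)."""
    n = len(x)
    moments, y = [], x.copy()
    for _ in range(K_max + 1):
        moments.append(x @ y / n)
        y = S @ y                       # one more application of S
    return moments

# A weighted triangle and a test signal.
edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5}
S = weighted_laplacian(3, edges)
x = np.array([1.0, -1.0, 0.5])

# Cross-check against the spectral side: m_K = (1/n) sum_j lam_j^K <u_j, x>^2.
lam, U = np.linalg.eigh(S)
coeffs = (U.T @ x) ** 2
spectral = [(coeffs * lam ** K).sum() / 3 for K in range(4)]
```

The two computations agree because the weighted Laplacian is symmetric positive semidefinite, so the polynomial moments are exactly the moments of the power spectral distribution supported on its eigenvalues.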

Remark 3 (Zero-weight edges).

In many applications, a function on weighted graphs treats an edge with an assigned weight of zero as a non-edge; the moment function  𝐦K\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}, for instance, satisfies this condition. This motivates a quotient map on ΩKW\Omega_{K}^{W} that “glues” rooted balls with edge weights taking the value zero to rooted balls where those edges are absent. Any function that is continuous on ΩKW\Omega_{K}^{W} and also satisfies this condition remains continuous under such a quotient map, so our results still hold. The construction in this case is somewhat technical, so we omit it for simplicity of exposition.

7 In the Limit: Graphing Signal Processing

In Sections 4 and 6, the (weak) continuity of the power spectral density with respect to the distribution of rooted balls was established. Specifically, it was shown that weakly convergent sequences of distributions of rooted balls yield weakly convergent power spectra. Given that these distributions correspond to underlying graphs, it remains to be established precisely what objects these sequences converge to. In general, a weakly convergent sequence of finite graphs does not necessarily converge to a finite graph. In the study of graph limits via graphon models, the homomorphism density of motifs is of primary concern [42]. Our setting is slightly different, depending instead on the isomorphism density of rooted graphs, for which a more appropriate limiting model is that of a graphing. We show that the basic ideas in the discussion so far can be transferred directly to these limiting objects. Let us define the central object of study for this section.

Definition 1.

A graphing of degree DD is a triplet 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) such that

  1. 𝒱\mathcal{V} is a sample space with a σ\sigma-field \mathcal{B} on 𝒱\mathcal{V};

  2. λ\lambda is a probability measure on \mathcal{B};

  3. \mathcal{E}\in\mathcal{B}\times\mathcal{B} is such that, for all A,AA,A^{\prime}\in\mathcal{B}, we have

Adeg(v,A)𝑑λ(v)=Adeg(v,A)𝑑λ(v),\int_{A}\mathrm{deg}(v,A^{\prime})d\lambda(v)=\int_{A^{\prime}}\mathrm{deg}(v,A)d\lambda(v), (33)

where deg(v,A)=|{uA:(u,v)}|\mathrm{deg}(v,A)=|\{u\in A:(u,v)\in\mathcal{E}\}|, with the condition that deg(v,𝒱)D\mathrm{deg}(v,\mathcal{V})\leq D for all v𝒱v\in\mathcal{V}.
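A classical example satisfying Definition 1 (our illustration, not taken from the text) is the cycle graphing on 𝒱=[0,1): each node v is adjacent to v±α (mod 1) for a fixed irrational α, so deg(v,𝒱)=2 for every node. For interval sets A and A′, both sides of (33) equal λ(A∩(A′−α))+λ(A∩(A′+α)), and the symmetry can be checked by Monte Carlo:

```python
import numpy as np

ALPHA = np.sqrt(2) - 1      # irrational rotation step; any irrational works

def deg_in(v, lo, hi):
    """deg(v, A) for the interval A = [lo, hi): the number of the two
    neighbors v + ALPHA and v - ALPHA (mod 1) that land in A."""
    up = (v + ALPHA) % 1.0
    dn = (v - ALPHA) % 1.0
    in_up = ((lo <= up) & (up < hi)).astype(int)
    in_dn = ((lo <= dn) & (dn < hi)).astype(int)
    return in_up + in_dn

# Monte Carlo check of the symmetry condition (33) for two interval sets,
# using roots drawn from the uniform measure lambda on [0, 1).
rng = np.random.default_rng(0)
v = rng.random(200_000)
A, Ap = (0.0, 0.3), (0.5, 0.9)
in_A = (A[0] <= v) & (v < A[1])
in_Ap = (Ap[0] <= v) & (v < Ap[1])
lhs = np.mean(in_A * deg_in(v, *Ap))    # estimates the integral over A
rhs = np.mean(in_Ap * deg_in(v, *A))    # estimates the integral over A'
```

Condition (33) is the measure-theoretic analog of counting each edge from both endpoints, and the estimates of the two integrals agree up to sampling error.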

A graphing is a way to describe a graph with a potentially uncountable number of nodes (elements of 𝒱\mathcal{V}), yet with bounded node degrees. We tacitly assume that all graphings considered are of degree DD, for some D1D\geq 1. Unlike a graphon, which describes dense graphs with unbounded node degrees, the structures that arise from a graphing are typically sparse, and notions of graph filtering and graph shifts reduce to finite sums, rather than continuous integrals. Given a graphing 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda), a graphing signal is a map 𝐱:𝒱\mathbf{x}:\mathcal{V}\to\mathbb{R} between the nodes 𝒱\mathcal{V} and the real numbers \mathbb{R} such that 𝐱\mathbf{x} is a measurable function. Endowing this with the usual vector space structure, we define the space of graphing signals as

𝕏(𝐆)={𝐱:𝒱|𝐱 is measurable}.\mathbb{X}(\mathbf{G})=\{\mathbf{x}:\mathcal{V}\to\mathbb{R}\ \big{|}\ \mathbf{x}\text{ is measurable}\}. (34)

Much like how a graphon describes a model for dense random graphs, a graphing carries with it a distribution of random rooted graphs. As before, let 𝒩\mathcal{N} be the neighborhood operator, returning all of the one-hop neighbors of a node v𝒱v\in\mathcal{V}, so that 𝒩K\mathcal{N}^{K} is the KK-hop neighborhood operator. For a graphing 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) with associated signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}), the rooted KK-ball at a node r𝒱r\in\mathcal{V} is defined in the same way as in Section 3, denoted by  𝒢K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(r), with the same notation used to denote the corresponding graph signal  𝐱K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,_{K}(r). Due to the bounded degree condition on the graphing,  𝒢K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(r) is always a finite rooted graph of maximum degree at most DD. With this in mind, we define a sampling map ΣK:𝒱ΩK\Sigma_{K}:\mathcal{V}\to\Omega_{K} for graphings much like the sampling map (7) for finite graphs:

ΣK(v)=( 𝒢K(v), 𝐱(v)).\Sigma_{K}(v)=(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v),\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,(v)). (35)

The sampling map ΣK\Sigma_{K} is illustrated in Fig. 3 for a simple graphing. Much like for finite graphs, we can push forward the measure λ\lambda to yield a probability measure μ\mu on ΩK\Omega_{K}. That is to say, μ=(ΣK)(𝐆,𝐱)\mu=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x}), in the same way as before, where we adopt λ\lambda as the probability measure over 𝒱\mathcal{V} to be pushed forward. This differs slightly from the case of finite graphs, where we implicitly assume that the initial measure is the uniform probability measure on the finite node set.

Figure 3: Sampling of a random rooted graph from a graphing with an associated signal. (Upper left) A graphing 𝐆=([0,1),,λ)\mathbf{G}=([0,1),\mathcal{E},\lambda). We sample from 𝐆\mathbf{G} by first selecting a root node ρ\rho at random (colored in black), and then taking the connected component of ρ\rho in the edge set. That is, starting with the node ρ\rho, we see that the nodes v1,v2v_{1},v_{2} are neighbors of ρ\rho. Then, the nodes v3,v4v_{3},v_{4} are neighbors of v1,v2v_{1},v_{2}, with the process stopping there. (Lower left) A graphing signal 𝐱:[0,1)\mathbf{x}:[0,1)\to\mathbb{R}. After sampling a node and edge set from the graphing, we sample this function at the corresponding values. (Right) The resulting rooted graph and signal sampled from the graphing.

To establish the basic building blocks for signal processing on graphings, we first construct a Laplacian for graphing signals. Unlike graph signals, graphing signals are not necessarily supported on a finite, or even countable space. We specify the graphing Laplacian as a linear operator on the space of graphing signals in the following way. For any 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) and v𝒱v\in\mathcal{V}, put

[𝐒𝐱]v=u𝒩(v)([𝐱]v[𝐱]u).[\mathbf{S}\mathbf{x}]_{v}=\sum_{u\in\mathcal{N}(v)}\left([{\mathbf{x}}]_{v}-[{\mathbf{x}}]_{u}\right). (36)

By the degree boundedness condition from Definition 1, the sum in (36) is well-defined. Due to its singular nature, it is difficult to define a “graphing Fourier transform” directly from the Laplacian: for instance, even fairly tame graphing structures may have a Laplacian whose eigenvalues have uncountable multiplicity. This, for instance, makes the notion of the power spectral distribution of a graphing signal unwieldy.

However, the spectral properties of a graphing signal with respect to the underlying graphing structure can be studied indirectly. Let 𝐆\mathbf{G} be a graphing with graphing signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) and Laplacian 𝐒\mathbf{S}. Then, for integers K0K\geq 0, we tentatively define

𝐦K(𝐱)=𝒱[𝐱]v[𝐒K𝐱]vdλ(v),\mathbf{m}_{K}(\mathbf{x})=\int_{\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}\mathrm{d}\lambda(v), (37)

where 𝐒K\mathbf{S}^{K} again indicates KK repeated applications of the Laplacian. Note that this definition resembles that of the moments of the power spectral distribution of a graph signal in (15). We define the values 𝐦K(𝐱)\mathbf{m}_{K}(\mathbf{x}) tentatively, as it is not obvious when they are finite, or even well-defined. In the following discussion, we establish sufficient conditions under which the sequence {𝐦K(𝐱)}K=0\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty} exists and corresponds to the moments of a distribution function.

Much like in Sections 4 and 5, we control the behavior of the graphing by bounding its size. We first do this by determining a suitable notion of boundedness for a graphing signal.

Definition 2.

Let 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) be a graphing and 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) be a graphing signal. The signal 𝐱\mathbf{x} is said to be locally essentially bounded if it is bounded on almost all of the connected components of 𝐆\mathbf{G}. That is, there exists an a0a\geq 0 such that for all K0K\geq 0,

λ(𝒩K((𝐱1[a,a])𝖢))=0,\lambda\left(\mathcal{N}^{K}((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}})\right)=0, (38)

where (𝐱1[a,a])𝖢(\mathbf{x}^{-1}[-a,a])^{\mathsf{C}} denotes the set of nodes v𝒱v\in\mathcal{V} such that [𝐱]v[a,a][{\mathbf{x}}]_{v}\notin[-a,a], and 𝒩K\mathcal{N}^{K} is the KK-hop neighborhood of a set. Denote the set of all such graphing signals by 𝕏LB(𝐆)\mathbb{X}_{\mathrm{LB}}(\mathbf{G}).

In words, local essential boundedness not only controls the size of the set of nodes with large signal value (essential boundedness), but also controls the size of the neighborhoods with which those nodes can interact. Local essential boundedness is an analog of boundedness for finite graph signals adapted to graphing signals. For instance, a graphing signal that is strictly bounded is locally essentially bounded. This condition, however, is stronger than being bounded almost everywhere (see Lemma 4), due to the neighborhood condition in (38). The necessity of this condition stems from the highly singular nature of the graphing Laplacian as an operator on signals.

We will also find it useful to consider graphings that “resemble” finite graphs in some sense.

Definition 3.

Let 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) be a graphing with graphing signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}), and {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} a sequence of graphs with associated graph signals. For all K0K\geq 0, put μK=(ΣK)(𝐆,𝐱)\mu_{K}=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x}), and μK(j)=(ΣK)(𝒢j,𝐱j)\mu_{K}(j)=(\Sigma_{K})_{*}(\mathcal{G}_{j},\mathbf{x}_{j}). Then, we say that the sequence of graphs {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} converges weakly to (𝐆,𝐱)(\mathbf{G},\mathbf{x}) if for all K0K\geq 0, μK(j)\mu_{K}(j) converges weakly to μK\mu_{K} as jj tends to infinity. We denote weak convergence of graphs to a graphing by (𝒢j,𝐱j)(𝐆,𝐱)(\mathcal{G}_{j},\mathbf{x}_{j})\rightharpoonup(\mathbf{G},\mathbf{x}).

Much like Theorem 1, we can characterize the properties of a graphing based on the limiting properties of a sequence of graphs that converges to it. We state this formally below.

Theorem 4.

Let a graphing 𝐆\mathbf{G} of degree DD with signal 𝐱𝕏LB(𝐆)\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G}) be given. Suppose there exists a sequence of graphs and graph signals {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} such that (𝒢j,𝐱j)(𝐆,𝐱)(\mathcal{G}_{j},\mathbf{x}_{j})\rightharpoonup(\mathbf{G},\mathbf{x}). Then, there exists a unique distribution function P𝐱P_{\mathbf{x}} supported on [0,2D][0,2D] such that the moments of P𝐱P_{\mathbf{x}} are given by the sequence {𝐦K(𝐱)}K=0\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty}.

The proof of Theorem 4 can be found in Appendix F. Theorem 4 defines the power spectral distribution for sufficiently nice graphing signals. In particular, if the graphing and graphing signal are the limit of a sequence of graphs and graph signals, then requiring the power spectral distribution to be continuous with respect to the weak topology on distributions determines it uniquely. Namely, the power spectral distribution of the limit of a sequence of graphs and graph signals is simply the weak limit of the power spectral distributions of the graphs as defined in (14). To show that this definition is consistent with the power spectral distribution of finite graphs, we illustrate with a simple example.

Example 1.

From a finite graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) on nn nodes and a graph signal 𝐱𝕏(𝒢)\mathbf{x}\in\mathbb{X}(\mathcal{G}), a graphing 𝐆\mathbf{G} with graphing signal 𝐲𝕏LB(𝐆)\mathbf{y}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G}) can be constructed in the following way (cf. [42, Example 18.16]). Identify 𝒱\mathcal{V} with the integers 1,2,,n1,2,\ldots,n. Let 𝒲=[0,1)\mathcal{W}=[0,1) with the usual Borel σ\sigma-field and the uniform (Lebesgue) probability measure λ\lambda. We construct a graphing 𝐆=(𝒲,,λ)\mathbf{G}=(\mathcal{W},\mathcal{F},\lambda) in the following way. Partition 𝒲\mathcal{W} into intervals Ii=[(i1)/n,i/n)I_{i}=[(i-1)/n,i/n) indexed by i𝒱i\in\mathcal{V}. Then, for each (i,j)(i,j)\in\mathcal{E} with i<ji<j, and every point tIit\in I_{i}, put the points (t,t+(ji)/n)Ii×Ij,(t+(ji)/n,t)Ij×Ii(t,t+(j-i)/n)\in I_{i}\times I_{j},(t+(j-i)/n,t)\in I_{j}\times I_{i} into the edge set \mathcal{F}. Additionally, assign [𝐲]t=[𝐱]i[{\mathbf{y}}]_{t}=[{\mathbf{x}}]_{i}.

Under this construction, one can show that for all K0K\geq 0, the distributions on ΩK\Omega_{K} corresponding to (𝒢,𝐱)(\mathcal{G},\mathbf{x}) and (𝐆,𝐲)(\mathbf{G},\mathbf{y}) are equal, i.e., (ΣK)(𝒢,𝐱)=(ΣK)(𝐆,𝐲)(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x})=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{y}). It holds, then, that 𝐦K(𝐱)=𝐦K(𝐲)\mathbf{m}_{K}(\mathbf{x})=\mathbf{m}_{K}(\mathbf{y}) for all KK, so that the power spectral distribution corresponding to the graphing signal 𝐲\mathbf{y} is identical to that of the graph signal 𝐱\mathbf{x}.
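Example 1 can be checked numerically. Since the graphing signal 𝐲 is constant on each interval IiI_i, sampling a root t uniformly from [0,1) and reading off its interval index reproduces the finite-graph moments; the estimator below is our illustration of this sampling-map viewpoint:

```python
import numpy as np

# The finite graph: a 5-cycle with its combinatorial Laplacian.
n = 5
S = 2.0 * np.eye(n)
for i in range(n):
    S[i, (i + 1) % n] = S[i, (i - 1) % n] = -1.0
x = np.cos(2 * np.pi * np.arange(n) / n)    # a simple graph signal

def moment(K):
    """Finite-graph moment m_K(x) = (1/n) sum_v x_v [S^K x]_v."""
    return x @ np.linalg.matrix_power(S, K) @ x / n

# Graphing view: a root t in [0, 1) lies in I_i = [(i-1)/n, i/n) and
# behaves exactly like node i, with the graphing signal constant on I_i.
# Monte Carlo estimate of the integral in (37) for K = 2.
rng = np.random.default_rng(1)
t = rng.random(100_000)
idx = np.floor(n * t).astype(int)           # interval index of each root
SKx = np.linalg.matrix_power(S, 2) @ x
estimate = np.mean(x[idx] * SKx[idx])
```

Because 𝐲 is piecewise constant, the estimate concentrates around the exact finite-graph value, matching the claim that 𝐦K(𝐱)=𝐦K(𝐲) for all KK.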

In Sections 4, 5 and 6, we considered how the distribution of rooted balls in graphs yields useful topological or metric structure with which graphs can be related to each other, particularly in light of graph summaries such as the power spectral distribution. In all of these cases, the topology of weak convergence was used to show that a convergent sequence of graphs in the sense of the distribution of rooted balls is also convergent in the sense of any appropriate graph summary. This left open the question of what these graphs converge to. By considering graphings as limiting objects for sparse graphs, we have shown that the extension of graph summaries, such as the power spectral distribution, to graphings maintains continuity in the natural way.

Remark 4 (Hyperfinite graphings).

Theorem 4 hinges on the graphing 𝐆\mathbf{G} and signal 𝐱\mathbf{x} being the limit of a sequence of finite graphs and graph signals. As noted in [42, Conjecture 19.8], whether all graphings are limits of sequences of graphs remains an open question. However, there is a class of graphings for which this is known to hold: namely, hyperfinite graphings. We refer the interested reader to [42, Chapter 21] for more details.

8 Conclusion

Taking into account the folklore knowledge that low-order graph filters only use local information on graphs, we have developed a local framework in which many aspects of graph signal processing are illuminated. In particular, by mapping a graph and graph signal to a probability distribution on a fixed space, we allow for the application of tools from probability theory to characterize similarities between graphs based on their local structures, even if they do not have a common node set. Using the topology of weak convergence of probability distributions, we define a topology on the space of graphs and graph signals, where notions of continuity can be understood: this was most obviously demonstrated in Section 4, where it was proved that the power spectral distribution of a graph signal is weakly continuous with respect to the distribution of rooted balls in a graph. This approach allows for relatively painless treatment of limiting objects, such as graphings, by simply decomposing them into their constituent distributions.

The analysis carried out in this paper provides proper justification for transferring graph filters across multiple graphs, given the assumption that they share local structural properties. With the particular suitability of this approach to highly sparse graphs, this complements existing work on transferability based on graphon models or assumptions of graphs as discretizing manifold structures. Leveraging the framework of this paper, it would be interesting to study more exotic types of functions, such as graph neural networks and graph kernels, or to incorporate the properties encoded by distributions of local structures into tasks such as network topology inference.

Appendix A Metric Topology of the Signal Space

Let a rooted ball  𝒢=(𝒱,,r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,=(\mathcal{V},\mathcal{E},r) be given. Denote by Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) the automorphism group of maps {ϕ:𝒱𝒱}\{\phi:\mathcal{V}\to\mathcal{V}\}, i.e., the set of all isomorphisms from    𝒢\mathcal{G}   to itself, with the group operation given by composition. The group Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) acts on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) in a natural way. For any ϕAut( 𝒢),𝐱𝕏( 𝒢)\phi\in\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,),\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,), define the action of ϕ\phi on 𝐱\mathbf{x} as the graph signal ϕ(𝐱)\phi(\mathbf{x}) satisfying

[ϕ(𝐱)]ϕ(v)=[𝐱]v[\phi(\mathbf{x})]_{\phi(v)}=[{\mathbf{x}}]_{v} (39)

for all v𝒱v\in\mathcal{V}. Next, define an equivalence relation on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) where 𝐱ϕ(𝐱)\mathbf{x}\!\sim\!\phi(\mathbf{x}) for all ϕAut( 𝒢),𝐱𝕏( 𝒢)\phi\in\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,),\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,), and take the quotient space 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim, whose elements are the equivalence classes under this relation. Define the metric 2\|\cdot\|_{2} on 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim so that for any [𝐱],[𝐲]𝕏/[\mathbf{x}],[\mathbf{y}]\in\mathbb{X}/\!\sim,

[𝐱][𝐲]2=min𝐱[𝐱]𝐲[𝐲]𝐱𝐲2,\|[\mathbf{x}]-[\mathbf{y}]\|_{2}=\min_{\begin{subarray}{c}\mathbf{x}^{\prime}\in[\mathbf{x}]\\ \mathbf{y}^{\prime}\in[\mathbf{y}]\end{subarray}}\|\mathbf{x}^{\prime}-\mathbf{y}^{\prime}\|_{2}, (40)

where 𝐱𝐲2\|\mathbf{x}^{\prime}-\mathbf{y}^{\prime}\|_{2} is the norm on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) inherited from identification with |𝒱|\mathbb{R}^{|\mathcal{V}|}. As Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) is a finite group, this minimum is well-defined, and satisfies the triangle inequality, thus yielding a valid metric. When defining the space ΩK\Omega_{K} in Section 3.1 we treat 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) with the structure of 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim, from which the topology on ΩK\Omega_{K} is inherited. Similarly, we use the metric 2\|\cdot\|_{2} in Section 5 as defined on 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim.
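For small rooted balls, the metric (40) can be evaluated by brute force. Because the automorphism group acts by isometries, it suffices to minimize ||x − φ(y)|| over automorphisms φ of a single representative; the sketch below enumerates root-preserving automorphisms by exhaustive permutation search (representation choices ours):

```python
import itertools
import numpy as np

def automorphisms(adj, root):
    """All root-preserving automorphisms of a small rooted graph, found by
    exhaustive search over permutations of the node set."""
    nodes = sorted(adj)
    autos = []
    for perm in itertools.permutations(nodes):
        phi = dict(zip(nodes, perm))
        if phi[root] != root:
            continue
        if all((phi[u] in adj[phi[v]]) == (u in adj[v])
               for u in nodes for v in nodes if u != v):
            autos.append(phi)
    return autos

def quotient_dist(adj, root, x, y):
    """The metric (40): since the automorphism group acts by isometries,
    it suffices to minimize ||x - phi(y)|| over automorphisms phi."""
    nodes = sorted(adj)
    idx = {v: k for k, v in enumerate(nodes)}
    xv = np.array([x[v] for v in nodes])
    best = np.inf
    for phi in automorphisms(adj, root):
        yv = np.empty(len(nodes))
        for v in nodes:
            yv[idx[phi[v]]] = y[v]   # action (39): phi(y) at phi(v) is y_v
        best = min(best, float(np.linalg.norm(xv - yv)))
    return best

# A rooted star: root r with leaves a and b. Swapping the leaves is an
# automorphism, so signals that differ only by that swap are at distance 0.
adj = {"r": {"a", "b"}, "a": {"r"}, "b": {"r"}}
x = {"r": 1.0, "a": 2.0, "b": 3.0}
y = {"r": 1.0, "a": 3.0, "b": 2.0}
```

The factorial cost of the enumeration is acceptable here because rooted balls of bounded degree and radius have few nodes; this is precisely the setting in which the quotient construction is used.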

Appendix B Proof of Theorem 1

By Lemma 1, the sequence {μj}j=1\{\mu_{j}\}_{j=1}^{\infty} has compact support. Since  𝐦K\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K} is a continuous function, this implies that {𝔼μj[ 𝐦K]}j=1\{\mathbb{E}_{\mu_{j}}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}]\}_{j=1}^{\infty} is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments 𝐦K(𝐱j)=𝔼μj[ 𝐦K]\mathbf{m}_{K}(\mathbf{x}_{j})=\mathbb{E}_{\mu_{j}}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}] converge, as desired.

For the case where weak convergence holds for all K0K\geq 0, we complete the proof with the following lemma.

Lemma 2.

Let A=[0,a]A=[0,a] be a compact subset of the real line, for some a>0a>0. Let {{𝐦K(j)}K=0}j=1\{\{\mathbf{m}_{K}(j)\}_{K=0}^{\infty}\}_{j=1}^{\infty} be an array of numbers such that {𝐦K(j)}K=0\{\mathbf{m}_{K}(j)\}_{K=0}^{\infty} is the sequence of moments of a finite measure PjP_{j} supported on AA, and that 𝐦K(j)𝑗𝐦K\mathbf{m}_{K}(j)\overset{j}{\to}\mathbf{m}_{K} for each KK, where {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} is some sequence of real numbers.

Then, the sequence {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} uniquely corresponds to a distribution PP supported on AA, such that the sequence of distributions PjP_{j} converges weakly to PP.

Lemma 2 is a routine derivation following from the Hausdorff moment property [43, Theorem 3.15] and the Weierstrass approximation theorem [44], so we omit the proof. In particular, let A=[0,2D]A=[0,2D]. The moments 𝐦K(𝐱j)\mathbf{m}_{K}(\mathbf{x}_{j}) correspond to the power spectral distributions of the graph signals 𝐱j\mathbf{x}_{j}, which are supported on AA by the bounded degree assumption. Since 𝐦K(𝐱j)\mathbf{m}_{K}(\mathbf{x}_{j}) converges for all K0K\geq 0, we have that the sequence of limits {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} is the sequence of moments of a distribution PP supported on AA. Moreover, we have that PP is the weak limit of the power spectral distributions of the graph signals 𝐱j\mathbf{x}_{j}, as desired.

Appendix C Proof of Lemma 1

Denote by  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D} the set of rooted graphs such that the underlying graph has node degree bounded by DD. For any K0K\geq 0 and any graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) contained in 𝔾D\mathbb{G}_{D}, note that for all v𝒱v\in\mathcal{V}, it holds that the rooted KK-ball centered at vv is contained in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}. Moreover, it is straightforward to show that there are only finitely many rooted KK-balls contained in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}, up to isomorphism. For any a0a\geq 0 and any rooted graph    𝒢\mathcal{G}  , denote the set of graph signals taking values in [a,a][-a,a] by 𝕏( 𝒢,a)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a). It is clear that 𝕏( 𝒢,a)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a) is compact as a subset of 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,).

Using these constructions, it immediately follows that for any graph 𝒢𝔾D\mathcal{G}\in\mathbb{G}_{D} with graph signal 𝐱𝕏( 𝒢,a)\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a) for some a>0a>0, the associated distribution μ=(ΣK)(𝒢,𝐱)\mu=(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x}) satisfies

supp(μ) 𝒢k 𝔾D:kK𝕏( 𝒢,a).\operatorname{supp}(\mu)\subseteq\coprod_{\,\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathcal{G}$\kern-0.70004pt}}}\,_{k}\in\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathbb{G}$\kern-0.70004pt}}}_{D}:k\leq K}\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a). (41)

Since there are only finitely many rooted KK-balls in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}, the above set is a disjoint union of finitely many compact spaces, and is thus a compact space itself, by the properties of the disjoint union of topological spaces. If a family of graphs {𝒢j}jJ𝔾D\{\mathcal{G}_{j}\}_{j\in J}\subseteq\mathbb{G}_{D} has uniformly bounded signals {𝐱j}jJ\{\mathbf{x}_{j}\}_{j\in J} for some index set JJ, this means that there is some a0a\geq 0 such that 𝐱j𝕏(𝒢j,a)\mathbf{x}_{j}\in\mathbb{X}(\mathcal{G}_{j},a) for all jJj\in J, so that the condition in (41) holds for all graphs and graph signals in the family, as desired.

Appendix D Proof of Theorem 2

We first establish that for a function  JJ satisfying Assumption 1, the metric dCd_{C} preserves Lipschitz continuity over the subspace ΓΩK\Gamma\subset\Omega_{K} as defined in Lemma 1.

Lemma 3.

Let  J:ΩK\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}:\Omega_{K}\to\mathbb{R} be a function satisfying Assumption 1. Then, for any C>0C>0,  JJ has a Lipschitz constant at most max{L,A/C}\max\{L,A/C\} on the subspace Γ\Gamma with the metric dCd_{C}, where

A=supωΓ J(ω)infωΓ J(ω).A=\sup_{\omega\in\Gamma}\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}(\omega)-\inf_{\omega\in\Gamma}\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}(\omega). (42)

Lemma 3 can be shown by directly applying the definition of Lipschitz continuity, so we omit the proof. By the dual formulation of the 11-Wasserstein distance [38], Lemma 3 implies that, for C>0C>0,

|𝔼μ[ J]𝔼ν[ J]|max{L,AC}W1(μ,ν;C),\big{|}\mathbb{E}_{\mu}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}]-\mathbb{E}_{\nu}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}]\big{|}\leq\max\left\{L,\frac{A}{C}\right\}\cdot W_{1}\left(\mu,\nu;C\right), (43)

owing to the fact that $\operatorname{supp}(\mu),\operatorname{supp}(\nu)\subseteq\Gamma$ by assumption. Observing that $W_{1}(\mu,\nu;C)$ is monotonically increasing in $C$, we can restrict our attention to $C\geq A/L$. A simple change of variables then yields the expression

\big|\mathbb{E}_{\mu}[\overline{J}]-\mathbb{E}_{\nu}[\overline{J}]\big|\leq\inf_{C\in(0,1]}\frac{L}{C}\cdot W_{1}\left(\mu,\nu;\frac{AC}{L}\right). \quad (44)

Taking $C=1$ and noting that $A=1$ by assumption yields the bound (23), as desired.
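The inequality (43) specializes the Kantorovich–Rubinstein duality $|\mathbb{E}_{\mu}[f]-\mathbb{E}_{\nu}[f]|\leq \mathrm{Lip}(f)\cdot W_{1}(\mu,\nu)$. The following is a minimal numerical sketch of that underlying dual bound for discrete measures on the real line; it uses the ordinary Euclidean metric rather than the metric $d_{C}$ on $\Omega_{K}$, so it illustrates the mechanism of the proof, not the paper's setting itself.

```python
# Numerical check of the Kantorovich-Rubinstein dual bound
# |E_mu[f] - E_nu[f]| <= Lip(f) * W1(mu, nu) for empirical measures in 1D.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
mu_support = rng.uniform(0.0, 1.0, size=200)   # samples representing mu
nu_support = rng.uniform(0.2, 1.2, size=200)   # samples representing nu

L = 3.0
f = lambda t: L * np.abs(t - 0.5)              # Lipschitz with constant L

gap = abs(f(mu_support).mean() - f(nu_support).mean())
w1 = wasserstein_distance(mu_support, nu_support)
assert gap <= L * w1 + 1e-12                   # the dual bound holds
```

Any Lipschitz test function $f$ can be substituted here; the bound is tight exactly when $f$ attains the supremum in the dual formulation of $W_{1}$.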

Appendix E Proof of Theorem 3

One can check that $\overline{\mathbf{m}}_{K}$ is continuous and bounded on $\bigcup_{j=1}^{\infty}\operatorname{supp}(\mu_{j})$ due to the weighted degree bound $D$ and the uniform boundedness of the signals, so that $\{\mathbb{E}_{\mu_{j}}[\overline{\mathbf{m}}_{K}]\}_{j=1}^{\infty}$ is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments $\mathbf{m}_{K}(\mathbf{x}_{j})=\mathbb{E}_{\mu_{j}}[\overline{\mathbf{m}}_{K}]$ converge, as desired.

When weak convergence holds for all $K\geq 0$, the proof is completed by appealing to Lemma 2, as in the proof of Theorem 1.

Appendix F Proof of Theorem 4

We begin by establishing two results regarding locally essentially bounded signals $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ and the graphing Laplacian $\mathbf{S}$.

Lemma 4.

For a graphing $\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda)$, the set $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ is a subset of $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$, where $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ denotes the space of essentially bounded functions on $\mathcal{V}$ with respect to the probability measure $\lambda$.

We omit the proof of Lemma 4, as it follows directly from Definition 2.

Lemma 5.

$\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ is closed under application of the Laplacian $\mathbf{S}$, i.e., for any $\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$, we have $\mathbf{S}\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$.

The proof of Lemma 5 can be found in Appendix G.

For any $K\geq 0$, the graphing signal $\mathbf{S}^{K}\mathbf{x}$ is contained in $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ by Lemma 5. The product of two functions in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ is again contained in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$, so by Lemma 4 the graphing signal $\mathbf{y}\colon v\mapsto[\mathbf{x}]_{v}\cdot[\mathbf{S}^{K}\mathbf{x}]_{v}$ is contained in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$. Thus, $\mathbf{m}_{K}(\mathbf{x})=\mathbb{E}_{\lambda}[\mathbf{y}]$ is finite for all $K\geq 0$.

Let $K\geq 0$ be given, and let $\overline{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}$ be the function defined in Proposition 2. One can check that $\mathbf{m}_{K}(\mathbf{x})=\mathbb{E}_{\mu}[\overline{\mathbf{m}}_{K}]$, where $\mu=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x})$. By the bounded degree property of the graphing and the fact that $\mathbf{x}\in\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ (a consequence of Lemma 4), Lemma 1 implies that $\operatorname{supp}(\mu)$ is contained in a compact subset of $\Omega_{K}$. For each finite graph and graph signal $(\mathcal{G}_{n},\mathbf{x}_{n})$, denote the $K^{\text{th}}$ moment of the signal $\mathbf{x}_{n}$ by $\mathbf{m}_{K}(\mathbf{x}_{n})$, and the pushforward measure of $\Sigma_{K}$ on the space $\Omega_{K}$ by $\mu_{n}$, so that $\mathbf{m}_{K}(\mathbf{x}_{n})=\mathbb{E}_{\mu_{n}}[\overline{\mathbf{m}}_{K}]$ by Proposition 2. The assumption that $(\mathcal{G}_{n},\mathbf{x}_{n})\rightharpoonup(\mathbf{G},\mathbf{x})$ implies that $\mu_{n}$ converges weakly to $\mu$. Since $\mu$ has compact support and $\overline{\mathbf{m}}_{K}$ is continuous, it follows that $\mathbb{E}_{\mu_{n}}[\overline{\mathbf{m}}_{K}]\to\mathbb{E}_{\mu}[\overline{\mathbf{m}}_{K}]$; in other words, $\mathbf{m}_{K}(\mathbf{x}_{n})\to\mathbf{m}_{K}(\mathbf{x})$.

Having satisfied the conditions of Theorem 1, we conclude that the power spectral distributions of the finite graphs converge to a unique distribution supported on $[0,2D]$, and that the moments of this distribution are given by $\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty}$ by Lemma 2, as desired.
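The locality driving this argument can be checked numerically on finite graphs: the moment $\mathbf{m}_{K}(\mathbf{x})=\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle$ is an average of quantities that each depend only on a rooted $K$-ball. The sketch below (assuming the combinatorial Laplacian $\mathbf{S}=\mathrm{Diag}(\mathbf{d})-\mathbf{A}$ as shift operator, in place of the graphing Laplacian) takes cycle graphs of even length with an alternating $\pm 1$ signal: every rooted $K$-ball with its signal looks the same, so $\mathbf{m}_{K}$ does not depend on the cycle length.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian S = Diag(d) - A of the n-cycle."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[idx, (idx - 1) % n] = 1.0
    return np.diag(A.sum(axis=1)) - A

def moment(S, x, K):
    # m_K(x) = (1/n) <x, S^K x>: an average of purely local quantities.
    return x @ np.linalg.matrix_power(S, K) @ x / len(x)

for K in range(4):
    vals = []
    for n in (10, 100, 1000):
        x = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # alternating +/-1
        vals.append(moment(cycle_laplacian(n), x, K))
    # Identical K-ball distributions => identical moments across sizes.
    assert np.allclose(vals, vals[0])
    # For this signal S x = 4x, so m_K(x) = 4^K exactly.
    assert np.isclose(vals[0], 4.0 ** K)
```

Here the sequence of cycles converges (in the sense of rooted-ball distributions) to a graphing whose signal moments are the common value $4^{K}$, matching the conclusion of the theorem in this toy case.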

Appendix G Proof of Lemma 5

Let a degree-$D$ graphing $\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda)$ with graphing signal $\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ be given. By definition, there exists an $a\geq 0$ such that for all $K\geq 0$, we have

\lambda\left(\mathcal{N}^{K}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right)\right)=0. \quad (45)

Put $b=2Da$, and define the graphing signal $\mathbf{y}=\mathbf{S}\mathbf{x}$, where $\mathbf{S}$ denotes the graphing Laplacian. Observe, due to the bounded degree $D$ of the graphing, that

(\mathbf{y}^{-1}[-b,b])^{\mathsf{C}}\subseteq\mathcal{N}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right), \quad (46)

so that, for all $K\geq 0$,

\mathcal{N}^{K}\left((\mathbf{y}^{-1}[-b,b])^{\mathsf{C}}\right)\subseteq\mathcal{N}^{K+1}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right). \quad (47)

By the definition of local essential boundedness, the right-hand side of (47) has measure zero under the probability measure $\lambda$, which implies that the left-hand side also has measure zero. Since $K$ was arbitrary, this establishes that $\mathbf{y}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$.
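The choice $b=2Da$ reflects an elementary finite-graph estimate: if $|\mathbf{x}_{u}|\leq a$ on a node's closed neighborhood and the weighted degree is at most $D$, then $|[\mathbf{S}\mathbf{x}]_{v}|\leq d_{v}|\mathbf{x}_{v}|+\sum_{u}A_{vu}|\mathbf{x}_{u}|\leq 2Da$. A short sketch of this bound, assuming the combinatorial Laplacian on a random weighted graph:

```python
# Finite-graph check of the bound |[Sx]_v| <= 2Da behind the choice b = 2Da,
# for the combinatorial Laplacian S = Diag(d) - A with weighted degree <= D.
import numpy as np

rng = np.random.default_rng(1)
n, D, a = 50, 4.0, 2.0

# Random symmetric weighted adjacency, rescaled so max weighted degree <= D.
A = rng.uniform(0, 1, (n, n))
A = np.triu(A, 1)
A = A + A.T
A *= D / A.sum(axis=1).max()
S = np.diag(A.sum(axis=1)) - A

x = rng.uniform(-a, a, n)                     # |x_v| <= a everywhere
y = S @ x
assert np.all(np.abs(y) <= 2 * D * a + 1e-9)  # |[Sx]_v| <= 2Da = b
```

In the graphing setting the same triangle-inequality argument applies pointwise, except on the null set excluded by local essential boundedness.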

References

  • [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013, doi:10.1109/MSP.2012.2235192.
  • [2] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018, doi:10.1109/JPROC.2018.2820126.
  • [3] D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” in IEEE Int. Conf. Acoust., Speech and Signal Process.   IEEE, May 2020, pp. 5930–5934, doi:10.1109/ICASSP40776.2020.9053140.
  • [4] F. Gama, Q. Li, E. Tolstaya, A. Prorok, and A. Ribeiro, “Synthesizing decentralized controllers with graph neural networks and imitation learning,” Oct. 2021, eprint:arXiv 2012.14906v3.
  • [5] W. Huang, T. A. W. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, “A graph signal processing perspective on functional brain imaging,” Proc. IEEE, vol. 106, no. 5, pp. 868–885, 2018, doi:10.1109/JPROC.2018.2798928.
  • [6] J. Ma, W. Huang, S. Segarra, and A. Ribeiro, “Diffusion filtering of graph signals and its use in recommendation systems,” in IEEE Int. Conf. Acoust., Speech and Signal Process.   IEEE, 2016, pp. 4563–4567, doi:10.1109/ICASSP.2016.7472541.
  • [7] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 5680–5695, 2020, doi:10.1109/TSP.2020.3026980.
  • [8] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002, doi:10.1126/science.298.5594.824.
  • [9] C. N. Tzouanas, S. Kim, K. N. Badhiwala, B. W. Avants, and J. T. Robinson, “Hydra vulgaris shows stable responses to thermal stimulation despite large changes in the number of neurons,” iScience, vol. 24, no. 6, p. 102490, 2021, doi:10.1016/j.isci.2021.102490.
  • [10] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Appl. Comput. Harmonic Anal., vol. 30, no. 2, pp. 129–150, 2011, doi:10.1016/j.acha.2010.04.005.
  • [11] D. I. Shuman, C. Wiesmeyr, N. Holighaus, and P. Vandergheynst, “Spectrum-adapted tight graph wavelet and vertex-frequency frames,” IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4223–4235, 2015, doi:10.1109/TSP.2015.2424203.
  • [12] T. M. Roddenberry, F. Frantzen, M. T. Schaub, and S. Segarra, “Hodgelets: Localized spectral representations of flows on simplicial complexes,” 2021, eprint:arXiv 2109.08728.
  • [13] N. M. Kriege, F. D. Johansson, and C. Morris, “A survey on graph kernels,” Appl. Netw. Sci., vol. 5, no. 1, pp. 1–42, 2020, doi:10.1007/s41109-019-0195-3.
  • [14] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, “Efficient graphlet kernels for large graph comparison,” in Int. Conf. Artificial Intell., Statist.   PMLR, 2009, pp. 488–495.
  • [15] C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel, “Faster kernels for graphs with continuous attributes via hashing,” in Int. Conf. Data Mining.   IEEE, 2016, pp. 1095–1100, doi:10.1109/ICDM.2016.0142.
  • [16] M. Togninalli, E. Ghisu, F. Llinares-López, B. A. Rieck, and K. Borgwardt, “Wasserstein Weisfeiler-Lehman graph kernels,” Conf. Neural Inform. Process. Syst., vol. 9, 2020.
  • [17] P. de Haan, T. Cohen, and M. Welling, “Natural graph networks,” 2020, eprint:arXiv 2007.08349.
  • [18] B. Bevilacqua, F. Frasca, D. Lim, B. Srinivasan, C. Cai, G. Balamurugan, M. M. Bronstein, and H. Maron, “Equivariant subgraph aggregation networks,” 2021, eprint:arXiv 2110.02910.
  • [19] M. Zhang and P. Li, “Nested graph neural networks,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [20] Q. Zhu, C. Yang, Y. Xu, H. Wang, C. Zhang, and J. Han, “Transfer learning of graph neural networks with ego-graph information maximization,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [21] T. Maehara and H. NT, “Learning on random balls is sufficient for estimating (some) graph parameters,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [22] J. Leskovec and J. Mcauley, “Learning to discover social circles in ego networks,” Conf. Neural Inform. Process. Syst., vol. 25, 2012.
  • [23] T. Tao, An introduction to measure theory.   Amer. Math. Soc., 2011, vol. 126, doi:10.1090/gsm/126.
  • [24] J. M. Lee, An Introduction to Smooth Manifolds.   Springer, 2012, doi:10.1007/978-1-4419-9982-5.
  • [25] F. Ji, W. P. Tay, and A. Ortega, “Graph signal processing over a probability space of shift operators,” 2021, eprint:arXiv 2108.09192.
  • [26] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” J. Mach. Learning Res., vol. 12, no. 9, 2011.
  • [27] F. Gama and A. Ribeiro, “Ergodicity in stationary graph processes: A weak law of large numbers,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2761–2774, Apr. 2019, doi:10.1109/TSP.2019.2908909.
  • [28] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, 2017, doi:10.1109/TSP.2017.2703660.
  • [29] M. Avella-Medina, F. Parise, M. T. Schaub, and S. Segarra, “Centrality measures for graphons: Accounting for uncertainty in networks,” IEEE Trans. Network Sci. Eng., vol. 7, no. 1, pp. 520–537, 2018, doi:10.1109/TNSE.2018.2884235.
  • [30] L. Ruiz, L. Chamon, and A. Ribeiro, “Graphon signal processing,” IEEE Trans. Signal Process., vol. 69, pp. 4961–4976, 2021, doi:10.1109/TSP.2021.3106857.
  • [31] C. Borgs, J. T. Chayes, H. Cohn, and N. Holden, “Sparse exchangeable graphs and their limits via graphon processes,” J. Mach. Learning Res., vol. 18, no. 210, pp. 1–71, 2018. [Online]. Available: http://jmlr.org/papers/v18/16-421.html
  • [32] S. Segarra and A. Ribeiro, “Stability and continuity of centrality measures in weighted graphs,” IEEE Trans. Signal Process., vol. 64, no. 3, pp. 543–555, 2015, doi:10.1109/TSP.2015.2486740.
  • [33] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998, doi:10.1038/30918.
  • [34] B. Bollobás, Modern graph theory.   Springer Science & Business Media, 1998, vol. 184, doi:10.1007/978-1-4612-0619-4.
  • [35] F. Gama, A. G. Marques, G. Mateos, and A. Ribeiro, “Rethinking sketching as sampling: A graph signal processing approach,” Signal Process., vol. 169, pp. 107 404 (1–15), Apr. 2020, doi:10.1016/j.sigpro.2019.107404.
  • [36] F. Gama and S. Sojoudi, “Distributed linear-quadratic control with graph neural networks,” Signal Process., Feb. 2022, doi:10.1016/j.sigpro.2022.108506.
  • [37] L. De Branges, “The Stone-Weierstrass theorem,” Proc. Amer. Math. Soc., vol. 10, no. 5, pp. 822–824, 1959, doi:10.1090/S0002-9939-1959-0113131-7.
  • [38] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. Lanckriet, “On integral probability metrics, ϕ\phi-divergences and binary classification,” 2009, eprint:arXiv 0901.2698.
  • [39] L. Ruiz, F. Gama, and A. Ribeiro, “Graph neural networks: Architectures, stability, and transferability,” Proc. IEEE, vol. 109, no. 5, pp. 660–682, 2021, doi:10.1109/JPROC.2021.3055400.
  • [40] C. Villani, Optimal transport.   Springer, 2009, vol. 338, doi:10.1007/978-3-540-71050-9.
  • [41] F. Gama, E. Isufi, G. Leus, and A. Ribeiro, “Graphs, convolutions, and neural networks: From graph filters to graph neural networks,” IEEE Signal Process. Mag., vol. 37, no. 6, pp. 128–138, Nov. 2020, doi:10.1109/MSP.2020.3016143.
  • [42] L. Lovász, Large networks and graph limits.   Amer. Math. Soc., 2012, vol. 60, doi:10.1090/coll/060.
  • [43] K. Schmüdgen, The moment problem.   Springer, 2017, vol. 9, doi:10.1007/978-3-319-64546-9.
  • [44] K. Weierstrass, “Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen veränderlichen,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, vol. 2, pp. 633–639, 1885.