
On Local Distributions in Graph Signal Processing

T. Mitchell Roddenberry, Fernando Gama,
Richard G. Baraniuk, Santiago Segarra
All authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA. SS and TMR are supported by NSF CCF-2008555.
(February 2022)
Abstract

Graph filtering is the cornerstone operation in graph signal processing (GSP). Thus, understanding it is key in developing potent GSP methods. Graph filters are local and distributed linear operations, whose output depends only on the local neighborhood of each node. Moreover, a graph filter’s output can be computed separately at each node by carrying out repeated exchanges with immediate neighbors. Graph filters can be compactly written as polynomials of a graph shift operator (typically, a sparse matrix description of the graph). This has led to relating the properties of the filters with the spectral properties of the corresponding matrix – which encodes global structure of the graph. In this work, we propose a framework that relies solely on the local distribution of the neighborhoods of a graph. The crux of this approach is to describe graphs and graph signals in terms of a measurable space of rooted balls. Leveraging this, we are able to seamlessly compare graphs of different sizes and coming from different models, yielding results on the convergence of spectral densities, transferability of filters across arbitrary graphs, and continuity of graph signal properties with respect to the distribution of local substructures.

1 Introduction

Graph filters are a fundamental building block in graph signal processing, where cascaded applications of a graph shift operator model diffusion on the nodes of a graph [1, 2]. The analogy between filtering in discrete time and filtering on graphs has led to a fruitful research direction, with applications including power systems [3], robotics [4], neuroscience [5], and recommender systems [6]. Due to their typical implementation as low-degree matrix polynomials, graph filters are local operators: the output of a graph filter at a given node depends only on the connectivity structure and signal values in the node's local neighborhood. This highlights an invariance property of graph filters, typically summarized as permutation equivariance. However, the equivariance of graph filters goes well beyond insensitivity to permutations of nodes. If the same filter is applied to two different graphs, and two nodes within those graphs have identical neighborhoods, then the graph filter output at those nodes will be identical as well [7]. Indeed, graph filters in their usual implementation are equivariant to local substructures, which have been shown to be of primary importance in real-world networks [8], often leading to useful properties such as scale invariance and robustness [9].

The importance of local substructures is an emerging theme in the literature. This view has been considered before in the graph wavelet literature [10, 11, 12], where wavelet atoms are constructed by applying graph filters to impulse functions on each node in the graph, yielding a dictionary of atoms with known spectral properties that also exhibit (approximate) spatial localization. As a representational tool for graphs predating the development of graph signal processing, graph kernel methods have put forth the idea of graphs as bags of motifs [13], such as in the work on graphlets [14]. Works such as [15, 16] have considered the extension of graph kernels to graphs with continuous labels, typically via a neighborhood aggregation or discretization approach. In [17], equivariance to local structures is proposed as a more useful invariant for graph neural networks than permutation equivariance. For instance, [18] considers neural networks for graph classification that act on small subgraphs over an entire graph, and [19] considers how to design graph neural networks that can recognize structures more expressive than those of the Weisfeiler-Lehman test. The treatment of graphs as distributions of ego-networks in [20] was used to devise a novel loss function for training graph neural networks based on the maximization of mutual information. Additionally, local structures at each node need not be deterministic: [21] considers random walk features and defines the notion of an estimable parameter under such a model.

In this work, we develop machinery for reasoning about basic notions in graph signal processing, with the primacy of local substructures in mind. After reviewing basic definitions for graph signal processing in Section 2, we make the following theoretical contributions:

  1. We introduce rooted graphs and rooted graph filters as vehicles for describing localized graph filters. In doing so, we develop a measure-theoretic view of graph signal processing that considers distributions of rooted balls and signals supported on them (Section 3).

  2. Within the proposed framework, we illustrate how convergence of Fourier spectra of graph signals can be understood in terms of weak convergence of measures (Section 4).

  3. We apply the proposed framework to yield a principled understanding of the transferability of graph summaries via integral probability metrics (Section 5).

  4. To highlight the flexibility of the proposed approach, we extend the relationship between distributions of local graph structures and the Fourier spectra of graph signals to weighted graphs (Section 6).

  5. We identify graphings and signals supported on them as the appropriate limiting objects in the proposed framework, and develop basic notions of graphing signal processing. We prove that the proposed framework applies directly to graphing signal processing in a natural way, yielding a suitable spectral theory (Section 7).

2 Graph Signal Processing

Consider an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. Since the graph is undirected, it holds that if $(u,v)\in\mathcal{E}$ for $u,v\in\mathcal{V}$, then $(v,u)\in\mathcal{E}$. We further assume the graph to have no self-loops, i.e., $(v,v)\notin\mathcal{E}$ for all $v\in\mathcal{V}$, and to be unweighted. An extension to weighted graphs can be found in Section 6.

The neighborhood $\mathcal{N}(\mathcal{V}^{\prime})$ of a given collection of nodes $\mathcal{V}^{\prime}\subseteq\mathcal{V}$ is defined as

\mathcal{N}(\mathcal{V}^{\prime})=\{u\in\mathcal{V}:(u,v)\in\mathcal{E}\text{ for some }v\in\mathcal{V}^{\prime}\}\cup\mathcal{V}^{\prime}.    (1)

That is, the neighborhood of a collection of nodes contains the set $\mathcal{V}^{\prime}$ itself as well as those nodes immediately connected to it. The $k$-hop neighborhood $\mathcal{N}^{k}(\mathcal{V}^{\prime})$ can then be conveniently defined in a recursive manner as $\mathcal{N}^{k}(\mathcal{V}^{\prime})=\mathcal{N}(\mathcal{N}^{k-1}(\mathcal{V}^{\prime}))$ for integers $k\geq 1$, with $\mathcal{N}^{0}(\mathcal{V}^{\prime})=\mathcal{V}^{\prime}$. The $k$-hop neighborhood of a singleton $\mathcal{V}^{\prime}=\{v\}$ is denoted by $\mathcal{N}^{k}(v)$, and the degree of $v$ can be easily computed as $\deg(v)=|\mathcal{N}(v)|-1$.
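The recursive definition above translates directly into code. The following is a minimal sketch (the dict-of-sets adjacency representation and function names are illustrative, not from the paper):

```python
# Recursive k-hop neighborhood, following (1): N(V') is V' together with
# every node adjacent to some node in V', and N^k = N applied k times.

def neighborhood(adj, nodes):
    """One-hop neighborhood N(V') of a set of nodes."""
    out = set(nodes)
    for v in nodes:
        out |= adj[v]
    return out

def k_hop(adj, nodes, k):
    """k-hop neighborhood N^k(V'), with N^0(V') = V'."""
    out = set(nodes)
    for _ in range(k):
        out = neighborhood(adj, out)
    return out

# Path graph 0-1-2-3-4
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(k_hop(adj, {2}, 1))                 # {1, 2, 3}
print(k_hop(adj, {2}, 2))                 # {0, 1, 2, 3, 4}
# deg(v) = |N(v)| - 1, since N(v) contains v itself
print(len(neighborhood(adj, {2})) - 1)    # 2
```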

Graph signals can be associated with a graph structure and are defined as maps $\mathbf{x}:\mathcal{V}\to\mathbb{R}$ between the node set $\mathcal{V}$ and the real numbers $\mathbb{R}$. That is, a graph signal simply attaches a single real number $[\mathbf{x}]_{v}\in\mathbb{R}$ to each node $v\in\mathcal{V}$ of the graph. For a given graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we can define the space of graph signals as $\mathbb{X}(\mathcal{G})=\{\mathbf{x}:\mathcal{V}\to\mathbb{R}\}$.

A graph shift operator (GSO) $\mathbf{S}$ can also be associated with the graph structure $\mathcal{G}$ as a means of relating graph signals explicitly to the underlying graph support. More precisely, the GSO is defined as a linear operator between graph signals $\mathbf{S}:\mathbb{X}(\mathcal{G})\to\mathbb{X}(\mathcal{G})$ such that the output graph signal $\mathbf{y}=\mathbf{S}\mathbf{x}$ is computed as

[\mathbf{y}]_{v}=\sum_{u\in\mathcal{N}(v)}[\mathbf{S}]_{vu}[\mathbf{x}]_{u}.    (2)

The operation (2) shifts the signal around the graph and is analogous to the time shift in discrete-time signal processing. Note that the GSO can be completely characterized by specifying the adjacency rule that assigns the values $[\mathbf{S}]_{vu}$ for every $(u,v)\in\mathcal{E}$. Examples of GSOs that have found widespread use in the literature include the adjacency matrix, the Laplacian matrix, the Markov transition matrix, and their normalized counterparts. While it is technically possible to design adjacency rules that lead to arbitrary values of $[\mathbf{S}]_{vu}$, we only consider rules that are determined exclusively by the combinatorial structure of the graph.
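The shift operation (2) can be sketched as follows, taking the adjacency matrix as GSO so that $[\mathbf{S}]_{vu}=1$ for every edge (a hedged illustration; the dict-based representation is an assumption of this sketch, not the paper's notation):

```python
# Graph shift (2) with the adjacency matrix as GSO: [y]_v = sum_{u ~ v} [x]_u.

def shift(adj, x):
    """Apply the adjacency-matrix GSO to a signal x (dict node -> value)."""
    return {v: sum(x[u] for u in adj[v]) for v in adj}

# Triangle {0,1,2} with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(shift(adj, x))  # {0: 5.0, 1: 4.0, 2: 7.0, 3: 3.0}
```

Each node's output is computed from its one-hop neighbors only, which is what makes the operation local and distributable.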

A graph filter can then be built from the GSO. More specifically, given a collection of $(K+1)$ scalar coefficients $\{h_{k}\}_{k=0}^{K}$, a $K$-tap graph filter $\mathcal{H}(\mathbf{S}):\mathbb{X}(\mathcal{G})\to\mathbb{X}(\mathcal{G})$ is defined as the linear mapping between two graph signals $\mathbf{y}=\mathcal{H}(\mathbf{S})\mathbf{x}$ given by

\mathcal{H}(\mathbf{S})=\sum_{k=0}^{K}h_{k}\mathbf{S}^{k},    (3)

where $\mathbf{S}^{k}$ denotes $k$ repeated applications of the GSO (see (2)) to the input signal $\mathbf{x}$. The operator $\mathbf{S}^{0}$ is understood to be the identity map on $\mathbb{X}(\mathcal{G})$.

Graph filters are linear and local operators. They are linear in the input graph signal $\mathbf{x}$. They are local in that they only require information up to the $K$-hop neighborhood. More specifically, the output of the graph filter at each node, $[\mathbf{y}]_{v}$, can be computed by $K$ repeated exchanges of information with the one-hop neighbors $\mathcal{N}(v)$, thus collecting the signal values contained in up to the $K$-hop neighborhood $\mathcal{N}^{K}(v)$. Each node can compute its output separately from the rest. The key observation is that a node need not know the global structure of the whole graph $\mathcal{G}$, but only its local neighborhood. Thus, graph filters can be analyzed and understood entirely by looking at this local neighborhood structure.
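The filter (3) can be computed exactly as described: $K$ rounds of one-hop exchanges, accumulating each term $h_{k}\mathbf{S}^{k}\mathbf{x}$ as it is produced. A minimal sketch, again assuming an adjacency-matrix GSO on a dict-of-sets graph:

```python
# K-tap graph filter (3): y = sum_k h_k S^k x, computed by repeated
# one-hop exchanges rather than by forming the matrix polynomial.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    """Apply H(S) = sum_k h[k] S^k using only local exchanges."""
    y = {v: h[0] * x[v] for v in adj}   # k = 0 term: identity
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)             # one more hop of information
        for v in adj:
            y[v] += hk * sx[v]
    return y

adj = {0: {1}, 1: {0, 2}, 2: {1}}       # path 0-1-2
x = {0: 1.0, 1: 0.0, 2: 0.0}            # impulse at node 0
h = [0.5, 0.25, 0.125]                  # K = 2 taps
print(graph_filter(adj, x, h))          # {0: 0.625, 1: 0.25, 2: 0.125}
```

Note that node 2 receives a nonzero value only through the $\mathbf{S}^{2}$ term, matching the claim that a $K$-tap filter gathers information from exactly the $K$-hop neighborhood.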

3 A Local Framework for GSP

Figure 1: The space $\Omega_{K}$ of rooted $K$-balls and signals, constructed via the disjoint union of signal spaces enumerated by rooted $K$-balls. (Left) A graph on nodes $\{a,b,\ldots,h\}$, with graph signal denoted by the node coloring. (Right) The space $\Omega_{1}$, and the distribution $\mu$ of points obtained via the pushforward of $\Sigma_{1}$. Observe that nodes $\{a,d,g,h\}$ all have isomorphic rooted $1$-balls, and thus all correspond to points in $\mathbb{X}(\bar{\mathcal{G}}_{1})\cong\mathbb{R}^{2}$ (depicted here in random positions within Euclidean space). Similarly, nodes $\{b,e\}$ are mapped to points in $\mathbb{X}(\bar{\mathcal{G}}_{2})\cong\mathbb{R}^{3}$, and $\{c,f\}$ are mapped to points in $\mathbb{X}(\bar{\mathcal{G}}_{4})\cong\mathbb{R}^{4}$. Since the given graph is triangle-free, no points are mapped to $\mathbb{X}(\bar{\mathcal{G}}_{3})\cong\mathbb{R}^{3}$.

The local nature of graph filters calls for a local framework to analyze their effects. Toward this end, define a rooted $K$-ball $\bar{\mathcal{G}}_{K}(r)=(\bar{\mathcal{V}}_{K},\bar{\mathcal{E}}_{K},r)$ as a graph with a root $r\in\bar{\mathcal{V}}_{K}$ such that all nodes are at most $K$ hops away from the root, i.e., $\bar{\mathcal{V}}_{K}=\mathcal{N}^{K}(r)$. Note that, since the rooted $K$-ball $\bar{\mathcal{G}}_{K}(r)$ is a graph, the notions of graph shift operator and graph signal are immediately well defined. We denote these by $\bar{\mathbf{S}}_{K}(r)$ and $\bar{\mathbf{x}}_{K}(r)$, respectively. For ease of exposition, the specification of the root node will be dropped unless necessary to avoid confusion.

Of particular interest is the case of rooted $K$-balls obtained as the induced subgraph of the $K$-hop neighborhood of a node. More specifically, given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, a rooted $K$-ball $\bar{\mathcal{G}}_{K}$ can be constructed by selecting a node $r\in\mathcal{V}$ and setting $\bar{\mathcal{V}}_{K}\equiv\mathcal{N}^{K}(r)$ and $\bar{\mathcal{E}}_{K}\equiv\mathcal{E}\cap(\mathcal{N}^{K}(r)\times\mathcal{N}^{K}(r))$. In this context, the graph signal $\bar{\mathbf{x}}_{K}$ corresponding to the rooted $K$-ball can be tied to the graph signal $\mathbf{x}$ defined on $\mathcal{G}$ by setting $[\bar{\mathbf{x}}_{K}]_{v}=[\mathbf{x}]_{v}$ for all $v\in\mathcal{N}^{K}(r)$.

We can now formalize the notion that a graph filter only relies on local information.

Proposition 1.

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be a graph with graph signal $\mathbf{x}$ and GSO $\mathbf{S}$. For a node $r\in\mathcal{V}$, denote the corresponding rooted $K$-ball by $\bar{\mathcal{G}}_{K}$, with graph signal $\bar{\mathbf{x}}_{K}$ and GSO $\bar{\mathbf{S}}_{K}$, respectively. Then, for any $K$-tap filter $\mathcal{H}$, the following equality holds:

[\mathcal{H}(\mathbf{S})\mathbf{x}]_{r}=[\mathcal{H}(\bar{\mathbf{S}}_{K})\bar{\mathbf{x}}_{K}]_{r}.    (4)

We note that for (4) to hold, the adjacency rule used to construct $\mathbf{S}$ and $\bar{\mathbf{S}}_{K}$ must be the same, and it is assumed to depend only on the combinatorial structure of the graphs; see Section 2.

Proposition 1 formalizes the well-known fact that a $K$-tap graph filter evaluated at a node depends only on the $K$-hop subgraph centered at that node (i.e., the ego network of the node [22]). In this sense, it is not necessary to understand how a graph filter behaves on the global structure of a graph. Rather, one only needs to understand the behavior of a graph filter locally, in particular on the rooted $K$-balls of the graph. This naturally leads to a weaker form of the property of permutation equivariance, discussed next.
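Proposition 1 is easy to check numerically. The sketch below (with illustrative helpers `shift`, `graph_filter`, and `rooted_ball`, all assumptions of this example) filters a signal on a full path graph and on the induced 2-hop ball around a root, and compares the outputs at the root:

```python
# Numerical check of Proposition 1: a K-tap filter evaluated at node r
# depends only on the induced K-hop ball around r.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    y = {v: h[0] * x[v] for v in adj}
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)
        for v in adj:
            y[v] += hk * sx[v]
    return y

def rooted_ball(adj, r, K):
    """Induced subgraph on the K-hop neighborhood of r."""
    ball = {r}
    for _ in range(K):
        ball |= {u for v in ball for u in adj[v]}
    return {v: adj[v] & ball for v in ball}

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # path graph
x = {v: float(v) for v in adj}
h = [1.0, -0.5, 0.25]                                    # K = 2 taps
K, r = 2, 1

ball = rooted_ball(adj, r, K)                            # nodes {0,1,2,3}
full = graph_filter(adj, x, h)[r]
local = graph_filter(ball, {v: x[v] for v in ball}, h)[r]
assert full == local                                     # (4) holds at the root
```

Node 4 lies outside the 2-hop ball of node 1, so discarding it (and its signal value) leaves the filter output at the root unchanged.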

For two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$, we say that a map $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$ is a $K$-morphism if it preserves the structure of rooted $K$-balls for all nodes in $\mathcal{V}$. The structure of two rooted $K$-balls $\bar{\mathcal{G}}_{K}=(\bar{\mathcal{V}}_{K},\bar{\mathcal{E}}_{K},r)$ and $\bar{\mathcal{G}}^{\prime}_{K}=(\bar{\mathcal{V}}^{\prime}_{K},\bar{\mathcal{E}}^{\prime}_{K},r^{\prime})$ with signals $\bar{\mathbf{x}}_{K}$ and $\bar{\mathbf{x}}^{\prime}_{K}$, respectively, is preserved if there exists an isomorphism $\psi:\bar{\mathcal{V}}_{K}\to\bar{\mathcal{V}}^{\prime}_{K}$ such that $\psi(r)=r^{\prime}$; $(\psi(u),\psi(v))\in\bar{\mathcal{E}}^{\prime}_{K}$ if and only if $(u,v)\in\bar{\mathcal{E}}_{K}$; and $[\bar{\mathbf{x}}_{K}]_{v}=[\bar{\mathbf{x}}^{\prime}_{K}]_{\psi(v)}$ for all $v\in\bar{\mathcal{V}}_{K}$.
Then, a $K$-morphism $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$ between two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$ with associated signals $\mathbf{x}$ and $\mathbf{x}^{\prime}$, respectively, satisfies that the rooted $K$-balls $(\bar{\mathcal{G}}_{K}(v),\bar{\mathbf{x}}_{K}(v))$ and $(\bar{\mathcal{G}}^{\prime}_{K}(\phi(v)),\bar{\mathbf{x}}^{\prime}_{K}(\phi(v)))$ are isomorphic for all nodes $v\in\mathcal{V}$. Note that $K$-morphisms are not necessarily invertible, nor are they necessarily surjective.

Graph filters are invariant under $K$-morphisms. This follows from Proposition 1, where we see that the same filter applied to two different graphs yields the same result at nodes whose local structures coincide.

Corollary 1.

Consider two graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ and $\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime})$ with corresponding signals $\mathbf{x}$ and $\mathbf{x}^{\prime}$. Denote by $\bar{\mathcal{G}}_{K}(v)$ the rooted $K$-ball with root $v\in\mathcal{V}$, and by $\bar{\mathbf{S}}_{K}(v)$ and $\bar{\mathbf{x}}_{K}(v)$ the corresponding GSO and graph signal, respectively; define $\bar{\mathbf{S}}^{\prime}_{K}$ and $\bar{\mathbf{x}}^{\prime}_{K}$ analogously on $\mathcal{G}^{\prime}$. If there exists a $K$-morphism $\phi:\mathcal{V}\to\mathcal{V}^{\prime}$, then for any $K$-tap filter $\mathcal{H}$ it holds that

\Big[\mathcal{H}\big(\bar{\mathbf{S}}_{K}(v)\big)\bar{\mathbf{x}}_{K}(v)\Big]_{v}=\Big[\mathcal{H}\big(\bar{\mathbf{S}}^{\prime}_{K}(\phi(v))\big)\bar{\mathbf{x}}^{\prime}_{K}(\phi(v))\Big]_{\phi(v)}    (5)

for all $v\in\mathcal{V}$.

Note that Corollary 1 is a generalization of the permutation equivariance property that graph filters have [7]. More generally, Corollary 1 opens up ways to compare the performance of a fixed graph filter across two different graphs with different associated graph signals. It would be expected then, that if two graphs have similar distributions of rooted KK-balls with associated signal, then a fixed graph filter acting on one should yield similar results on the other. We formalize these notions in the ensuing discussion.
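As a concrete illustration of Corollary 1, consider cycles of different lengths carrying a constant signal: for $K$ small relative to the cycle lengths, every node of either cycle has the same rooted $K$-ball with the same signal, so any map between the node sets is a $K$-morphism and a $K$-tap filter outputs a single common value everywhere. A sketch (helper names are assumptions of this example):

```python
# Corollary 1 on two cycles: identical rooted 2-balls with a constant
# signal imply identical filter outputs, despite different graph sizes.

def shift(adj, x):
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    y = {v: h[0] * x[v] for v in adj}
    sx = x
    for hk in h[1:]:
        sx = shift(adj, sx)
        for v in adj:
            y[v] += hk * sx[v]
    return y

def cycle(n):
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

h = [1.0, 2.0, -1.0]  # an arbitrary 2-tap filter
out6 = graph_filter(cycle(6), {v: 1.0 for v in range(6)}, h)
out9 = graph_filter(cycle(9), {v: 1.0 for v in range(9)}, h)
vals = set(out6.values()) | set(out9.values())
print(vals)  # {1.0}: one value across both graphs
```

The single shared output value reflects exactly the invariance in (5): the filter cannot tell the two cycles apart, because their distributions of rooted 2-balls with signal coincide.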

3.1 Distributions of Rooted Balls

Proposition 1 states that the output of a $K$-tap graph filter at each node depends only on the rooted $K$-ball at that node and the corresponding graph signal values. Therefore, to characterize the effect of graph filtering, it suffices to characterize the distribution of rooted $K$-balls of a graph. Toward this end, we will construct a sample space, a $\sigma$-field, and a corresponding measure to describe this distribution.

Denote by $\bar{\mathcal{G}}_{k}$ a rooted $k$-ball and by $\mathbb{X}(\bar{\mathcal{G}}_{k})$ its corresponding signal space. Consider now the space constructed by the disjoint union of all signal spaces on all possible rooted $k$-balls (up to isomorphism) with $k\leq K$, given by

\Omega_{K}=\coprod_{\bar{\mathcal{G}}_{k}:\,k\leq K}\mathbb{X}(\bar{\mathcal{G}}_{k}).    (6)

(In this context, each signal space $\mathbb{X}(\bar{\mathcal{G}}_{k})$ is considered modulo the action of the automorphism group of $\bar{\mathcal{G}}_{k}$; this is clarified in Appendix A.)

The construction of the space $\Omega_{K}$ is illustrated in Fig. 1. Essentially, one considers a different space $\mathbb{X}(\bar{\mathcal{G}})$ (typically, $|\bar{\mathcal{V}}|$-dimensional vectors $\bar{\mathbf{x}}$) for each possible rooted $K$-ball $\bar{\mathcal{G}}$, and then puts them all into a common space where each original space is isolated. An element $\omega\in\Omega_{K}$ is described by both the rooted $K$-ball indexing a component space and a graph signal in that space, i.e., $\omega=(\bar{\mathcal{G}},\bar{\mathbf{x}})$ with $\bar{\mathbf{x}}\in\mathbb{X}(\bar{\mathcal{G}})$. Making the identification $\mathbb{X}(\bar{\mathcal{G}})\equiv\mathbb{R}^{|\bar{\mathcal{V}}|}$, the point $\omega$ can be thought of as a $|\bar{\mathcal{V}}|$-dimensional vector associated with the rooted $K$-ball $\bar{\mathcal{G}}$.
Note that, while it may happen that $\mathbb{X}(\bar{\mathcal{G}})\cong\mathbb{X}(\bar{\mathcal{G}}^{\prime})$ because $|\bar{\mathcal{V}}|=|\bar{\mathcal{V}}^{\prime}|$, these are different component spaces in the disjoint union $\Omega_{K}$, as they are indexed by the different rooted balls $\bar{\mathcal{G}}$ and $\bar{\mathcal{G}}^{\prime}$, respectively.

To define the $\sigma$-field $\sigma(\Omega_{K})$ associated with $\Omega_{K}$, note that the disjoint union of graph signal spaces (i.e., Euclidean spaces) has a natural topology defined over it, namely the disjoint union topology. Thus, the Borel $\sigma$-field $\sigma(\Omega_{K})$ generated from the topology of $\Omega_{K}$ is a proper $\sigma$-field. Intuitively, measurable sets in $\Omega_{K}$ are constructed by selecting some rooted $K$-ball $\bar{\mathcal{G}}$ and a Borel set in the space of graph signals $\mathbb{X}(\bar{\mathcal{G}})$, then considering the corresponding set in $\Omega_{K}$.

A measure $\mu$ can be properly defined now that the $\sigma$-field associated with $\Omega_{K}$ has been constructed. Let $\Sigma_{K}:\mathcal{V}\to\Omega_{K}$ be the map that takes a node $v\in\mathcal{V}$ and returns the corresponding rooted $K$-ball and signal, i.e.,

\Sigma_{K}(v)=\big(\bar{\mathcal{G}}_{K}(v),\bar{\mathbf{x}}_{K}(v)\big).    (7)

Note that $\Sigma_{K}(v)\in\Omega_{K}$. Then, for a Borel set $\mathcal{A}\in\sigma(\Omega_{K})$, the measure $\mu:\sigma(\Omega_{K})\to[0,1]$ is defined as

\mu(\mathcal{A})=\frac{1}{|\mathcal{V}|}\big|\big\{v\in\mathcal{V}:\Sigma_{K}(v)\in\mathcal{A}\big\}\big|.    (8)

Recall that each element in $\Omega_{K}$ can be thought of as a rooted $K$-ball with an associated signal. Formally, $\mu$ is the pushforward of the uniform measure on the nodes of $\mathcal{G}$ under the sampling map $\Sigma_{K}$, which can be denoted as $\mu=(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x})$ [23]. We observe that the uniform measure in (8) can be replaced by any other measure on the nodes of the graph.
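The empirical measure (8) is simple to compute for concrete measurable sets. The sketch below (a hypothetical illustration; `mu` and the event predicates are not the paper's notation) evaluates $\mu(\mathcal{A})$ for sets $\mathcal{A}$ of the form "rooted $1$-balls whose root has a given degree" on a star graph:

```python
# Empirical measure (8): mu(A) is the fraction of nodes whose rooted ball
# (with signal) lands in the measurable set A. Here A is described by a
# predicate on the node, standing in for a Borel set in Omega_K.

def mu(adj, event):
    """mu(A) = |{v : Sigma_K(v) in A}| / |V| for an event predicate."""
    return sum(1 for v in adj if event(v)) / len(adj)

# Star graph: center 0 joined to leaves 1..4
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

def root_degree_is(d):
    return lambda v: len(adj[v]) == d

print(mu(adj, root_degree_is(1)))  # 0.8: four of five balls are leaf-rooted
print(mu(adj, root_degree_is(4)))  # 0.2: one ball is rooted at the center
```

Replacing the counting in `mu` with any other node weighting implements the remark that the uniform measure is not essential to the construction.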

The space $\Omega_{K}$ serves as a natural domain in which to characterize $K$-tap graph filters. Indeed, Proposition 1 indicates that the behavior of a $K$-tap graph filter can be fully characterized by its behavior over $\Omega_{K}$. That is to say, any given $K$-tap graph filter $\mathcal{H}$ has a corresponding map $\bar{\mathcal{H}}:\Omega_{K}\to\mathbb{R}$ such that for any graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with GSO $\mathbf{S}$ and signal $\mathbf{x}$, (4) can be rewritten as

\bar{\mathcal{H}}(\Sigma_{K}(r))=[\mathcal{H}(\mathbf{S})\mathbf{x}]_{r}.    (9)

To use the terminology of quotient spaces [24], (9) indicates that a $K$-tap graph filter passes, or descends, to a unique map on $\Omega_{K}$ via the map $\Sigma_{K}$. Thus, in the context of Proposition 1 and graph filtering, it is sufficient to characterize local operations over $\Omega_{K}$, rather than with respect to the global structure of a given graph $\mathcal{G}$.

Now that the local framework based on rooted balls has been introduced, we proceed to show how it can be leveraged to obtain theoretical results that provide insight into the inner workings of graph signal processing. In Section 4 we discuss how to understand power spectrum densities, while in Section 5 we discuss transferability of graph filters across graphs of different size.

Remark 1 (Distributions of edge sets).

In [25], the authors consider a pushforward probability measure into the Fourier domain when there is an underlying probability distribution of edges in a fixed graph. This allows for the modeling of graph signals under uncertainty in the edge set. Such a distribution of edges can be incorporated into the distribution $\mu$ via the pushforward map $(\Sigma_{K})_{*}$, allowing for a description of the distribution of rooted balls in a graph under a probability distribution on the edges.

Remark 2 (Rooted subtrees).

Many authors have considered graph processing methods that are equivariant to local subtree structures [19], such as architectures whose computation resembles the 1-Weisfeiler-Lehman test [13]. In the proposed framework, these subtree structures are strictly "less expressive" than rooted ball structures, in the sense that the rooted subtree of depth $K$ centered at a given node can be determined completely by the rooted $K$-ball centered at that node. However, the rooted $K$-ball cannot be determined by the rooted subtree of depth $K$, as evidenced by the inability of the 1-Weisfeiler-Lehman test to distinguish certain structures [26]. With this in mind, the proposed framework could certainly be applied with the space $\Omega_{K}$ constructed from rooted subtrees rather than rooted balls. However, given that rooted balls are more expressive than rooted subtrees [19], we state all subsequent results using the rooted ball construction.
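The expressiveness gap in the remark above can be demonstrated on the classic pair of a 6-cycle versus two disjoint triangles: both graphs are 2-regular, so 1-WL color refinement never separates them, yet their rooted 1-balls differ (a triangle's ball contains an edge between the root's neighbors). A self-contained sketch (the refinement routine is a simplified illustration, not a production WL implementation):

```python
# 1-WL color refinement vs. rooted 1-balls on C_6 and two triangles.

def wl_colors(adj, rounds=3):
    """Sorted multiset of 1-WL colors after a few refinement rounds."""
    color = {v: len(adj[v]) for v in adj}  # initialize with degrees
    for _ in range(rounds):
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: relabel[sig[v]] for v in adj}
    return sorted(color.values())

def triangles_at_root(adj):
    """Per node: number of edges among its neighbors (a rooted 1-ball statistic)."""
    return sorted(sum(1 for u in adj[v] for w in adj[v]
                      if u < w and w in adj[u])
                  for v in adj)

c6 = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

print(wl_colors(c6) == wl_colors(two_triangles))   # True: 1-WL is fooled
print(triangles_at_root(c6))                       # [0, 0, 0, 0, 0, 0]
print(triangles_at_root(two_triangles))            # [1, 1, 1, 1, 1, 1]
```

Since the two graphs are 2-regular, every WL round produces identical color multisets; the rooted-ball statistic separates them immediately.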

4 Power Spectral Density

One of the key ideas in graph signal processing is that of the graph Fourier transform of a graph signal [2]. For a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ on $n$ nodes, consider the GSO $\mathbf{S}$ to be the graph Laplacian, which is a positive semidefinite Hermitian matrix. Thus, it admits the eigendecomposition

\mathbf{S}=\sum_{j=1}^{n}\lambda_{j}\mathbf{u}_{j}\mathbf{u}_{j}^{\mathsf{T}},    (10)

for eigenvalues 0=\lambda_{1}\leq\ldots\leq\lambda_{n} with corresponding orthonormal eigenvectors \mathbf{u}_{1},\ldots,\mathbf{u}_{n}. One can check that for any of these eigenvectors, \langle\mathbf{u}_{j},\mathbf{S}\mathbf{u}_{j}\rangle=\lambda_{j}. The graph Laplacian induces a quadratic form on a graph signal that measures a useful notion of smoothness [2]. For a given graph signal \mathbf{x} on \mathcal{G}, the quadratic form can be shown to equal

\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle=\frac{1}{2}\sum_{v\in\mathcal{V}}\sum_{u\in\mathcal{N}(v)}\left([{\mathbf{x}}]_{v}-[{\mathbf{x}}]_{u}\right)^{2}. (11)

That is, the quadratic form of the graph Laplacian measures the sum of the squared differences of the graph signal at neighboring pairs of nodes. A signal is called smooth if the quadratic form takes a small value relative to its norm. For instance, the eigenvectors of 𝐒\mathbf{S} with small corresponding eigenvalues are smooth signals.
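As a concrete check of the identity above, the following sketch (an illustrative example, not taken from the paper) builds the combinatorial Laplacian of a small cycle graph and compares the quadratic form against the double sum over neighbor pairs; since the double sum ranges over ordered pairs, each edge is counted twice, hence the factor 1/2.

```python
import numpy as np

# Small example graph: a 4-cycle, given by its adjacency list.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
n = len(neighbors)

# Combinatorial Laplacian S = D - A.
A = np.zeros((n, n))
for v, nbrs in neighbors.items():
    for u in nbrs:
        A[v, u] = 1.0
S = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.5, -0.5, 0.25])

# Quadratic form <x, Sx>.
quad = x @ S @ x

# Sum of squared differences over ordered neighbor pairs counts each
# edge twice, hence the factor 1/2 below.
double_sum = sum((x[v] - x[u]) ** 2 for v in neighbors for u in neighbors[v])

assert np.isclose(quad, 0.5 * double_sum)
```

Signals that vary slowly across edges make both sides small, matching the notion of smoothness discussed next.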

We note that this notion of smoothness can be extended to normalized versions of the graph Laplacian in a similar fashion. Furthermore, a notion of smoothness can also be defined using the adjacency matrix [27]. In any case, for ease of exposition and conceptual simplicity, we assume that 𝐒\mathbf{S} is the graph Laplacian throughout this section, but we remark that the results derived herein are also valid for any other GSO built from adjacency rules that rely on the combinatorial structure of the graph.

Since the eigenvectors 𝐮1,,𝐮n\mathbf{u}_{1},\ldots,\mathbf{u}_{n} form an orthonormal basis for n\mathbb{R}^{n}, any graph signal 𝐱\mathbf{x} admits a unique representation as a linear combination of the eigenvectors of 𝐒\mathbf{S}. That is, for any graph signal 𝐱\mathbf{x}, there exists a set of coefficients {x~j}j=1n\{\tilde{x}_{j}\}_{j=1}^{n} such that the following holds:

𝐱\displaystyle\mathbf{x} =j=1nx~j𝐮j\displaystyle=\sum_{j=1}^{n}\tilde{x}_{j}\mathbf{u}_{j} (12)
𝐱,𝐒𝐱\displaystyle\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle =j=1nλjx~j2.\displaystyle=\sum_{j=1}^{n}\lambda_{j}\tilde{x}_{j}^{2}. (13)

Thus, the representation of a graph signal as a weighted sum of the eigenvectors can be used to conveniently compute the quadratic form in terms of the spectrum. In an analogy to complex exponential functions being the eigenfunctions of the Laplace operator in discrete-time signal processing, we call the representation of the graph signal 𝐱\mathbf{x} by the coefficients x~j\tilde{x}_{j} the graph Fourier transform (GFT).
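The decomposition (12) and the spectral form of the quadratic form (13) can be verified numerically. The sketch below (an assumed example on a path graph; the graph and signal are arbitrary choices) computes the GFT coefficients via the orthonormal eigenbasis of the Laplacian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Laplacian of a path graph on 5 nodes (arbitrary example).
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
S = np.diag(A.sum(axis=1)) - A

# Eigendecomposition: S = U diag(lam) U^T with orthonormal columns of U.
lam, U = np.linalg.eigh(S)

x = rng.normal(size=n)

# GFT coefficients and the inverse transform, cf. eq. (12).
x_tilde = U.T @ x
assert np.allclose(U @ x_tilde, x)

# Quadratic form computed in the spectral domain, cf. eq. (13).
assert np.isclose(x @ S @ x, np.sum(lam * x_tilde ** 2))
```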

When coupled with their corresponding eigenvalues \lambda_{j}, the coefficients \tilde{x}_{j} can be used to describe the distribution of energy in a graph signal, in a way concordant with the notion of smoothness described by the Laplacian quadratic form (11). To capture this, we define the normalized power spectral distribution of a graph signal. Given a graph signal \mathbf{x} with Fourier coefficients \tilde{x}_{j} and corresponding Laplacian eigenvalues 0=\lambda_{1}\leq\ldots\leq\lambda_{n}, define

P_{\mathbf{x}}:\mathbb{R}\to\mathbb{R}^{+},\qquad\lambda\mapsto\frac{1}{n}\sum_{j:\lambda_{j}\leq\lambda}\tilde{x}_{j}^{2}. (14)

One can easily see that P𝐱P_{\mathbf{x}} is a monotone, right-continuous function, with P𝐱(λ)=0P_{\mathbf{x}}(\lambda)=0 for λ<0\lambda<0, and at most nn discontinuities (one at each eigenvalue λj\lambda_{j}). Moreover, there is a convenient expression for the moments of P𝐱P_{\mathbf{x}}:

𝐦K(𝐱)λKdP𝐱(λ)=1n𝐱,𝐒K𝐱.\mathbf{m}_{K}(\mathbf{x})\coloneqq\int_{\mathbb{R}}\lambda^{K}\mathrm{d}P_{\mathbf{x}}(\lambda)=\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle. (15)

That is to say, the moments of the normalized power spectral distribution are given by scaled quadratic forms of powers of the GSO. Observe that the quadratic form (11) can be expressed as 𝐱,𝐒𝐱=n𝐦1(𝐱)\langle\mathbf{x},\mathbf{S}\mathbf{x}\rangle=n\cdot\mathbf{m}_{1}(\mathbf{x}). Indeed, the moments of the graph Fourier transform are an essential quantity in the design of low-order graph filters [28]. This points to a very useful idea: the powers of the GSO are the most primitive types of graph filters, so the proposed local framework provides a path for understanding these moments in terms of functions on the space of distributions of rooted balls.
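The moment identity (15) is straightforward to check numerically. The following sketch (an illustrative example on a random graph) compares the spectral expression \sum_{j}\lambda_{j}^{K}\tilde{x}_{j}^{2}/n against the quadratic form of the GSO power for several values of K.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Random symmetric adjacency matrix, then the Laplacian.
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1)
A = A + A.T
S = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(S)
x = rng.normal(size=n)
x_tilde = U.T @ x

for K in range(4):
    # Moment via the spectral distribution: (1/n) sum_j lam_j^K x_tilde_j^2.
    m_spec = np.sum(lam ** K * x_tilde ** 2) / n
    # Moment via the quadratic form: (1/n) <x, S^K x>, cf. eq. (15).
    m_quad = x @ np.linalg.matrix_power(S, K) @ x / n
    assert np.isclose(m_spec, m_quad)
```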

To see this, let us examine (15). Notice that

1n𝐱,𝐒K𝐱=1nv𝒱[𝐱]v[𝐒K𝐱]v.\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle=\frac{1}{n}\sum_{v\in\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}. (16)

As before, let \bar{\mathbf{S}}_{K}(v) be the GSO of the K-ball rooted at node v. By Proposition 1, the above equation can be written as

\frac{1}{n}\sum_{v\in\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}=\frac{1}{n}\sum_{v\in\mathcal{V}}[\bar{\mathbf{x}}_{K}(v)]_{v}[\bar{\mathbf{S}}_{K}^{K}(v)\,\bar{\mathbf{x}}_{K}(v)]_{v}. (17)

That is, the moments of the normalized power spectral distribution of a graph signal can be written as an average of operations that resemble local filtering. In the proposed local framework, this has the following interpretation.

Proposition 2.

Let 𝒢\mathcal{G} be a graph and 𝐱\mathbf{x} be a graph signal. For a given K0K\geq 0, define the map

\bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}, (18)
((\mathcal{V},\mathcal{E},r),\bar{\mathbf{y}})\mapsto[\bar{\mathbf{y}}]_{r}[\bar{\mathbf{S}}_{K}^{K}\bar{\mathbf{y}}]_{r}, (19)

where \bar{\mathbf{S}}_{K} denotes the Laplacian of the rooted K-ball.

Letting μ\mu be the probability distribution on ΩK\Omega_{K} determined by (𝒢,𝐱)(\mathcal{G},\mathbf{x}), the KthK^{th} moment of the normalized power spectral distribution P𝐱P_{\mathbf{x}} is given by the equation

\mathbf{m}_{K}(\mathbf{x})=\int_{\Omega_{K}}\bar{\mathbf{m}}_{K}(\omega)\,\mathrm{d}\mu(\omega)=\mathbb{E}_{\mu}[\bar{\mathbf{m}}_{K}]. (20)
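Proposition 2 can be illustrated numerically. The sketch below assumes the rooted K-ball at v is the subgraph induced by all nodes within K hops of v, with its own combinatorial Laplacian; the helper `ball` and the variable names are our own. Since walks of length K from v never leave this ball, the local filtering term at each root matches the global one, and averaging recovers the K-th moment.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 8, 2

# Random graph (Erdos-Renyi-style draw; an arbitrary example).
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T
S = np.diag(A.sum(axis=1)) - A
x = rng.normal(size=n)

def ball(v, K):
    """Nodes within hop distance K of v (breadth-first search)."""
    dist = {v: 0}
    frontier = [v]
    for d in range(1, K + 1):
        nxt = []
        for u in frontier:
            for w in np.flatnonzero(A[u]):
                if w not in dist:
                    dist[w] = d
                    nxt.append(w)
        frontier = nxt
    return sorted(dist)

# Global moment (1/n) <x, S^K x> ...
m_global = x @ np.linalg.matrix_power(S, K) @ x / n

# ... equals the average of the local terms [x]_v [S_K^K(v) x_K(v)]_v
# computed entirely within each rooted K-ball.
total = 0.0
for v in range(n):
    nodes = ball(v, K)
    idx = {u: i for i, u in enumerate(nodes)}
    A_loc = A[np.ix_(nodes, nodes)]
    S_loc = np.diag(A_loc.sum(axis=1)) - A_loc  # Laplacian of the K-ball
    x_loc = x[nodes]
    total += x_loc[idx[v]] * (np.linalg.matrix_power(S_loc, K) @ x_loc)[idx[v]]
m_local = total / n

assert np.isclose(m_global, m_local)
```

The degrees truncated at the boundary of each ball never enter the computation, since diagonal Laplacian entries are only used at nodes within K-1 hops of the root.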

Having reduced the moments of the normalized power spectral distribution to an average over the distribution of rooted balls, we can now reason about the convergence of graph Fourier distributions in terms of weak convergence of measure for graphs with bounded degree D. Although the limit of dense graphs has been considered before via graphon models [29, 30], this approach cannot capture the behavior of bounded-degree graphs, as all infinite sequences of growing graphs of bounded degree converge to the zero graphon [31]. In the case of sparse graphs, however, understanding the descent of maps to the space \Omega_{K} is a feasible approach, supported by the following compactness property.

Lemma 1.

For given integers K0,D1K\geq 0,D\geq 1 and any collection of graphs in 𝔾D\mathbb{G}_{D} with uniformly bounded signals, there exists a compact subspace ΓΩK\Gamma\subset\Omega_{K} such that the support of the probability measures corresponding to the graphs/graph signals is contained in Γ\Gamma.

The proof of Lemma 1 can be found in Appendix C. The constraint of a graph having bounded degree, for instance coming from physical constraints in a real-world system, corresponds to a compact description in the space ΩK\Omega_{K}. From this, the following convergence result holds.

Theorem 1.

Let integers K0,D1K\geq 0,D\geq 1 be given. Let {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} be a sequence of graphs and graph signals such that each 𝒢j𝔾D\mathcal{G}_{j}\in\mathbb{G}_{D} and the collection of signals 𝐱j\mathbf{x}_{j} is uniformly bounded. Let μj\mu_{j} be the probability measure on ΩK\Omega_{K} associated with (𝒢j,𝐱j)(\mathcal{G}_{j},\mathbf{x}_{j}) for each jj, and PjP_{j} the corresponding normalized power spectral distribution function.

If the measures μj\mu_{j} converge weakly, then the KKth moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all K0K\geq 0, then the normalized power spectral distribution functions converge weakly.

The proof of Theorem 1 can be found in Appendix B. Theorem 1 casts the power spectral density of a graph signal in terms of the distribution of local structures in the graph. In particular, we treat the power spectral density as a function from graphs with graph signals to Borel measures on a compact subset of the real line, then prove that this function is continuous with respect to the distribution of rooted balls in the graph, where continuity is understood to be with respect to the weak topology for both the distribution in \Omega_{K} and the power distribution of the signal. Although the frequency domain representation of a graph signal is typically understood to be a representation in terms of the global graph structure, owing to the global support of the graph Laplacian’s eigenvectors, this result indicates that it is in essence still subject to fundamentally local phenomena, namely the rooted balls in the graph.

This characterization of the power spectral distribution in terms of the distribution of rooted balls also reflects the local nature of graph filtering. Indeed, as discussed in [28], the moments of the power spectral distribution are key in the understanding of graph filters. More broadly, when designing a graph filter, we often aim to construct a matched filter in the frequency domain, so that the response of the filter aligns with that of the signal, subject to some noise model (additive white Gaussian noise, for instance). Additionally, constraints such as a filter having K taps reduce the space of possible frequency responses to degree-K polynomials in the frequency domain, so that the moments become the basic values of interest. This is an immediate consequence of graph filters admitting a distributed implementation [28], so that the expression of a graph filter’s performance in terms of the power spectrum of the signal is actually a distillation of the distribution of rooted balls in \Omega_{K}.

We have shown how connecting the moments of the power spectral distribution to the space \Omega_{K} yields insights about how features of graphs and graph signals are dependent only on localized information. In particular, a function \bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R} was constructed whose expected value under the pushforward measure of a graph computes the moments of the power spectral distribution. This machinery can be immediately extended to other functions on graphs that act locally, allowing us to compare the behavior of a given function on two graphs in terms of their distributions of rooted balls.

5 Transferability

Let JJ be a function mapping a graph and its corresponding graph signal to a real number. This function is typically known as a graph summary and examples include the graph power spectral distribution discussed in the previous section, as well as the average node degree or node centrality values [32], the clustering coefficient [33], or the conductance [34]. We can also consider cost functions for a given task as graph summaries, since they ultimately take a graph and its signal as input, and output a real number [35, 36]. In this context, we want to study how the function JJ changes across two different graphs with different graph signals (𝒢1,𝐱1)(\mathcal{G}_{1},\mathbf{x}_{1}) and (𝒢2,𝐱2)(\mathcal{G}_{2},\mathbf{x}_{2}).

To study the transference of the graph summary J across different graphs and graph signals, let us assume that there exists a function \bar{J}:\Omega_{K}\to\mathbb{R} such that for any pair (\mathcal{G},\mathbf{x}) whose associated probability distribution on \Omega_{K} is denoted by \mu, we have J(\mathcal{G},\mathbf{x})=\mathbb{E}_{\mu}[\bar{J}]. This assumption is not too severe, essentially saying that the graph summary is an average over a function on the nodes, with the value at each node determined strictly by the structure of its local neighborhood. Then, consider the transfer equation

\left|J(\mathcal{G}_{1},\mathbf{x}_{1})-J(\mathcal{G}_{2},\mathbf{x}_{2})\right|=\left|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\right|, (21)

as a way of measuring the transferability of JJ, and where μ\mu and ν\nu are the distributions of rooted balls on the space ΩK\Omega_{K} of the graphs (𝒢1,𝐱1)(\mathcal{G}_{1},\mathbf{x}_{1}) and (𝒢2,𝐱2)(\mathcal{G}_{2},\mathbf{x}_{2}), respectively.

Understanding (21) quantitatively demands a stronger structure on \Omega_{K} than the disjoint union topology. To this end, we endow \Omega_{K} with the structure of a metric space, in a way that preserves the original topology. For an arbitrary constant C>0, define a metric d_{C} on \Omega_{K} as follows. For \omega_{1}=(\bar{\mathcal{G}}_{1},\bar{\mathbf{x}}_{1}) and \omega_{2}=(\bar{\mathcal{G}}_{2},\bar{\mathbf{x}}_{2})\in\Omega_{K}, put

d_{C}(\omega_{1},\omega_{2})=\begin{cases}\min\{\|\bar{\mathbf{x}}_{1}-\bar{\mathbf{x}}_{2}\|_{2},2C\}&\bar{\mathcal{G}}_{1}=\bar{\mathcal{G}}_{2}\\ C&\text{otherwise},\end{cases} (22)

where \|\cdot\|_{2} is inherited from the identification of \mathbb{X}(\bar{\mathcal{G}}) with Euclidean space (this metric is again understood modulo the automorphism group of the rooted ball, discussed in Appendix A). Note that \|\cdot\|_{2} is well defined since both \mathbf{x}_{1} and \mathbf{x}_{2} are vectors in the same |\bar{\mathcal{V}}|-dimensional space when \bar{\mathcal{G}}_{1}=\bar{\mathcal{G}}_{2}. One can check that d_{C} is indeed a metric on \Omega_{K} for any C>0. The metric d_{C} is constructed in a way that preserves the local metric structure of each signal space \mathbb{X}(\bar{\mathcal{G}}), while also endowing the whole space \Omega_{K} with a global metric structure, at the cost of some local distortion in order to maintain the triangle inequality, as dictated by the constant C. When C is chosen to be very large, the local metric structure on each rooted ball of the space is well preserved, but these rooted balls are otherwise kept “far apart” from each other. On the other hand, when C is very small, the rooted balls become “close,” but the local metric structure on each of them is distorted (i.e., signals belonging to the same rooted ball that are beyond a distance of 2C are indistinguishable from signals that belong to different rooted balls). Importantly, the metric topology on \Omega_{K} induced by d_{C} is equal to the disjoint union topology (6), so that all notions of continuity and weak convergence of measure are preserved.
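A minimal sketch of the metric (22) follows. For simplicity, this illustration represents a rooted ball by a hashable canonical form (adjacency tuple, root) and compares labeled graphs for exact equality, ignoring the automorphism quotient discussed above; the function name `d_C` and the toy graphs are our own choices.

```python
import numpy as np

def d_C(omega1, omega2, C):
    """Metric d_C on pairs (rooted ball, signal), cf. eq. (22).

    A rooted ball is represented by (adjacency tuple, root index).
    This sketch compares labeled graphs exactly, ignoring the
    isomorphism/automorphism quotient.
    """
    (G1, x1), (G2, x2) = omega1, omega2
    if G1 == G2:
        diff = np.linalg.norm(np.asarray(x1) - np.asarray(x2))
        return min(float(diff), 2 * C)
    return C

# Same ball, nearby signals: the local Euclidean structure is preserved.
G = (((0, 1), (1, 0)), 0)   # a single edge, rooted at node 0
assert d_C((G, [0.1, 0.2]), (G, [0.1, 0.25]), C=1.0) < 1.0
# Distinct balls sit at distance exactly C from each other.
H = (((0, 0), (0, 0)), 0)   # an isolated root with a second node, no edges
assert d_C((G, [0.1, 0.2]), (H, [0.0, 0.0]), C=1.0) == 1.0
```

Shrinking C pulls distinct balls together while capping within-ball distances at 2C, matching the trade-off described above.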

Given the additional metric structure on \Omega_{K}, we can quantitatively characterize the transfer equation (21) by comparing distributions of rooted balls on the metric space (\Omega_{K},d_{C}). To do so, we make the following assumption on the function \bar{J}.

Assumption 1.

For all rooted K-balls \bar{\mathcal{G}}\in\bar{\mathbb{G}}_{D}, \bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}}) is L-Lipschitz continuous with respect to the second argument \bar{\mathbf{x}} over the space of bounded signals \mathbb{X}(\bar{\mathcal{G}},[-1,1]).

This is not a difficult assumption to satisfy. For instance, it is sufficient for \bar{J} to be continuously differentiable on each signal space. In fact, the set of Lipschitz continuous functions on a compact space is dense in the set of continuous functions with respect to the uniform norm [37].

Theorem 2.

Let graphs \mathcal{G}_{1},\mathcal{G}_{2}\in\mathbb{G}_{D} with corresponding bounded signals \mathbf{x}_{1},\mathbf{x}_{2} be given. Denote their corresponding measures in \Omega_{K} by \mu and \nu, respectively. For a function \bar{J}:\Omega_{K}\to[0,1] that satisfies Assumption 1, it holds that

\big|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\big|\leq L\cdot W_{1}\!\left(\mu,\nu;\frac{1}{L}\right), (23)

where W1(μ,ν;C)W_{1}(\mu,\nu;C) denotes the 11-Wasserstein distance between μ\mu and ν\nu under the metric dCd_{C}.

The proof of Theorem 2 can be found in Appendix D. Theorem 2 bounds the transfer equation (21) in terms of the 11-Wasserstein distance between the two distributions of rooted balls. The appearance of the 11-Wasserstein distance in this setting is intuitive: the dual formulation of the 11-Wasserstein distance defines the distance between two distributions via an integral probability metric over 11-Lipschitz functions [38], yielding a transferability bound that holds for all Lipschitz functions. We illustrate the notion of the 11-Wasserstein distance between graphs in Fig. 2.

Figure 2: The 11-Wasserstein distance reflects structural similarity between graphs. (Top) Three graphs (a,b,c) with signals indicated by node coloring. (Bottom) Distributions of rooted 11-balls in each graph (a,b,c), represented by histograms. The xx-axis corresponds to the space Ω1\Omega_{1}, and the yy-axis shows the density of each rooted 11-ball in the graph. The 11-Wasserstein distance, then, essentially describes the distance between histograms, subject to a metric structure on Ω1\Omega_{1}. For instance, graph (a) is expected to have a large 11-Wasserstein distance from graphs (b,c), since there is minimal overlap between their histograms. On the other hand, graphs (b,c) have substantial overlap, so their 11-Wasserstein distance will be smaller.

5.1 Examples and Discussion

Theorem 2 indicates that smoother functions \bar{J} generally transfer better than those with a large Lipschitz constant. Some examples of graph summaries that fulfill the conditions of Theorem 2 follow.

Example: Power spectral distribution. For any K\geq 0, the spectral moment function \mathbf{m}_{K} defined in Section 4 descends to the expectation of the function \bar{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}. For any rooted K-ball \bar{\mathcal{G}}, \bar{\mathbf{m}}_{K}(\bar{\mathcal{G}},\cdot) is a polynomial of the graph signal in the second argument, and can thus be shown to satisfy Assumption 1. Thus, the difference in the spectral moments of two graphs can be bounded in terms of the Wasserstein distance of their respective distributions of rooted balls. Moreover, observe that for K\geq M\geq 0, the moment function \bar{\mathbf{m}}_{M} can be defined on \Omega_{K} in a way that preserves the identity (20), so that we can compare the similarity of spectral moments between graph signals by comparing the Lipschitz constants of the functions \bar{\mathbf{m}}_{K} and \bar{\mathbf{m}}_{M} on the same space \Omega_{K}. Indeed, one can see that \bar{\mathbf{m}}_{K} has a larger Lipschitz constant than \bar{\mathbf{m}}_{M}, so that the bound given by Theorem 2 is tighter for the lower moments of the power spectral distribution, contingent upon a relatively small 1-Wasserstein distance between the distributions of rooted balls.

Example: MSE of a graph filter. Let \mathcal{H} be a fixed KK-tap graph filter, and let σ2>0\sigma^{2}>0 be given. For any given graph 𝒢\mathcal{G} on nn nodes with associated signal 𝐱\mathbf{x} and shift operator 𝐒\mathbf{S}, put the function JJ as

J(𝒢,𝐱)=𝔼[1n𝐱(𝐒)(𝐱+𝜼)22],J(\mathcal{G},\mathbf{x})=\mathbb{E}\left[\frac{1}{n}\big{\|}\mathbf{x}-\mathcal{H}(\mathbf{S})(\mathbf{x}+\boldsymbol{\eta})\big{\|}_{2}^{2}\right], (24)

where the expectation is taken over the random vector \boldsymbol{\eta}\sim\mathcal{N}(0,\sigma^{2}\mathbf{I}). In words, J is the mean squared error when applying \mathcal{H} to (\mathcal{G},\mathbf{x}) under additive white Gaussian noise. Define \bar{J} on rooted K-balls \bar{\mathcal{G}}=(\mathcal{V},\mathcal{E},r) with shift operator \bar{\mathbf{S}} as

\bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}})=([\bar{\mathbf{x}}]_{r}-[\mathcal{H}(\bar{\mathbf{S}})\bar{\mathbf{x}}]_{r})^{2}+\sigma^{2}[(\mathcal{H}(\bar{\mathbf{S}}))^{2}]_{r}. (25)

If we denote the measure on ΩK\Omega_{K} associated with (𝒢,𝐱)(\mathcal{G},\mathbf{x}) by μ\mu, one can show that

J(\mathcal{G},\mathbf{x})=\mathbb{E}_{\mu}[\bar{J}]. (26)

Observe that \bar{J}(\bar{\mathcal{G}},\bar{\mathbf{x}}) is a polynomial of the values of \bar{\mathbf{x}} for each rooted K-ball \bar{\mathcal{G}}, so that it is Lipschitz continuous on each bounded signal space, thus satisfying Assumption 1. Moreover, the Lipschitz constant of \bar{J} over each signal space \mathbb{X}(\bar{\mathcal{G}}) can be shown to be proportional to \|(\mathbf{I}-\mathcal{H}(\bar{\mathbf{S}}))\boldsymbol{\delta}_{r}\|_{2}, where \boldsymbol{\delta}_{r} is the signal taking value 1 on the root node and 0 elsewhere. That is to say, the Lipschitz constant is directly influenced by the “spread” of the impulse response of the filter. Intuitively, this indicates that while filters with large coefficients in their higher-order terms may suit a particular signal well, they may transfer poorly across graphs. This aligns with existing results on the stability of graph filters [7, 39].
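The expectation in (24) admits a closed form: expanding the square gives \mathbb{E}\|\mathbf{x}-\mathcal{H}(\mathbf{S})(\mathbf{x}+\boldsymbol{\eta})\|^{2}=\|(\mathbf{I}-\mathcal{H}(\mathbf{S}))\mathbf{x}\|^{2}+\sigma^{2}\|\mathcal{H}(\mathbf{S})\|_{F}^{2}. The sketch below (an assumed example with an arbitrary 2-tap filter on a path graph) checks this closed form against a Monte Carlo estimate of (24).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 5, 0.3

# Path-graph Laplacian and an arbitrary 2-tap filter H(S) = h0 I + h1 S.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
S = np.diag(A.sum(axis=1)) - A
H = 0.8 * np.eye(n) - 0.1 * S

x = rng.normal(size=n)

# Closed form of (24):
#   E (1/n) ||x - H(x + eta)||^2
#     = (1/n) ( ||(I - H) x||^2 + sigma^2 ||H||_F^2 ).
J_closed = (np.linalg.norm((np.eye(n) - H) @ x) ** 2
            + sigma ** 2 * np.linalg.norm(H, 'fro') ** 2) / n

# Monte Carlo estimate of the same expectation.
T = 200_000
eta = sigma * rng.normal(size=(T, n))
errs = x - (x + eta) @ H.T
J_mc = np.mean(np.sum(errs ** 2, axis=1)) / n

assert abs(J_closed - J_mc) < 1e-2
```

The noise term \sigma^{2}\|\mathcal{H}(\mathbf{S})\|_{F}^{2} is the sum over nodes of the per-root noise contributions appearing in the local summary (25).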

Estimation of the transferability bound. The computation of the 1-Wasserstein distance in (23) may prove problematic for large graphs. However, given the definition of the 1-Wasserstein distance as an integral probability metric, it is not difficult to compute an empirical estimate of W_{1}(\mu,\nu;C). By [38, Theorem 6], if \{\omega_{j}^{\mu}\}_{j=1}^{m} and \{\omega_{j}^{\nu}\}_{j=1}^{n} are i.i.d. samples drawn from the respective distributions \mu and \nu, then the 1-Wasserstein distance between the empirical distributions can be computed via a linear program. To draw i.i.d. samples from these distributions, one can choose a node uniformly at random (with replacement) from the node set of the underlying graph and then take the rooted K-ball via the map \Sigma_{K}. By [38, Corollary 10], the compact support of the distributions \mu,\nu guarantees almost sure convergence of the empirical estimator to the true Wasserstein distance as m,n\to\infty.
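The linear program mentioned above is the standard optimal-transport formulation: minimize the transport cost \langle D, T\rangle over plans T with marginals p and q. A minimal sketch for discrete distributions on a finite metric space follows; the function name `wasserstein1` and the toy 3-point example are our own, with the discrete-metric cost matrix mimicking d_C between distinct rooted balls (C = 1).

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein1(p, q, D):
    """1-Wasserstein distance between discrete distributions p, q on a
    common finite metric space with cost matrix D, via the
    optimal-transport linear program."""
    m, n = len(p), len(q)
    # Decision variables: the transport plan T, flattened row-major.
    c = D.reshape(-1)
    # Row marginals of T sum to p; column marginals sum to q.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Toy example: 3 distinct rooted balls under the discrete metric (C = 1).
D = np.ones((3, 3)) - np.eye(3)
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
# Half the mass must move to a different ball, so W1 = 0.5.
assert np.isclose(wasserstein1(p, q, D), 0.5)
```

Replacing `D` with pairwise d_C distances between sampled rooted balls yields the empirical estimator described above.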

Tighter bounds. In Theorem 2, the difference in expectation between two distributions of rooted balls is characterized in terms of their Wasserstein distance, which is defined in terms of a metric imposed upon \Omega_{K}. Given that the metric d_{C} is defined for essentially arbitrary C, one can derive a tighter bound on the transfer equation, with less stringent assumptions on the function \bar{J}. First, note that for two graphs \mathcal{G}_{1},\mathcal{G}_{2}\in\mathbb{G}_{D} and bounded signals \mathbf{x}_{1},\mathbf{x}_{2}, the corresponding measures \mu,\nu have support in the compact subspace \Gamma\subset\Omega_{K}, where \Gamma is as described in Lemma 1. Because of this, any continuous function on \Gamma is bounded. For a function \bar{J}:\Omega_{K}\to\mathbb{R} satisfying Assumption 1, define the range of \bar{J} as

A=\sup_{\omega\in\Gamma}\bar{J}(\omega)-\inf_{\omega\in\Gamma}\bar{J}(\omega). (27)

In the same vein as Theorem 2, it holds that

\big|\mathbb{E}_{\mu}[\bar{J}]-\mathbb{E}_{\nu}[\bar{J}]\big|\leq\inf_{C\in(0,1]}\frac{L}{C}\cdot W_{1}\!\left(\mu,\nu;\frac{AC}{L}\right). (28)

This bound illustrates an interplay between the smoothness of \bar{J}, the range of \bar{J}, and the regime of similarity between the two graphs. If two graphs have similar distributions of rooted balls, then the smoothness of \bar{J} plays a stronger role in bounding transferability. If they have highly distinct distributions of rooted balls, the range of \bar{J}, as described by the value A, plays a stronger role in bounding transferability.

Quantifying weak convergence. It is well known that in cases of bounded support, the Wasserstein distance between probability measures metrizes the weak topology on the set of probability measures on that space, i.e., the topology induced by the definition of weak convergence [40]. Therefore, convergence in the 11-Wasserstein distance is equivalent to weak convergence of measures. For instance, this implies via Theorem 1 that the moments of the graph Fourier transform are continuous with respect to the 11-Wasserstein distance between distributions of rooted balls.

Extensions to graph neural networks. Although we have considered maps on graphs with real-valued signals, the analysis in this section is equally applicable when the nodes have other types of features such as categorical labeling. For example, one could consider JJ to be the cross-entropy loss for a fixed graph neural network with KK layers, where the “graph signal” is taken to be a set of discrete input features and a ground-truth label against which the loss is evaluated [41].

Wasserstein graph kernels. Wasserstein distances between graphs have been considered in the graph kernel literature [16], where node embeddings in Euclidean space are computed from local structures about each node, followed by a computation of the Wasserstein distance in Euclidean space. Given that the proposed computation for nodes with continuous attributes in this paper is smooth with respect to the node attributes, one can show that the Wasserstein distance between distributions in our case descends to theirs via a continuous map.

By endowing the space ΩK\Omega_{K} with an appropriate metric structure, we have applied the proposed framework to studying transferability of graph summaries that descend to maps on rooted KK-balls. In analyzing the relationship between the Lipschitz constant of the graph summary and the distribution of rooted balls, insights have been gleaned as to what drives commonalities between graphs in this regard. Although our focus in this section has been on fixed graph summaries, such as the loss of a fixed graph filter in Section 5.1, these ideas can be readily adapted to problems of designing graph summaries. For instance, the presence of the Lipschitz constant in Theorem 2 indicates that a graph filter or graph neural network will transfer better if it is designed to be smooth. This can be cast in the light of stability, where certain conditions on filter banks in graph neural networks can guarantee a response that is stable to structural perturbations [7, 39].

6 Extension to Weighted Graphs

In this section, we consider how ideas from the unweighted case (Sections 4 and 5) can be extended to weighted graphs. We define a weighted graph as a graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) coupled with a weight function w:0w:\mathcal{E}\to\mathbb{R}_{\geq 0}. For a given graph 𝒢\mathcal{G}, the set of all possible weight functions on it is denoted 𝕎(𝒢)={w:0}\mathbb{W}(\mathcal{G})=\{w:\mathcal{E}\to\mathbb{R}_{\geq 0}\}. We use the same notation for the set of weight functions on a rooted graph: 𝕎( 𝒢)\mathbb{W}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,).

We now expand our definition of a rooted KK-ball with a signal to include weight functions. Define

ΩKW= 𝒢k:kK𝕎( 𝒢)×𝕏( 𝒢).\Omega_{K}^{W}=\coprod_{\,\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathcal{G}$\kern-0.70004pt}}}\,_{k}:k\leq K}\mathbb{W}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)\times\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,). (29)

Similar to the definition of ΩK\Omega_{K} in (6), the space ΩKW\Omega_{K}^{W} consists of all rooted KK-balls with both edge weights and graph signals on them. As before, a weighted graph with a signal (𝒢,w,𝐱)(\mathcal{G},w,\mathbf{x}) yields elements of ΩKW\Omega_{K}^{W} via a suitably defined sampling map ΣK\Sigma_{K}, defined for nodes v𝒱v\in\mathcal{V}:

ΣK(v)=( 𝒢K(v), wK(v), 𝐱K(v)),\Sigma_{K}(v)=(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v),\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$w$\kern-1.00006pt}}}_{K}(v),\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,_{K}(v)), (30)

where  wK(v)\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$w$\kern-1.00006pt}}}_{K}(v) denotes the restriction of ww to the edges of  𝒢K(v)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v). As before, a given weighted graph with a signal defines a probability measure on ΩKW\Omega_{K}^{W} via the pushforward of the uniform measure by ΣK\Sigma_{K}, which we denote by μ=(ΣK)(𝒢,w,𝐱)\mu=(\Sigma_{K})_{*}(\mathcal{G},w,\mathbf{x}). For a weighted graph (𝒢,w)(\mathcal{G},w), define the weighted degree of a node v𝒱v\in\mathcal{V} as follows:

deg(v)=u𝒩(v)w(v,u).\mathrm{deg}(v)=\sum_{u\in\mathcal{N}(v)}w(v,u). (31)

For a real number D>0D>0, the set of weighted graphs whose nodes have weighted degree at most DD is denoted by 𝔾DW\mathbb{G}^{W}_{D}.
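The weighted sampling map (30) and the weighted degree (31) are straightforward to realize. The following sketch uses a dictionary-based representation and takes the rooted KK-ball to be the induced subgraph on nodes within distance KK of the root; these representation choices are ours, made for illustration:

```python
from collections import deque

def rooted_ball(adj, weights, signal, root, K):
    """Sampling map of (30): restrict the graph, the weight function, and
    the signal to the induced subgraph on nodes within distance K of root.

    adj maps nodes to neighbor sets, weights maps frozenset edges {u, v}
    to nonnegative reals, and signal maps nodes to reals."""
    dist = {root: 0}
    queue = deque([root])
    while queue:                        # breadth-first search to depth K
        v = queue.popleft()
        if dist[v] == K:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    ball = set(dist)
    ball_adj = {v: adj[v] & ball for v in ball}
    ball_w = {e: w for e, w in weights.items() if e <= ball}
    ball_x = {v: signal[v] for v in ball}
    return ball_adj, ball_w, ball_x, root

def weighted_degree(adj, weights, v):
    """Weighted degree of (31): the sum of weights over incident edges."""
    return sum(weights[frozenset((v, u))] for u in adj[v])

# Example: a weighted path 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
weights = {frozenset((0, 1)): 0.5, frozenset((1, 2)): 2.0,
           frozenset((2, 3)): 1.0}
signal = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
ball_adj, ball_w, ball_x, root = rooted_ball(adj, weights, signal, 0, 1)
```

Here the 1-ball about node 0 keeps only nodes {0, 1} and the single edge between them, discarding the weights of edges leaving the ball, exactly as the restriction in (30) prescribes.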

We now develop a local description of the graph Fourier transform for weighted graphs, much like in Section 4. For a graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) with weight function ww, define the weighted graph Laplacian as a shift operator such that, for any u,v𝒱u,v\in\mathcal{V},

[𝐒]uv={deg(v)u=vw(u,v)(u,v)0otherwise.[\mathbf{S}]_{uv}=\begin{cases}\mathrm{deg}(v)&u=v\\ -w(u,v)&(u,v)\in\mathcal{E}\\ 0&\text{otherwise}.\end{cases} (32)

The weighted Laplacian for a graph on nn nodes is positive semidefinite, and admits an eigendecomposition 𝐒=j=1nλj𝐮j𝐮j𝖳\mathbf{S}=\sum_{j=1}^{n}\lambda_{j}\mathbf{u}_{j}\mathbf{u}_{j}^{\mathsf{T}}, with a Fourier representation for graph signals defined in a way analogous to (12). For a given graph signal 𝐱𝕏(𝒢)\mathbf{x}\in\mathbb{X}(\mathcal{G}), there is a power spectral distribution associated with 𝐱\mathbf{x} via the weighted Laplacian, whose moments are given by (15). We characterize the properties of the power spectral distribution in terms of the distribution of rooted KK-balls in the following theorem.

Theorem 3.

Let an integer K0K\geq 0 and a real number D>0D>0 be given. Let {(𝒢j,wj,𝐱j)}j=1\{(\mathcal{G}_{j},w_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} be a sequence of graphs, weight functions, and graph signals such that each graph is contained in 𝔾DW\mathbb{G}^{W}_{D} and the signals 𝐱j\mathbf{x}_{j} are uniformly bounded. Let μj\mu_{j} be the probability measure on ΩKW\Omega_{K}^{W} associated with (𝒢j,wj,𝐱j)(\mathcal{G}_{j},w_{j},\mathbf{x}_{j}) for each jj, and PjP_{j} the corresponding normalized power spectral distribution function.

If the measures μj\mu_{j} converge weakly, then the KthK^{th} moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all K0K\geq 0, then the normalized power spectral distribution functions converge weakly.

The proof of Theorem 3 can be found in Appendix E. Theorem 3 establishes that the power spectral distribution of a graph signal on a weighted graph is continuous with respect to the weak topology of distributions of rooted KK-balls, under boundedness assumptions. Unlike the regime of Theorem 1, the convergence of the power spectral distribution for weighted graphs here is not dependent on the combinatorial constraint of having bounded node degrees. Rather, any node may have an arbitrarily large number of neighbors, as long as the total influence remains bounded. This is a realistic assumption for many systems, where an agent may be allowed to exert influence on a large collection of other agents, but the total influence is bounded by some physical constraint, such as power consumption.
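The quantities appearing in Theorem 3 are easy to compute for small weighted graphs. The sketch below builds the weighted Laplacian of (32) and evaluates the normalized moments, assuming (consistently with the graphing formula (37) under the uniform node measure) that the KK-th moment is x^T S^K x / n; it then cross-checks the result against the eigendecomposition:

```python
import numpy as np

def weighted_laplacian(n, weighted_edges):
    """Weighted graph Laplacian of (32): diagonal of weighted degrees,
    minus the matrix of edge weights."""
    S = np.zeros((n, n))
    for (u, v), w in weighted_edges.items():
        S[u, v] -= w
        S[v, u] -= w
        S[u, u] += w
        S[v, v] += w
    return S

def psd_moments(S, x, K_max):
    """Normalized spectral moments m_K(x) = x^T S^K x / n for K = 0..K_max,
    computed by repeated application of S (no eigendecomposition)."""
    n = len(x)
    moments, y = [], x.copy()
    for _ in range(K_max + 1):
        moments.append(x @ y / n)
        y = S @ y                       # one more application of S
    return moments

# A weighted triangle and a test signal.
edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5}
S = weighted_laplacian(3, edges)
x = np.array([1.0, -1.0, 0.5])

# Cross-check against the spectral side: m_K = (1/n) sum_j lam_j^K <u_j, x>^2.
lam, U = np.linalg.eigh(S)
coeffs = (U.T @ x) ** 2
spectral = [(coeffs * lam ** K).sum() / 3 for K in range(4)]
```

The two computations agree because the weighted Laplacian is symmetric positive semidefinite, so the polynomial moments are exactly the moments of the power spectral distribution supported on its eigenvalues.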

Remark 3 (Zero-weight edges).

In many applications, a function on weighted graphs treats an edge with an assigned weight of zero as a non-edge; the moment function  𝐦K\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}, for instance, satisfies this condition. This motivates a quotient map on ΩKW\Omega_{K}^{W} that “glues” rooted balls with edge weights taking the value zero to rooted balls where those edges are absent. Any function that is continuous on ΩKW\Omega_{K}^{W} and also satisfies this condition remains continuous under such a quotient map, so our results still hold. The construction in this case is somewhat technical, so we omit it for simplicity of exposition.

7 In the Limit: Graphing Signal Processing

In Sections 4 and 6, the (weak) continuity of the power spectral density with respect to the distribution of rooted balls was established. Specifically, it was shown that weakly convergent sequences of distributions of rooted balls yield weakly convergent power spectra. Given that these distributions correspond to underlying graphs, it remains to be established precisely what objects these sequences converge to. In general, a weakly convergent sequence of finite graphs does not necessarily converge to a finite graph. In the study of graph limits via graphon models, the homomorphism density of motifs is of primary concern [42]. Our setting is slightly different, depending instead on the isomorphism density of rooted graphs, for which a more appropriate limiting model is that of a graphing. We show that the basic ideas in the discussion so far can be transferred directly to these limiting objects. Let us define the central object of study for this section.

Definition 1.

A graphing of degree DD is a triplet 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) such that

  1. 𝒱\mathcal{V} is a sample space with a σ\sigma-field \mathcal{B} on 𝒱\mathcal{V};

  2. λ\lambda is a probability measure on \mathcal{B};

  3. \mathcal{E}\in\mathcal{B}\times\mathcal{B} is such that, for all A,AA,A^{\prime}\in\mathcal{B}, we have

Adeg(v,A)𝑑λ(v)=Adeg(v,A)𝑑λ(v),\int_{A}\mathrm{deg}(v,A^{\prime})d\lambda(v)=\int_{A^{\prime}}\mathrm{deg}(v,A)d\lambda(v), (33)

where deg(v,A)=|{uA:(u,v)}|\mathrm{deg}(v,A)=|\{u\in A:(u,v)\in\mathcal{E}\}|, with the condition that deg(v,𝒱)D\mathrm{deg}(v,\mathcal{V})\leq D for all v𝒱v\in\mathcal{V}.
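A classical example satisfying Definition 1 (our illustration, not taken from the text) is the cycle graphing on 𝒱=[0,1): each node v is adjacent to v±α (mod 1) for a fixed irrational α, so deg(v,𝒱)=2 for every node. For interval sets A and A′, both sides of (33) equal λ(A∩(A′−α))+λ(A∩(A′+α)), and the symmetry can be checked by Monte Carlo:

```python
import numpy as np

ALPHA = np.sqrt(2) - 1      # irrational rotation step; any irrational works

def deg_in(v, lo, hi):
    """deg(v, A) for the interval A = [lo, hi): the number of the two
    neighbors v + ALPHA and v - ALPHA (mod 1) that land in A."""
    up = (v + ALPHA) % 1.0
    dn = (v - ALPHA) % 1.0
    in_up = ((lo <= up) & (up < hi)).astype(int)
    in_dn = ((lo <= dn) & (dn < hi)).astype(int)
    return in_up + in_dn

# Monte Carlo check of the symmetry condition (33) for two interval sets,
# using roots drawn from the uniform measure lambda on [0, 1).
rng = np.random.default_rng(0)
v = rng.random(200_000)
A, Ap = (0.0, 0.3), (0.5, 0.9)
in_A = (A[0] <= v) & (v < A[1])
in_Ap = (Ap[0] <= v) & (v < Ap[1])
lhs = np.mean(in_A * deg_in(v, *Ap))    # estimates the integral over A
rhs = np.mean(in_Ap * deg_in(v, *A))    # estimates the integral over A'
```

Condition (33) is the measure-theoretic analog of counting each edge from both endpoints, and the estimates of the two integrals agree up to sampling error.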

A graphing is a way to describe a graph with a potentially uncountable number of nodes (elements of 𝒱\mathcal{V}), yet with bounded node degrees. We tacitly assume that all graphings considered are of degree DD, for some D1D\geq 1. Unlike a graphon, which describes dense graphs with unbounded node degrees, the structures that arise from a graphing are typically sparse, and notions of graph filtering and graph shifts reduce to finite sums, rather than continuous integrals. Given a graphing 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda), a graphing signal is a map 𝐱:𝒱\mathbf{x}:\mathcal{V}\to\mathbb{R} between the nodes 𝒱\mathcal{V} and the real numbers \mathbb{R} such that 𝐱\mathbf{x} is a measurable function. Endowing this with the usual vector space structure, we define the space of graphing signals as

𝕏(𝐆)={𝐱:𝒱|𝐱 is measurable}.\mathbb{X}(\mathbf{G})=\{\mathbf{x}:\mathcal{V}\to\mathbb{R}\ \big{|}\ \mathbf{x}\text{ is measurable}\}. (34)

Much like how a graphon describes a model for dense random graphs, a graphing carries with it a distribution of random rooted graphs. As before, let 𝒩\mathcal{N} be the neighborhood operator, returning all of the one-hop neighbors of a node v𝒱v\in\mathcal{V}, so that 𝒩K\mathcal{N}^{K} is the KK-hop neighborhood operator. For a graphing 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) with associated signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}), the rooted KK-ball at a node r𝒱r\in\mathcal{V} is defined in the same way as in Section 3, denoted by  𝒢K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(r), with the same notation used to denote the corresponding graph signal  𝐱K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,_{K}(r). Due to the bounded degree condition on the graphing,  𝒢K(r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(r) is always a finite rooted graph of maximum degree at most DD. With this in mind, we define a sampling map ΣK:𝒱ΩK\Sigma_{K}:\mathcal{V}\to\Omega_{K} for graphings much like the sampling map (7) for finite graphs:

ΣK(v)=( 𝒢K(v), 𝐱(v)).\Sigma_{K}(v)=(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,_{K}(v),\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{x}$\kern-1.00006pt}}}\,(v)). (35)

The sampling map ΣK\Sigma_{K} is illustrated in Fig. 3 for a simple graphing. Much like for finite graphs, we can push forward the measure λ\lambda to yield a probability measure μ\mu on ΩK\Omega_{K}. That is to say, μ=(ΣK)(𝐆,𝐱)\mu=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x}), in the same way as before, where we adopt λ\lambda as the probability measure over 𝒱\mathcal{V} to be pushed forward. This differs slightly from the case of finite graphs, where we implicitly assume that the initial measure is the uniform probability measure on the finite node set.

Figure 3: Sampling of a random rooted graph from a graphing with an associated signal. (Upper left) A graphing 𝐆=([0,1),,λ)\mathbf{G}=([0,1),\mathcal{E},\lambda). We sample from 𝐆\mathbf{G} by first selecting a root node ρ\rho at random (colored in black), and then taking the connected component of ρ\rho in the edge set. That is, starting with the node ρ\rho, we see that the nodes v1,v2v_{1},v_{2} are neighbors of ρ\rho. Then, the nodes v3,v4v_{3},v_{4} are neighbors of v1,v2v_{1},v_{2}, with the process stopping there. (Lower left) A graphing signal 𝐱:[0,1)\mathbf{x}:[0,1)\to\mathbb{R}. After sampling a node and edge set from the graphing, we sample this function at the corresponding values. (Right) The resulting rooted graph and signal sampled from the graphing.

To establish the basic building blocks for signal processing on graphings, we first construct a Laplacian for graphing signals. Unlike graph signals, graphing signals are not necessarily supported on a finite, or even countable space. We specify the graphing Laplacian as a linear operator on the space of graphing signals in the following way. For any 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) and v𝒱v\in\mathcal{V}, put

[𝐒𝐱]v=u𝒩(v)([𝐱]v[𝐱]u).[\mathbf{S}\mathbf{x}]_{v}=\sum_{u\in\mathcal{N}(v)}\left([{\mathbf{x}}]_{v}-[{\mathbf{x}}]_{u}\right). (36)

By the degree boundedness condition from Definition 1, the sum in (36) is well-defined. Due to its singular nature, it is difficult to define a “graphing Fourier transform” directly from the Laplacian: for instance, even fairly tame graphing structures may have a Laplacian whose eigenvalues have uncountable multiplicity. This, for instance, makes the notion of the power spectral distribution of a graphing signal unwieldy.

However, the spectral properties of a graphing signal with respect to the underlying graphing structure can be studied indirectly. Let 𝐆\mathbf{G} be a graphing with graphing signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) and Laplacian 𝐒\mathbf{S}. Then, for integers K0K\geq 0, we tentatively define

𝐦K(𝐱)=𝒱[𝐱]v[𝐒K𝐱]vdλ(v),\mathbf{m}_{K}(\mathbf{x})=\int_{\mathcal{V}}[{\mathbf{x}}]_{v}[\mathbf{S}^{K}\mathbf{x}]_{v}\mathrm{d}\lambda(v), (37)

where 𝐒K\mathbf{S}^{K} again indicates KK repeated applications of the Laplacian. Note that this definition resembles that of the moments of the power spectral distribution of a graph signal in (15). We define the values 𝐦K(𝐱)\mathbf{m}_{K}(\mathbf{x}) tentatively, as it is not obvious when they are finite, or even well-defined. In the following discussion, we establish sufficient conditions under which the sequence {𝐦K(𝐱)}K=0\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty} exists and corresponds to the moments of a distribution function.

Much like in Sections 4 and 5, we control the behavior of the graphing by bounding its size. We first do this by determining a suitable notion of boundedness for a graphing signal.

Definition 2.

Let 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) be a graphing and 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}) be a graphing signal. The signal 𝐱\mathbf{x} is said to be locally essentially bounded if it is bounded on almost all of the connected components of 𝐆\mathbf{G}. That is, there exists an a0a\geq 0 such that for all K0K\geq 0,

λ(𝒩K((𝐱1[a,a])𝖢))=0,\lambda\left(\mathcal{N}^{K}((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}})\right)=0, (38)

where (𝐱1[a,a])𝖢(\mathbf{x}^{-1}[-a,a])^{\mathsf{C}} denotes the set of nodes v𝒱v\in\mathcal{V} such that [𝐱]v[a,a][{\mathbf{x}}]_{v}\notin[-a,a], and 𝒩K\mathcal{N}^{K} is the KK-hop neighborhood of a set. Denote the set of all such graphing signals by 𝕏LB(𝐆)\mathbb{X}_{\mathrm{LB}}(\mathbf{G}).

In words, local essential boundedness not only controls the size of the set of nodes with large signal value (essential boundedness), but also controls the size of the neighborhoods with which those nodes can interact. Local essential boundedness is an analog of boundedness for finite graph signals adapted to graphing signals. For instance, a graphing signal that is strictly bounded is locally essentially bounded. This condition, however, is stronger than being bounded almost everywhere (see Lemma 4), due to the neighborhood condition in (38). The necessity of this condition stems from the highly singular nature of the graphing Laplacian as an operator on signals.

We will also find it useful to consider graphings that “resemble” finite graphs in some sense.

Definition 3.

Let 𝐆=(𝒱,,λ)\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda) be a graphing with graphing signal 𝐱𝕏(𝐆)\mathbf{x}\in\mathbb{X}(\mathbf{G}), and {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} a sequence of graphs with associated graph signals. For all K0K\geq 0, put μK=(ΣK)(𝐆,𝐱)\mu_{K}=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x}), and μK(j)=(ΣK)(𝒢j,𝐱j)\mu_{K}(j)=(\Sigma_{K})_{*}(\mathcal{G}_{j},\mathbf{x}_{j}). Then, we say that the sequence of graphs {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} converges weakly to (𝐆,𝐱)(\mathbf{G},\mathbf{x}) if for all K0K\geq 0, μK(j)\mu_{K}(j) converges weakly to μK\mu_{K} as jj tends to infinity. We denote weak convergence of graphs to a graphing by (𝒢j,𝐱j)(𝐆,𝐱)(\mathcal{G}_{j},\mathbf{x}_{j})\rightharpoonup(\mathbf{G},\mathbf{x}).

Much like Theorem 1, we can characterize the properties of a graphing based on the limiting properties of a sequence of graphs that converges to it. We state this formally below.

Theorem 4.

Let a graphing 𝐆\mathbf{G} of degree DD with signal 𝐱𝕏LB(𝐆)\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G}) be given. Suppose there exists a sequence of graphs and graph signals {(𝒢j,𝐱j)}j=1\{(\mathcal{G}_{j},\mathbf{x}_{j})\}_{j=1}^{\infty} such that (𝒢j,𝐱j)(𝐆,𝐱)(\mathcal{G}_{j},\mathbf{x}_{j})\rightharpoonup(\mathbf{G},\mathbf{x}). Then, there exists a unique distribution function P𝐱P_{\mathbf{x}} supported on [0,2D][0,2D] such that the moments of P𝐱P_{\mathbf{x}} are given by the sequence {𝐦K(𝐱)}K=0\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty}.

The proof of Theorem 4 can be found in Appendix F. Theorem 4 defines the power spectral distribution for sufficiently nice graphing signals. In particular, if the graphing and graphing signal are the limit of a sequence of graphs and graph signals, then requiring the power spectral distribution to be continuous with respect to the weak topology on distributions determines it uniquely. Namely, the power spectral distribution of the limit of a sequence of graphs and graph signals is simply the weak limit of the power spectral distributions of the graphs as defined in (14). To show that this definition is consistent with the power spectral distribution of finite graphs, we illustrate with a simple example.

Example 1.

From a finite graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) on nn nodes and a graph signal 𝐱𝕏(𝒢)\mathbf{x}\in\mathbb{X}(\mathcal{G}), a graphing 𝐆\mathbf{G} with graphing signal 𝐲𝕏LB(𝐆)\mathbf{y}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G}) can be constructed in the following way (cf. [42, Example 18.16]). Identify 𝒱\mathcal{V} with the integers 1,2,,n1,2,\ldots,n. Let 𝒲=[0,1)\mathcal{W}=[0,1) with the usual Borel σ\sigma-field and the uniform (Lebesgue) probability measure λ\lambda. We construct a graphing 𝐆=(𝒲,,λ)\mathbf{G}=(\mathcal{W},\mathcal{F},\lambda) in the following way. Partition 𝒲\mathcal{W} into intervals Ii=[(i1)/n,i/n)I_{i}=[(i-1)/n,i/n) indexed by i𝒱i\in\mathcal{V}. Then, for each (i,j)(i,j)\in\mathcal{E} with i<ji<j, and every point tIit\in I_{i}, put the points (t,t+(ji)/n)Ii×Ij,(t+(ji)/n,t)Ij×Ii(t,t+(j-i)/n)\in I_{i}\times I_{j},(t+(j-i)/n,t)\in I_{j}\times I_{i} into the edge set \mathcal{F}. Additionally, assign [𝐲]t=[𝐱]i[{\mathbf{y}}]_{t}=[{\mathbf{x}}]_{i}.

Under this construction, one can show that for all K0K\geq 0, the distributions on ΩK\Omega_{K} corresponding to (𝒢,𝐱)(\mathcal{G},\mathbf{x}) and (𝐆,𝐲)(\mathbf{G},\mathbf{y}) are equal, i.e., (ΣK)(𝒢,𝐱)=(ΣK)(𝐆,𝐲)(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x})=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{y}). It holds, then, that 𝐦K(𝐱)=𝐦K(𝐲)\mathbf{m}_{K}(\mathbf{x})=\mathbf{m}_{K}(\mathbf{y}) for all KK, so that the power spectral distribution corresponding to the graphing signal 𝐲\mathbf{y} is identical to that of the graph signal 𝐱\mathbf{x}.
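Example 1 can be checked numerically. Since the graphing signal 𝐲 is constant on each interval IiI_i, sampling a root t uniformly from [0,1) and reading off its interval index reproduces the finite-graph moments; the estimator below is our illustration of this sampling-map viewpoint:

```python
import numpy as np

# The finite graph: a 5-cycle with its combinatorial Laplacian.
n = 5
S = 2.0 * np.eye(n)
for i in range(n):
    S[i, (i + 1) % n] = S[i, (i - 1) % n] = -1.0
x = np.cos(2 * np.pi * np.arange(n) / n)    # a simple graph signal

def moment(K):
    """Finite-graph moment m_K(x) = (1/n) sum_v x_v [S^K x]_v."""
    return x @ np.linalg.matrix_power(S, K) @ x / n

# Graphing view: a root t in [0, 1) lies in I_i = [(i-1)/n, i/n) and
# behaves exactly like node i, with the graphing signal constant on I_i.
# Monte Carlo estimate of the integral in (37) for K = 2.
rng = np.random.default_rng(1)
t = rng.random(100_000)
idx = np.floor(n * t).astype(int)           # interval index of each root
SKx = np.linalg.matrix_power(S, 2) @ x
estimate = np.mean(x[idx] * SKx[idx])
```

Because 𝐲 is piecewise constant, the estimate concentrates around the exact finite-graph value, matching the claim that 𝐦K(𝐱)=𝐦K(𝐲) for all KK.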

In Sections 4, 5 and 6, we considered how the distribution of rooted balls in graphs yields useful topological or metric structure with which graphs can be related to each other, particularly in light of graph summaries such as the power spectral distribution. In all of these cases, the topology of weak convergence was used to show that a convergent sequence of graphs in the sense of the distribution of rooted balls is also convergent in the sense of any appropriate graph summary. This left open the question of what these graphs converge to. By considering graphings as limiting objects for sparse graphs, we have shown that the extension of graph summaries, such as the power spectral distribution, to graphings maintains continuity in the natural way.

Remark 4 (Hyperfinite graphings).

Theorem 4 hinges on the graphing 𝐆\mathbf{G} and signal 𝐱\mathbf{x} being the limit of a sequence of finite graphs and graph signals. As noted in [42, Conjecture 19.8], whether all graphings are limits of sequences of graphs remains an open question. However, there is a class of graphings for which this is known to hold: namely, hyperfinite graphings. We refer the interested reader to [42, Chapter 21] for more details.

8 Conclusion

Taking into account the folklore knowledge that low-order graph filters only use local information on graphs, we have developed a local framework in which many aspects of graph signal processing are illuminated. In particular, by mapping a graph and graph signal to a probability distribution on a fixed space, we allow for the application of tools from probability theory to characterize similarities between graphs based on their local structures, even if they do not have a common node set. Using the topology of weak convergence of probability distributions, we define a topology on the space of graphs and graph signals, where notions of continuity can be understood: this was most obviously demonstrated in Section 4, where it was proved that the power spectral distribution of a graph signal is weakly continuous with respect to the distribution of rooted balls in a graph. This approach allows for relatively painless treatment of limiting objects, such as graphings, by simply decomposing them into their constituent distributions.

The analysis carried out in this paper provides proper justification for transferring graph filters across multiple graphs, given the assumption that they share local structural properties. With the particular suitability of this approach to highly sparse graphs, this complements existing work on transferability based on graphon models or assumptions of graphs as discretizing manifold structures. Leveraging the framework of this paper, it would be interesting to study more exotic types of functions, such as graph neural networks and graph kernels, or to incorporate the properties encoded by distributions of local structures into tasks such as network topology inference.

Appendix A Metric Topology of the Signal Space

Let a rooted ball  𝒢=(𝒱,,r)\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,=(\mathcal{V},\mathcal{E},r) be given. Denote by Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) the automorphism group of maps {ϕ:𝒱𝒱}\{\phi:\mathcal{V}\to\mathcal{V}\}, i.e., the set of all isomorphisms from    𝒢\mathcal{G}   to itself, with the group operation given by composition. The group Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) acts on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) in a natural way. For any ϕAut( 𝒢),𝐱𝕏( 𝒢)\phi\in\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,),\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,), define the action of ϕ\phi on 𝐱\mathbf{x} as the graph signal ϕ(𝐱)\phi(\mathbf{x}) satisfying

[ϕ(𝐱)]ϕ(v)=[𝐱]v[\phi(\mathbf{x})]_{\phi(v)}=[{\mathbf{x}}]_{v} (39)

for all v𝒱v\in\mathcal{V}. Next, define an equivalence relation on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) where 𝐱ϕ(𝐱)\mathbf{x}\!\sim\!\phi(\mathbf{x}) for all ϕAut( 𝒢),𝐱𝕏( 𝒢)\phi\in\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,),\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,), and take the quotient space 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim, whose elements are the equivalence classes under this relation. Define the metric 2\|\cdot\|_{2} on 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim so that for any [𝐱],[𝐲]𝕏/[\mathbf{x}],[\mathbf{y}]\in\mathbb{X}/\!\sim,

[𝐱][𝐲]2=min𝐱[𝐱]𝐲[𝐲]𝐱𝐲2,\|[\mathbf{x}]-[\mathbf{y}]\|_{2}=\min_{\begin{subarray}{c}\mathbf{x}^{\prime}\in[\mathbf{x}]\\ \mathbf{y}^{\prime}\in[\mathbf{y}]\end{subarray}}\|\mathbf{x}^{\prime}-\mathbf{y}^{\prime}\|_{2}, (40)

where 𝐱𝐲2\|\mathbf{x}^{\prime}-\mathbf{y}^{\prime}\|_{2} is the norm on 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) inherited from identification with |𝒱|\mathbb{R}^{|\mathcal{V}|}. As Aut( 𝒢)\operatorname{Aut}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) is a finite group, this minimum is well-defined, and satisfies the triangle inequality, thus yielding a valid metric. When defining the space ΩK\Omega_{K} in Section 3.1 we treat 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,) with the structure of 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim, from which the topology on ΩK\Omega_{K} is inherited. Similarly, we use the metric 2\|\cdot\|_{2} in Section 5 as defined on 𝕏( 𝒢)/\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,)/\!\sim.
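For small rooted balls, the metric (40) can be evaluated by brute force. Because the automorphism group acts by isometries, it suffices to minimize ||x − φ(y)|| over automorphisms φ of a single representative; the sketch below enumerates root-preserving automorphisms by exhaustive permutation search (representation choices ours):

```python
import itertools
import numpy as np

def automorphisms(adj, root):
    """All root-preserving automorphisms of a small rooted graph, found by
    exhaustive search over permutations of the node set."""
    nodes = sorted(adj)
    autos = []
    for perm in itertools.permutations(nodes):
        phi = dict(zip(nodes, perm))
        if phi[root] != root:
            continue
        if all((phi[u] in adj[phi[v]]) == (u in adj[v])
               for u in nodes for v in nodes if u != v):
            autos.append(phi)
    return autos

def quotient_dist(adj, root, x, y):
    """The metric (40): since the automorphism group acts by isometries,
    it suffices to minimize ||x - phi(y)|| over automorphisms phi."""
    nodes = sorted(adj)
    idx = {v: k for k, v in enumerate(nodes)}
    xv = np.array([x[v] for v in nodes])
    best = np.inf
    for phi in automorphisms(adj, root):
        yv = np.empty(len(nodes))
        for v in nodes:
            yv[idx[phi[v]]] = y[v]   # action (39): phi(y) at phi(v) is y_v
        best = min(best, float(np.linalg.norm(xv - yv)))
    return best

# A rooted star: root r with leaves a and b. Swapping the leaves is an
# automorphism, so signals that differ only by that swap are at distance 0.
adj = {"r": {"a", "b"}, "a": {"r"}, "b": {"r"}}
x = {"r": 1.0, "a": 2.0, "b": 3.0}
y = {"r": 1.0, "a": 3.0, "b": 2.0}
```

The factorial cost of the enumeration is acceptable here because rooted balls of bounded degree and radius have few nodes; this is precisely the setting in which the quotient construction is used.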

Appendix B Proof of Theorem 1

By Lemma 1, the sequence {μj}j=1\{\mu_{j}\}_{j=1}^{\infty} has compact support. Since  𝐦K\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K} is a continuous function, this implies that {𝔼μj[ 𝐦K]}j=1\{\mathbb{E}_{\mu_{j}}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}]\}_{j=1}^{\infty} is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments 𝐦K(𝐱j)=𝔼μj[ 𝐦K]\mathbf{m}_{K}(\mathbf{x}_{j})=\mathbb{E}_{\mu_{j}}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbf{m}$\kern-1.00006pt}}}_{K}] converge, as desired.

For the case where weak convergence holds for all K0K\geq 0, we complete the proof with the following lemma.

Lemma 2.

Let A=[0,a]A=[0,a] be a compact subset of the real line, for some a>0a>0. Let {{𝐦K(j)}K=0}j=1\{\{\mathbf{m}_{K}(j)\}_{K=0}^{\infty}\}_{j=1}^{\infty} be an array of numbers such that {𝐦K(j)}K=0\{\mathbf{m}_{K}(j)\}_{K=0}^{\infty} is the sequence of moments of a finite measure PjP_{j} supported on AA, and that 𝐦K(j)𝑗𝐦K\mathbf{m}_{K}(j)\overset{j}{\to}\mathbf{m}_{K} for each KK, where {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} is some sequence of real numbers.

Then, the sequence {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} uniquely corresponds to a distribution PP supported on AA, such that the sequence of distributions PjP_{j} converges weakly to PP.

Lemma 2 is a routine derivation following from the Hausdorff moment property [43, Theorem 3.15] and the Weierstrass approximation theorem [44], so we omit the proof. In particular, let A=[0,2D]A=[0,2D]. The moments 𝐦K(𝐱j)\mathbf{m}_{K}(\mathbf{x}_{j}) correspond to the power spectral distributions of the graph signals 𝐱j\mathbf{x}_{j}, which are supported on AA by the bounded degree assumption. Since 𝐦K(𝐱j)\mathbf{m}_{K}(\mathbf{x}_{j}) converges for all K0K\geq 0, we have that the sequence of limits {𝐦K}K=0\{\mathbf{m}_{K}\}_{K=0}^{\infty} is the sequence of moments of a distribution PP supported on AA. Moreover, we have that PP is the weak limit of the power spectral distributions of the graph signals 𝐱j\mathbf{x}_{j}, as desired.

Appendix C Proof of Lemma 1

Denote by  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D} the set of rooted graphs such that the underlying graph has node degree bounded by DD. For any K0K\geq 0 and any graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) contained in 𝔾D\mathbb{G}_{D}, note that for all v𝒱v\in\mathcal{V}, it holds that the rooted KK-ball centered at vv is contained in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}. Moreover, it is straightforward to show that there are only finitely many rooted KK-balls contained in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}, up to isomorphism. For any a0a\geq 0 and any rooted graph    𝒢\mathcal{G}  , denote the set of graph signals taking values in [a,a][-a,a] by 𝕏( 𝒢,a)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a). It is clear that 𝕏( 𝒢,a)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a) is compact as a subset of 𝕏( 𝒢)\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,).

Using these constructions, it immediately follows that for any graph 𝒢𝔾D\mathcal{G}\in\mathbb{G}_{D} with graph signal 𝐱𝕏( 𝒢,a)\mathbf{x}\in\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a) for some a>0a>0, the associated distribution μ=(ΣK)(𝒢,𝐱)\mu=(\Sigma_{K})_{*}(\mathcal{G},\mathbf{x}) satisfies

supp(μ) 𝒢k 𝔾D:kK𝕏( 𝒢,a).\operatorname{supp}(\mu)\subseteq\coprod_{\,\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathcal{G}$\kern-0.70004pt}}}\,_{k}\in\hbox{\vbox{\hrule height=0.66pt\kern 0.90417pt\hbox{\kern-0.70004pt$\mathbb{G}$\kern-0.70004pt}}}_{D}:k\leq K}\mathbb{X}(\,\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathcal{G}$\kern-1.00006pt}}}\,,a). (41)

Since there are only finitely many rooted KK-balls in  𝔾D\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$\mathbb{G}$\kern-1.00006pt}}}_{D}, the above set is a disjoint union of finitely many compact spaces, and is thus a compact space itself, by the properties of the disjoint union of topological spaces. If a family of graphs {𝒢j}jJ𝔾D\{\mathcal{G}_{j}\}_{j\in J}\subseteq\mathbb{G}_{D} has uniformly bounded signals {𝐱j}jJ\{\mathbf{x}_{j}\}_{j\in J} for some index set JJ, this means that there is some a0a\geq 0 such that 𝐱j𝕏(𝒢j,a)\mathbf{x}_{j}\in\mathbb{X}(\mathcal{G}_{j},a) for all jJj\in J, so that the condition in (41) holds for all graphs and graph signals in the family, as desired.

Appendix D Proof of Theorem 2

We first establish that for a function  JJ satisfying Assumption 1, the metric dCd_{C} preserves Lipschitz continuity over the subspace ΓΩK\Gamma\subset\Omega_{K} as defined in Lemma 1.

Lemma 3.

Let  J:ΩK\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}:\Omega_{K}\to\mathbb{R} be a function satisfying Assumption 1. Then, for any C>0C>0,  JJ has a Lipschitz constant at most max{L,A/C}\max\{L,A/C\} on the subspace Γ\Gamma with the metric dCd_{C}, where

A=supωΓ J(ω)infωΓ J(ω).A=\sup_{\omega\in\Gamma}\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}(\omega)-\inf_{\omega\in\Gamma}\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}(\omega). (42)

Lemma 3 can be shown by directly applying the definition of Lipschitz continuity, so we omit the proof. By the dual formulation of the 11-Wasserstein distance [38], Lemma 3 implies that, for C>0C>0,

|𝔼μ[ J]𝔼ν[ J]|max{L,AC}W1(μ,ν;C),\big{|}\mathbb{E}_{\mu}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}]-\mathbb{E}_{\nu}[\hbox{\vbox{\hrule height=0.66pt\kern 1.29167pt\hbox{\kern-1.00006pt$J$\kern-1.00006pt}}}]\big{|}\leq\max\left\{L,\frac{A}{C}\right\}\cdot W_{1}\left(\mu,\nu;C\right), (43)

owing to the fact that $\operatorname{supp}(\mu),\operatorname{supp}(\nu)\subseteq\Gamma$ by assumption. Observing that $W_{1}(\mu,\nu;C)$ is monotonically increasing in $C$, we can restrict our attention to $C\geq A/L$. A simple change of variables then yields the expression

\big|\mathbb{E}_{\mu}[\overline{J}]-\mathbb{E}_{\nu}[\overline{J}]\big|\leq\inf_{C\in(0,1]}\frac{L}{C}\cdot W_{1}\left(\mu,\nu;\frac{AC}{L}\right). \quad (44)

Taking $C=1$ and noting that $A=1$ by assumption yields the bound (23), as desired.
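The inequality (43) specializes the Kantorovich–Rubinstein duality $|\mathbb{E}_{\mu}[f]-\mathbb{E}_{\nu}[f]|\leq \mathrm{Lip}(f)\cdot W_{1}(\mu,\nu)$. The following is a minimal numerical sketch of that underlying dual bound for discrete measures on the real line; it uses the ordinary Euclidean metric rather than the metric $d_{C}$ on $\Omega_{K}$, so it illustrates the mechanism of the proof, not the paper's setting itself.

```python
# Numerical check of the Kantorovich-Rubinstein dual bound
# |E_mu[f] - E_nu[f]| <= Lip(f) * W1(mu, nu) for empirical measures in 1D.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
mu_support = rng.uniform(0.0, 1.0, size=200)   # samples representing mu
nu_support = rng.uniform(0.2, 1.2, size=200)   # samples representing nu

L = 3.0
f = lambda t: L * np.abs(t - 0.5)              # Lipschitz with constant L

gap = abs(f(mu_support).mean() - f(nu_support).mean())
w1 = wasserstein_distance(mu_support, nu_support)
assert gap <= L * w1 + 1e-12                   # the dual bound holds
```

Any Lipschitz test function $f$ can be substituted here; the bound is tight exactly when $f$ attains the supremum in the dual formulation of $W_{1}$.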

Appendix E Proof of Theorem 3

One can check that $\overline{\mathbf{m}}_{K}$ is continuous and bounded on $\bigcup_{j=1}^{\infty}\operatorname{supp}(\mu_{j})$ due to the weighted degree bound $D$ and the uniform boundedness of the signals, so that $\{\mathbb{E}_{\mu_{j}}[\overline{\mathbf{m}}_{K}]\}_{j=1}^{\infty}$ is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments $\mathbf{m}_{K}(\mathbf{x}_{j})=\mathbb{E}_{\mu_{j}}[\overline{\mathbf{m}}_{K}]$ converge, as desired.

When weak convergence holds for all $K\geq 0$, the proof is completed by appealing to Lemma 2, as in the proof of Theorem 1.

Appendix F Proof of Theorem 4

We begin by establishing two results regarding locally essentially bounded signals $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ and the graphing Laplacian $\mathbf{S}$.

Lemma 4.

For a graphing $\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda)$, the set $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ is a subset of $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$, where $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ denotes the space of essentially bounded functions on $\mathcal{V}$ with respect to the probability measure $\lambda$.

We omit the proof of Lemma 4, as it follows directly from Definition 2.

Lemma 5.

$\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ is closed under application of the Laplacian $\mathbf{S}$, i.e., for any $\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$, we have $\mathbf{S}\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$.

The proof of Lemma 5 can be found in Appendix G.

For any $K\geq 0$, the graphing signal $\mathbf{S}^{K}\mathbf{x}$ is contained in $\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ by Lemma 5. The product of two functions in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ is again contained in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$, so by Lemma 4 the graphing signal $\mathbf{y}\colon v\mapsto[\mathbf{x}]_{v}\cdot[\mathbf{S}^{K}\mathbf{x}]_{v}$ is contained in $\mathcal{L}^{\infty}(\mathcal{V},\lambda)$. Thus, $\mathbf{m}_{K}(\mathbf{x})=\mathbb{E}_{\lambda}[\mathbf{y}]$ is finite for all $K\geq 0$.

Let $K\geq 0$ be given, and let $\overline{\mathbf{m}}_{K}:\Omega_{K}\to\mathbb{R}$ be the function defined in Proposition 2. One can check that $\mathbf{m}_{K}(\mathbf{x})=\mathbb{E}_{\mu}[\overline{\mathbf{m}}_{K}]$, where $\mu=(\Sigma_{K})_{*}(\mathbf{G},\mathbf{x})$. By the bounded degree property of the graphing and the fact that $\mathbf{x}\in\mathcal{L}^{\infty}(\mathcal{V},\lambda)$ (a consequence of Lemma 4), Lemma 1 implies that $\operatorname{supp}(\mu)$ is contained in a compact subset of $\Omega_{K}$. For each finite graph and graph signal $(\mathcal{G}_{n},\mathbf{x}_{n})$, denote the $K^{\text{th}}$ moment of the signal $\mathbf{x}_{n}$ by $\mathbf{m}_{K}(\mathbf{x}_{n})$, and the pushforward measure of $\Sigma_{K}$ on the space $\Omega_{K}$ by $\mu_{n}$, so that $\mathbf{m}_{K}(\mathbf{x}_{n})=\mathbb{E}_{\mu_{n}}[\overline{\mathbf{m}}_{K}]$ by Proposition 2. The assumption that $(\mathcal{G}_{n},\mathbf{x}_{n})\rightharpoonup(\mathbf{G},\mathbf{x})$ implies that $\mu_{n}$ converges weakly to $\mu$. Since $\mu$ has compact support and $\overline{\mathbf{m}}_{K}$ is continuous, it follows that $\mathbb{E}_{\mu_{n}}[\overline{\mathbf{m}}_{K}]\to\mathbb{E}_{\mu}[\overline{\mathbf{m}}_{K}]$; in other words, $\mathbf{m}_{K}(\mathbf{x}_{n})\to\mathbf{m}_{K}(\mathbf{x})$.

Having satisfied the conditions of Theorem 1, we conclude that the power spectral distributions of the finite graphs converge to a unique distribution supported on $[0,2D]$, and that the moments of this distribution are given by $\{\mathbf{m}_{K}(\mathbf{x})\}_{K=0}^{\infty}$ by Lemma 2, as desired.
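The locality driving this argument can be checked numerically on finite graphs: the moment $\mathbf{m}_{K}(\mathbf{x})=\frac{1}{n}\langle\mathbf{x},\mathbf{S}^{K}\mathbf{x}\rangle$ is an average of quantities that each depend only on a rooted $K$-ball. The sketch below (assuming the combinatorial Laplacian $\mathbf{S}=\mathrm{Diag}(\mathbf{d})-\mathbf{A}$ as shift operator, in place of the graphing Laplacian) takes cycle graphs of even length with an alternating $\pm 1$ signal: every rooted $K$-ball with its signal looks the same, so $\mathbf{m}_{K}$ does not depend on the cycle length.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian S = Diag(d) - A of the n-cycle."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[idx, (idx - 1) % n] = 1.0
    return np.diag(A.sum(axis=1)) - A

def moment(S, x, K):
    # m_K(x) = (1/n) <x, S^K x>: an average of purely local quantities.
    return x @ np.linalg.matrix_power(S, K) @ x / len(x)

for K in range(4):
    vals = []
    for n in (10, 100, 1000):
        x = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # alternating +/-1
        vals.append(moment(cycle_laplacian(n), x, K))
    # Identical K-ball distributions => identical moments across sizes.
    assert np.allclose(vals, vals[0])
    # For this signal S x = 4x, so m_K(x) = 4^K exactly.
    assert np.isclose(vals[0], 4.0 ** K)
```

Here the sequence of cycles converges (in the sense of rooted-ball distributions) to a graphing whose signal moments are the common value $4^{K}$, matching the conclusion of the theorem in this toy case.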

Appendix G Proof of Lemma 5

Let a degree-$D$ graphing $\mathbf{G}=(\mathcal{V},\mathcal{E},\lambda)$ with graphing signal $\mathbf{x}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$ be given. By definition, there exists an $a\geq 0$ such that for all $K\geq 0$, we have

\lambda\left(\mathcal{N}^{K}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right)\right)=0. \quad (45)

Put $b=2Da$, and define the graphing signal $\mathbf{y}=\mathbf{S}\mathbf{x}$, where $\mathbf{S}$ denotes the graphing Laplacian. Observe, due to the bounded degree $D$ of the graphing, that

(\mathbf{y}^{-1}[-b,b])^{\mathsf{C}}\subseteq\mathcal{N}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right), \quad (46)

so that, for all $K\geq 0$,

\mathcal{N}^{K}\left((\mathbf{y}^{-1}[-b,b])^{\mathsf{C}}\right)\subseteq\mathcal{N}^{K+1}\left((\mathbf{x}^{-1}[-a,a])^{\mathsf{C}}\right). \quad (47)

By the definition of local essential boundedness, the right-hand side of (47) has measure zero under the probability measure $\lambda$, which implies that the left-hand side also has measure zero. Since $K$ was arbitrary, this establishes that $\mathbf{y}\in\mathbb{X}_{\mathrm{LB}}(\mathbf{G})$.
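The choice $b=2Da$ reflects an elementary finite-graph estimate: if $|\mathbf{x}_{u}|\leq a$ on a node's closed neighborhood and the weighted degree is at most $D$, then $|[\mathbf{S}\mathbf{x}]_{v}|\leq d_{v}|\mathbf{x}_{v}|+\sum_{u}A_{vu}|\mathbf{x}_{u}|\leq 2Da$. A short sketch of this bound, assuming the combinatorial Laplacian on a random weighted graph:

```python
# Finite-graph check of the bound |[Sx]_v| <= 2Da behind the choice b = 2Da,
# for the combinatorial Laplacian S = Diag(d) - A with weighted degree <= D.
import numpy as np

rng = np.random.default_rng(1)
n, D, a = 50, 4.0, 2.0

# Random symmetric weighted adjacency, rescaled so max weighted degree <= D.
A = rng.uniform(0, 1, (n, n))
A = np.triu(A, 1)
A = A + A.T
A *= D / A.sum(axis=1).max()
S = np.diag(A.sum(axis=1)) - A

x = rng.uniform(-a, a, n)                     # |x_v| <= a everywhere
y = S @ x
assert np.all(np.abs(y) <= 2 * D * a + 1e-9)  # |[Sx]_v| <= 2Da = b
```

In the graphing setting the same triangle-inequality argument applies pointwise, except on the null set excluded by local essential boundedness.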

References

  • [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013, doi:10.1109/MSP.2012.2235192.
  • [2] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018, doi:10.1109/JPROC.2018.2820126.
  • [3] D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” in IEEE Int. Conf. Acoust., Speech and Signal Process.   IEEE, May 2020, pp. 5930–5934, doi:10.1109/ICASSP40776.2020.9053140.
  • [4] F. Gama, Q. Li, E. Tolstaya, A. Prorok, and A. Ribeiro, “Synthesizing decentralized controllers with graph neural networks and imitation learning,” Oct. 2021, eprint:arXiv 2012.14906v3.
  • [5] W. Huang, T. A. W. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, “A graph signal processing perspective on functional brain imaging,” Proc. IEEE, vol. 106, no. 5, pp. 868–885, 2018, doi:10.1109/JPROC.2018.2798928.
  • [6] J. Ma, W. Huang, S. Segarra, and A. Ribeiro, “Diffusion filtering of graph signals and its use in recommendation systems,” in IEEE Int. Conf. Acoust., Speech and Signal Process.   IEEE, 2016, pp. 4563–4567, doi:10.1109/ICASSP.2016.7472541.
  • [7] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 5680–5695, 2020, doi:10.1109/TSP.2020.3026980.
  • [8] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002, doi:10.1126/science.298.5594.824.
  • [9] C. N. Tzouanas, S. Kim, K. N. Badhiwala, B. W. Avants, and J. T. Robinson, “Hydra vulgaris shows stable responses to thermal stimulation despite large changes in the number of neurons,” iScience, vol. 24, no. 6, p. 102490, 2021, doi:10.1016/j.isci.2021.102490.
  • [10] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Appl. Comput. Harmonic Anal., vol. 30, no. 2, pp. 129–150, 2011, doi:10.1016/j.acha.2010.04.005.
  • [11] D. I. Shuman, C. Wiesmeyr, N. Holighaus, and P. Vandergheynst, “Spectrum-adapted tight graph wavelet and vertex-frequency frames,” IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4223–4235, 2015, doi:10.1109/TSP.2015.2424203.
  • [12] T. M. Roddenberry, F. Frantzen, M. T. Schaub, and S. Segarra, “Hodgelets: Localized spectral representations of flows on simplicial complexes,” 2021, eprint:arXiv 2109.08728.
  • [13] N. M. Kriege, F. D. Johansson, and C. Morris, “A survey on graph kernels,” Appl. Netw. Sci., vol. 5, no. 1, pp. 1–42, 2020, doi:10.1007/s41109-019-0195-3.
  • [14] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, “Efficient graphlet kernels for large graph comparison,” in Int. Conf. Artificial Intell., Statist.   PMLR, 2009, pp. 488–495.
  • [15] C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel, “Faster kernels for graphs with continuous attributes via hashing,” in Int. Conf. Data Mining.   IEEE, 2016, pp. 1095–1100, doi:10.1109/ICDM.2016.0142.
  • [16] M. Togninalli, E. Ghisu, F. Llinares-López, B. A. Rieck, and K. Borgwardt, “Wasserstein Weisfeiler-Lehman graph kernels,” Conf. Neural Inform. Process. Syst., vol. 9, 2020.
  • [17] P. de Haan, T. Cohen, and M. Welling, “Natural graph networks,” 2020, eprint:arXiv 2007.08349.
  • [18] B. Bevilacqua, F. Frasca, D. Lim, B. Srinivasan, C. Cai, G. Balamurugan, M. M. Bronstein, and H. Maron, “Equivariant subgraph aggregation networks,” 2021, eprint:arXiv 2110.02910.
  • [19] M. Zhang and P. Li, “Nested graph neural networks,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [20] Q. Zhu, C. Yang, Y. Xu, H. Wang, C. Zhang, and J. Han, “Transfer learning of graph neural networks with ego-graph information maximization,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [21] T. Maehara and H. NT, “Learning on random balls is sufficient for estimating (some) graph parameters,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
  • [22] J. Leskovec and J. Mcauley, “Learning to discover social circles in ego networks,” Conf. Neural Inform. Process. Syst., vol. 25, 2012.
  • [23] T. Tao, An introduction to measure theory.   Amer. Math. Soc., 2011, vol. 126, doi:10.1090/gsm/126.
  • [24] J. M. Lee, An Introduction to Smooth Manifolds.   Springer, 2012, doi:10.1007/978-1-4419-9982-5.
  • [25] F. Ji, W. P. Tay, and A. Ortega, “Graph signal processing over a probability space of shift operators,” 2021, eprint:arXiv 2108.09192.
  • [26] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” J. Mach. Learning Res., vol. 12, no. 9, 2011.
  • [27] F. Gama and A. Ribeiro, “Ergodicity in stationary graph processes: A weak law of large numbers,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2761–2774, Apr. 2019, doi:10.1109/TSP.2019.2908909.
  • [28] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, 2017, doi:10.1109/TSP.2017.2703660.
  • [29] M. Avella-Medina, F. Parise, M. T. Schaub, and S. Segarra, “Centrality measures for graphons: Accounting for uncertainty in networks,” IEEE Trans. Network Sci. Eng., vol. 7, no. 1, pp. 520–537, 2018, doi:10.1109/TNSE.2018.2884235.
  • [30] L. Ruiz, L. Chamon, and A. Ribeiro, “Graphon signal processing,” IEEE Trans. Signal Process., vol. 69, pp. 4961–4976, 2021, doi:10.1109/TSP.2021.3106857.
  • [31] C. Borgs, J. T. Chayes, H. Cohn, and N. Holden, “Sparse exchangeable graphs and their limits via graphon processes,” J. Mach. Learning Res., vol. 18, no. 210, pp. 1–71, 2018. [Online]. Available: http://jmlr.org/papers/v18/16-421.html
  • [32] S. Segarra and A. Ribeiro, “Stability and continuity of centrality measures in weighted graphs,” IEEE Trans. Signal Process., vol. 64, no. 3, pp. 543–555, 2015, doi:10.1109/TSP.2015.2486740.
  • [33] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998, doi:10.1038/30918.
  • [34] B. Bollobás, Modern graph theory.   Springer Science & Business Media, 1998, vol. 184, doi:10.1007/978-1-4612-0619-4.
  • [35] F. Gama, A. G. Marques, G. Mateos, and A. Ribeiro, “Rethinking sketching as sampling: A graph signal processing approach,” Signal Process., vol. 169, pp. 107 404 (1–15), Apr. 2020, doi:10.1016/j.sigpro.2019.107404.
  • [36] F. Gama and S. Sojoudi, “Distributed linear-quadratic control with graph neural networks,” Signal Process., Feb. 2022, doi:10.1016/j.sigpro.2022.108506.
  • [37] L. De Branges, “The Stone-Weierstrass theorem,” Proc. Amer. Math. Soc., vol. 10, no. 5, pp. 822–824, 1959, doi:10.1090/S0002-9939-1959-0113131-7.
  • [38] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. Lanckriet, “On integral probability metrics, ϕ\phi-divergences and binary classification,” 2009, eprint:arXiv 0901.2698.
  • [39] L. Ruiz, F. Gama, and A. Ribeiro, “Graph neural networks: Architectures, stability, and transferability,” Proc. IEEE, vol. 109, no. 5, pp. 660–682, 2021, doi:10.1109/JPROC.2021.3055400.
  • [40] C. Villani, Optimal transport.   Springer, 2009, vol. 338, doi:10.1007/978-3-540-71050-9.
  • [41] F. Gama, E. Isufi, G. Leus, and A. Ribeiro, “Graphs, convolutions, and neural networks: From graph filters to graph neural networks,” IEEE Signal Process. Mag., vol. 37, no. 6, pp. 128–138, Nov. 2020, doi:10.1109/MSP.2020.3016143.
  • [42] L. Lovász, Large networks and graph limits.   Amer. Math. Soc., 2012, vol. 60, doi:10.1090/coll/060.
  • [43] K. Schmüdgen, The moment problem.   Springer, 2017, vol. 9, doi:10.1007/978-3-319-64546-9.
  • [44] K. Weierstrass, “Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen veränderlichen,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, vol. 2, pp. 633–639, 1885.