On Local Distributions in Graph Signal Processing
Abstract
Graph filtering is the cornerstone operation in graph signal processing (GSP). Thus, understanding it is key in developing potent GSP methods. Graph filters are local and distributed linear operations, whose output depends only on the local neighborhood of each node. Moreover, a graph filter’s output can be computed separately at each node by carrying out repeated exchanges with immediate neighbors. Graph filters can be compactly written as polynomials of a graph shift operator (typically, a sparse matrix description of the graph). This has led to relating the properties of the filters with the spectral properties of the corresponding matrix – which encodes global structure of the graph. In this work, we propose a framework that relies solely on the local distribution of the neighborhoods of a graph. The crux of this approach is to describe graphs and graph signals in terms of a measurable space of rooted balls. Leveraging this, we are able to seamlessly compare graphs of different sizes and coming from different models, yielding results on the convergence of spectral densities, transferability of filters across arbitrary graphs, and continuity of graph signal properties with respect to the distribution of local substructures.
1 Introduction
Graph filters are a fundamental building block in graph signal processing, where cascaded applications of a graph shift operator model diffusion on the nodes of a graph [1, 2]. The analogy between filtering in discrete time and filtering on graphs has led to a fruitful research direction, with applications including power systems [3], robotics [4], neuroscience [5], and recommender systems [6]. Due to their typical implementation as low-degree matrix polynomials, graph filters are local operators, where the output of a graph filter at a given node is strictly dependent on the connectivity structure and signal values on the node’s local neighborhood. This highlights an invariance property of graph filters, typically summarized by the property of permutation equivariance. However, the equivariance of graph filters is much stronger than not being sensitive to permutations of nodes. If the same filter is applied to two different graphs, and two nodes within each of those graphs have identical neighborhoods, then the graph filter output at those nodes will be identical as well [7]. Indeed, graph filters in their usual implementation are equivariant to local substructures, which have been shown to be of primary importance in real-world networks [8], often leading to useful properties such as scale invariance and robustness [9].
The importance of local substructures is not a new theme in the literature. This view has been considered before in the graph wavelet literature [10, 11, 12], where wavelet atoms are constructed by applying graph filters to impulse functions on each node in the graph, yielding a dictionary of atoms with known spectral properties that also exhibit (approximate) spatial localization. As a representational tool for graphs predating the development of graph signal processing, graph kernel methods have put forth the idea of graphs as bags of motifs [13], such as in the paper on graphlets [14]. Works such as [15, 16] have considered the extension of graph kernels to graphs with continuous labels, typically via a neighborhood aggregation or discretization approach. In [17], equivariance to local structures is proposed as a more useful invariant for graph neural networks, as opposed to permutation equivariance. For instance, [18] considers neural networks for graph classification that act on small subgraphs over an entire graph, and [19] considers how to design graph neural networks that can recognize structures more expressive than those of the Weisfeiler-Lehman test. The treatment of graphs as distributions of ego-networks in [20] was used to devise a novel loss function for training of graph neural networks based on the maximization of mutual information. Additionally, local structures at each node need not be deterministic, as [21] considers random walk features and defines the notion of an estimable parameter under such a model.
In this work, we develop machinery for reasoning about basic notions in graph signal processing, with the primacy of local substructures in mind. After reviewing basic definitions for graph signal processing in Section 2, we make the following theoretical contributions:
1. We introduce rooted graphs and rooted graph filters as vehicles for describing localized graph filters. In doing so, we develop a measure-theoretic view of graph signal processing that considers distributions of rooted balls and signals supported on them (Section 3).
2. Within the proposed framework, we illustrate how convergence of Fourier spectra of graph signals can be understood in terms of weak convergence of measures (Section 4).
3. We apply the proposed framework to yield a principled understanding of the transferability of graph summaries via integral probability metrics (Section 5).
4. To highlight the flexibility of the proposed approach, we extend the relationship between distributions of local graph structures and the Fourier spectra of graph signals to weighted graphs (Section 6).
5. We identify graphings and signals supported on them as the appropriate limiting objects in the proposed framework, and develop basic notions of graphing signal processing. We prove that the proposed framework applies directly to graphing signal processing in a natural way (Section 7), yielding a suitable spectral theory.
2 Graph Signal Processing
Consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. Since the graph is undirected, it holds that if edge $(u, v) \in \mathcal{E}$ for $u, v \in \mathcal{V}$, then $(v, u) \in \mathcal{E}$. We further assume the graph to have no self-loops, i.e., $(v, v) \notin \mathcal{E}$ for all $v \in \mathcal{V}$, and to be unweighted. An extension to weighted graphs can be found in Section 6.
The neighborhood of a given collection of nodes $\mathcal{U} \subseteq \mathcal{V}$ is defined as

$$\mathcal{N}(\mathcal{U}) = \mathcal{U} \cup \{ v \in \mathcal{V} : (u, v) \in \mathcal{E} \text{ for some } u \in \mathcal{U} \}. \quad (1)$$

That is, the neighborhood of a collection of nodes is that set of nodes as well as those nodes immediately connected to it. The $r$-hop neighborhood can then be conveniently defined in a recursive manner as $\mathcal{N}_r(\mathcal{U}) = \mathcal{N}(\mathcal{N}_{r-1}(\mathcal{U}))$ for integers $r > 1$ and with $\mathcal{N}_1(\mathcal{U}) = \mathcal{N}(\mathcal{U})$. Note that the $r$-hop neighborhood of a singleton $\{v\}$ is denoted as $\mathcal{N}_r(v)$, and that the degree of $v$ can be easily computed as $\deg(v) = |\mathcal{N}_1(v)| - 1$.
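As a concrete illustration, the recursion above can be sketched in a few lines of Python. The adjacency-list representation and the function names are our own choices, not notation from the paper.

```python
def neighborhood(adj, nodes):
    """One-hop neighborhood N(U): the nodes in U together with their neighbors."""
    out = set(nodes)
    for u in nodes:
        out.update(adj[u])
    return out

def r_hop(adj, nodes, r):
    """r-hop neighborhood via the recursion N_r(U) = N(N_{r-1}(U))."""
    out = set(nodes)
    for _ in range(r):
        out = neighborhood(adj, out)
    return out

# Path graph 0-1-2-3-4, stored as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
assert r_hop(adj, {2}, 1) == {1, 2, 3}
assert r_hop(adj, {2}, 2) == {0, 1, 2, 3, 4}
assert len(neighborhood(adj, {2})) - 1 == 2  # deg(2) = |N_1(2)| - 1
```

Note that the recursion only ever touches a node's immediate neighbors, which is the locality property exploited throughout the paper.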
Graph signals can be associated with a graph structure and are defined as a map $\mathbf{x}: \mathcal{V} \to \mathbb{R}$ between the node set and the real numbers. That is, a graph signal simply attaches a single real number to each node of the graph $\mathcal{G}$. For a given graph $\mathcal{G}$, we can define the space of graph signals as $\mathbb{R}^{\mathcal{V}}$.
A graph shift operator (GSO) can also be associated with the graph structure as a means of relating graph signals explicitly with the underlying graph support. More precisely, the GSO $\mathbf{S}$ is defined as a linear operator between graph signals such that the output graph signal $\mathbf{y} = \mathbf{S}\mathbf{x}$ is computed as

$$y(v) = [\mathbf{S}\mathbf{x}](v) = \sum_{u \in \mathcal{N}_1(v)} [\mathbf{S}]_{vu}\, x(u). \quad (2)$$

The operation (2) shifts the signal around the graph and is analogous to the time-shift in discrete-time signal processing. Note that the GSO can be completely characterized by specifying the adjacency rule that assigns the values $[\mathbf{S}]_{vu}$ for every pair of nodes $u, v$. Examples of GSOs that have found widespread use in the literature include the adjacency matrix, the Laplacian matrix, the Markov transition matrix, and their normalized counterparts. While it is technically possible to design adjacency rules that lead to arbitrary values of $[\mathbf{S}]_{vu}$, we only consider rules that are determined exclusively by the combinatorial structure of the graph.
A graph filter can then be built from the GSO. More specifically, given a collection of scalar coefficients $\{h_k\}_{k=0}^{K}$, a $K$-tap graph filter is defined as the linear mapping $\mathbf{H}(\mathbf{S})$ between two graph signals given by

$$\mathbf{H}(\mathbf{S})\mathbf{x} = \sum_{k=0}^{K} h_k\, \mathbf{S}^k \mathbf{x}, \quad (3)$$

where $\mathbf{S}^k$ denotes $k$ repeated applications of the GSO (see (2)) to the input signal $\mathbf{x}$. The operator $\mathbf{S}^0$ is understood to be the identity map on graph signals.
Graph filters are linear and local operators. They are linear in the input graph signal $\mathbf{x}$. They are local in that they only require information up to the $K$-hop neighborhood. More specifically, the output of the graph filter at each node $v$ can be computed by $K$ repeated exchanges of information with the one-hop neighbors $\mathcal{N}_1(v)$, thus collecting the signal values contained in the $K$-hop neighborhood $\mathcal{N}_K(v)$. Each node can compute its output separately from the rest. The key observation is that the nodes need not know the global structure of the whole graph but only their local neighborhood. Thus, graph filters can be analyzed and understood entirely from looking at this local neighborhood structure.
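The distributed computation just described can be sketched as follows: each application of the shift is one round of exchanges with one-hop neighbors. This is a minimal sketch that assumes the adjacency matrix as the GSO; the helper names are ours.

```python
def shift(adj, x):
    """One application of the adjacency-matrix GSO: each node sums its neighbors' values."""
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def graph_filter(adj, x, h):
    """K-tap filter y = sum_k h[k] S^k x via K rounds of neighbor exchanges."""
    y = {v: h[0] * x[v] for v in adj}
    z = dict(x)
    for hk in h[1:]:
        z = shift(adj, z)  # one more exchange with one-hop neighbors
        for v in adj:
            y[v] += hk * z[v]
    return y

adj = {0: [1], 1: [0, 2], 2: [1]}     # path 0-1-2
x = {0: 1.0, 1: 0.0, 2: 0.0}
y = graph_filter(adj, x, [0.5, 1.0])  # y = 0.5 x + S x
assert y == {0: 0.5, 1: 1.0, 2: 0.0}
```

Observe that no node ever reads values beyond its $K$-hop neighborhood: the filter is computed entirely through local exchanges.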
3 A Local Framework for GSP
The local nature of graph filters calls for a local framework to analyze their effects. Towards this end, define a rooted $r$-ball as a graph $B = (\mathcal{V}_B, \mathcal{E}_B)$ with a root $b \in \mathcal{V}_B$ such that all nodes are at most $r$ hops away from the root, i.e., $\mathcal{N}_r(b) = \mathcal{V}_B$. Note that, since the rooted $r$-ball is a graph, the notions of graph shift operator and graph signal are immediately well-defined. We denote these as $\mathbf{S}_B$ and $\mathbf{x}_B$, respectively. For ease of exposition, the specification of the root node will be dropped, unless necessary to avoid confusion.
Of particular interest is the case of rooted $r$-balls obtained as the induced subgraph of the $r$-hop neighborhood of a node. More specifically, given a graph $\mathcal{G}$, a rooted $r$-ball $B_r(v)$ can be constructed by selecting a node $v \in \mathcal{V}$, setting $b = v$, $\mathcal{V}_B = \mathcal{N}_r(v)$ and $\mathcal{E}_B = \mathcal{E} \cap (\mathcal{V}_B \times \mathcal{V}_B)$. In this context, the graph signal $\mathbf{x}_B$ corresponding to the rooted $r$-ball can be tied to the graph signal $\mathbf{x}$ defined on $\mathcal{G}$ by setting $x_B(u) = x(u)$ for all $u \in \mathcal{V}_B$.
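A sketch of this induced-subgraph construction, together with a numerical check of the locality property it supports: the filter output at a node computed on the whole graph agrees with the output computed on the rooted ball alone. The adjacency-matrix GSO and all function names are our own choices.

```python
def shift(adj, x):
    """Adjacency-matrix GSO: each node sums its neighbors' values."""
    return {v: sum(x[u] for u in adj[v]) for v in adj}

def filt(adj, x, h):
    """K-tap filter y = sum_k h[k] S^k x."""
    y = {v: h[0] * x[v] for v in adj}
    z = dict(x)
    for hk in h[1:]:
        z = shift(adj, z)
        for v in adj:
            y[v] += hk * z[v]
    return y

def rooted_ball(adj, x, root, r):
    """Induced subgraph on N_r(root), with the restricted signal."""
    keep = {root}
    for _ in range(r):
        keep = keep | {u for w in keep for u in adj[w]}
    adj_B = {w: [u for u in adj[w] if u in keep] for w in keep}
    x_B = {w: x[w] for w in keep}
    return adj_B, x_B

# 6-cycle: a filter with K = 2 taps at node v only sees the rooted 2-ball at v.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
x = {i: float(i) for i in range(6)}
h = [1.0, -0.5, 0.25]
for v in adj:
    adj_B, x_B = rooted_ball(adj, x, v, 2)
    assert filt(adj, x, h)[v] == filt(adj_B, x_B, h)[v]
```

Nodes on the boundary of the ball have truncated neighborhoods, but their distorted values never propagate back to the root within $K$ shifts, which is why the equality holds exactly.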
We can now formalize the notion that a graph filter only relies on local information.
Proposition 1.
Let $\mathcal{G}$ be a graph with graph signal $\mathbf{x}$ and GSO $\mathbf{S}$. For a node $v \in \mathcal{V}$, denote the corresponding rooted $K$-ball as $B_K(v)$, with graph signal $\mathbf{x}_B$ and GSO $\mathbf{S}_B$, respectively. Then, for any $K$-tap filter $\mathbf{H}$, the following equality holds

$$[\mathbf{H}(\mathbf{S})\mathbf{x}](v) = [\mathbf{H}(\mathbf{S}_B)\mathbf{x}_B](v). \quad (4)$$

We note that for (4) to hold, the adjacency rule used to construct $\mathbf{S}$ and $\mathbf{S}_B$ has to be the same, and it is assumed to depend only on the combinatorial structure of the graphs; see Section 2.
Proposition 1 formalizes the well-known fact that a $K$-tap graph filter evaluated at a node only depends on the subgraph of $K$ hops centered at that node (i.e., the ego-network of the node [22]). In this sense, it is not necessary to understand how a graph filter behaves on the global structure of a graph. Rather, one only needs to understand the behavior of a graph filter locally, in particular on the rooted $K$-balls of the graph. This naturally leads to a weaker form of the property of permutation equivariance, discussed next.
For two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$, we say that a map $\phi: \mathcal{V}_1 \to \mathcal{V}_2$ is an $r$-morphism if it preserves the structure of rooted $r$-balls for all nodes in $\mathcal{V}_1$. The structure of two rooted $r$-balls $(B_1, b_1)$ and $(B_2, b_2)$ with signals $\mathbf{x}_{B_1}$ and $\mathbf{x}_{B_2}$, respectively, is preserved if there exists an isomorphism $\psi: \mathcal{V}_{B_1} \to \mathcal{V}_{B_2}$ such that $\psi(b_1) = b_2$, $(u, w) \in \mathcal{E}_{B_1}$ if and only if $(\psi(u), \psi(w)) \in \mathcal{E}_{B_2}$, and $x_{B_1}(u) = x_{B_2}(\psi(u))$ for all $u \in \mathcal{V}_{B_1}$. Then, an $r$-morphism between two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ with associated signals $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively, satisfies that the rooted $r$-balls $B_r(v)$ and $B_r(\phi(v))$ are isomorphic for all nodes $v \in \mathcal{V}_1$. Note that $r$-morphisms are not necessarily invertible, nor are they necessarily surjective.
Graph filters are invariant under $r$-morphisms, as follows from Proposition 1, where we see that the same filter applied to two different graphs yields the same result if the local structures are the same.
Corollary 1.
Consider two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ with corresponding signals $\mathbf{x}_1$ and $\mathbf{x}_2$. Denote by $B_K(v)$ the rooted $K$-ball for root $v$, and by $\mathbf{S}_{B_K(v)}$ and $\mathbf{x}_{B_K(v)}$ the corresponding GSO and graph signal, respectively. If there exists a $K$-morphism $\phi: \mathcal{V}_1 \to \mathcal{V}_2$, then for any $K$-hop filter $\mathbf{H}$ it holds that

$$[\mathbf{H}(\mathbf{S}_1)\mathbf{x}_1](v) = [\mathbf{H}(\mathbf{S}_2)\mathbf{x}_2](\phi(v)) \quad (5)$$

for all $v \in \mathcal{V}_1$.
Note that Corollary 1 is a generalization of the permutation equivariance property that graph filters have [7]. More generally, Corollary 1 opens up ways to compare the performance of a fixed graph filter across two different graphs with different associated graph signals. It would be expected, then, that if two graphs have similar distributions of rooted $K$-balls with associated signals, then a fixed graph filter acting on one should yield similar results on the other. We formalize these notions in the ensuing discussion.
3.1 Distributions of Rooted Balls
Proposition 1 states that the output of a $K$-tap graph filter at each node depends only on the rooted $K$-ball at that node and the corresponding graph signal values. Therefore, to characterize the effect of graph filtering, it suffices to characterize the distribution of rooted $K$-balls of a graph. Towards this end, we will construct a sample space, $\sigma$-field, and a corresponding measure to describe this distribution.
Denote by $B$ a rooted $r$-ball and by $\mathbb{R}^{\mathcal{V}_B}$ its corresponding signal space. Consider now the space $\Omega_r$ constructed by the disjoint union of all signal spaces on all possible rooted $r$-balls $B$ (up to isomorphism), given by¹

$$\Omega_r = \bigsqcup_{B} \mathbb{R}^{\mathcal{V}_B}. \quad (6)$$

¹ In this context, each signal space is considered modulo the action of the automorphism group of $B$. This is clarified in Appendix A.

The construction of the space $\Omega_r$ is illustrated in Fig. 1. Essentially, one considers a different space (typically, the space of $|\mathcal{V}_B|$-dimensional vectors $\mathbf{x}_B \in \mathbb{R}^{\mathcal{V}_B}$) for each possible rooted $r$-ball $B$, and then puts them all into a common space where each original space is isolated. An element $\omega \in \Omega_r$ is described by both the rooted $r$-ball and the graph signal space it indexes, i.e., $\omega = (B, \mathbf{x}_B)$. Making the identification $\mathbb{R}^{\mathcal{V}_B} \cong \mathbb{R}^{|\mathcal{V}_B|}$, the point $\omega$ can be thought of as a $|\mathcal{V}_B|$-dimensional vector associated with the rooted $r$-ball $B$. Note that, while it may happen that $\mathbb{R}^{|\mathcal{V}_{B_1}|} = \mathbb{R}^{|\mathcal{V}_{B_2}|}$ because $|\mathcal{V}_{B_1}| = |\mathcal{V}_{B_2}|$, these are different component spaces in the disjoint union as they are indexed by different rooted balls $B_1$ and $B_2$, respectively.
To define the corresponding $\sigma$-field associated with $\Omega_r$, note that the disjoint union of graph signal spaces (i.e., Euclidean spaces) has a natural topology defined over it, namely the disjoint union topology. Thus, the Borel $\sigma$-field generated from the topology of $\Omega_r$ is a proper $\sigma$-field. Intuitively, measurable sets in $\Omega_r$ are constructed by selecting some rooted $r$-ball $B$ and a Borel set in the space of graph signals $\mathbb{R}^{\mathcal{V}_B}$, then considering the corresponding set in $\Omega_r$.
A measure can be properly defined now that the $\sigma$-field associated with $\Omega_r$ has been constructed. Let $\beta_r: \mathcal{V} \to \Omega_r$ be a map that takes a node and returns the corresponding rooted $r$-ball and signal, i.e.,

$$\beta_r(v) = \big( B_r(v),\, \mathbf{x}_{B_r(v)} \big). \quad (7)$$

Note that $\beta_r(v) \in \Omega_r$. Then, for a Borel set $A \subseteq \Omega_r$, the measure $\mu_{\mathcal{G}}$ is defined as

$$\mu_{\mathcal{G}}(A) = \frac{|\{ v \in \mathcal{V} : \beta_r(v) \in A \}|}{|\mathcal{V}|}. \quad (8)$$

Recall that each element in $\Omega_r$ can be thought of as a rooted $r$-ball with an associated signal. Formally, $\mu_{\mathcal{G}}$ is the pushforward of the uniform measure on the nodes of $\mathcal{G}$ by the sampling map $\beta_r$ [23]. We observe that any measure on the nodes of the graph can be used to replace the uniform measure implicit in the RHS of (8).
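The pushforward construction can be sketched empirically: push every node through a rooted-ball map and tally the results. Testing isomorphism of balls is itself nontrivial, so this hedged sketch uses a crude signature (the degree sequence within the ball, root first) as a stand-in for the true isomorphism class; a full implementation would canonicalize the ball and carry the signal along.

```python
from collections import Counter

def ball_signature(adj, root, r):
    """Crude stand-in for the isomorphism class of the rooted r-ball at `root`:
    the root's degree within the ball, plus the sorted degrees of the others."""
    keep = {root}
    for _ in range(r):
        keep = keep | {u for w in keep for u in adj[w]}
    deg = {v: sum(1 for u in adj[v] if u in keep) for v in keep}
    others = sorted(deg[v] for v in keep if v != root)
    return (deg[root], tuple(others))

def ball_distribution(adj, r):
    """Pushforward of the uniform node measure by the sampling map."""
    n = len(adj)
    counts = Counter(ball_signature(adj, v, r) for v in adj)
    return {sig: c / n for sig, c in counts.items()}

# 6-cycle: every node sees the same rooted 1-ball, so the distribution is a point mass.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
mu = ball_distribution(adj, 1)
assert len(mu) == 1 and abs(sum(mu.values()) - 1.0) < 1e-12
```

On a path graph, by contrast, the same routine yields two classes of rooted 1-balls (endpoints versus interior nodes), illustrating how the measure reflects local heterogeneity.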
The space $\Omega_K$ serves as a natural domain in which to characterize $K$-tap graph filters. Indeed, Proposition 1 indicates that the behavior of a $K$-tap graph filter can be fully characterized by its behavior over $\Omega_K$. That is to say, any given $K$-tap graph filter $\mathbf{H}$ has a corresponding map $h: \Omega_K \to \mathbb{R}$ such that for any graph $\mathcal{G}$ with GSO $\mathbf{S}$ and signal $\mathbf{x}$, (4) can be rewritten as:

$$[\mathbf{H}(\mathbf{S})\mathbf{x}](v) = h(\beta_K(v)). \quad (9)$$

To use the terminology of quotient spaces [24], (9) indicates that a $K$-tap graph filter passes, or descends, to a unique map $h$ on $\Omega_K$ via the map $\beta_K$. Thus, in the context of Proposition 1 and graph filtering, it is sufficient to characterize local operations over $\Omega_K$, rather than with respect to the global structure of a given graph $\mathcal{G}$.
Now that the local framework based on rooted balls has been introduced, we proceed to show how it can be leveraged to obtain theoretical results that provide insight into the inner workings of graph signal processing. In Section 4 we discuss how to understand power spectral densities, while in Section 5 we discuss transferability of graph filters across graphs of different size.
Remark 1 (Distributions of edge sets).
In [25], the authors consider a pushforward probability measure into the Fourier domain when there is an underlying probability distribution of edges in a fixed graph. This allows for the modeling of graph signals under uncertainty on the edge set. Such a distribution of edges can be incorporated into the distribution $\mu_{\mathcal{G}}$ via the pushforward map $\beta_r$, allowing for a description of the distribution of rooted balls in a graph under a probability distribution on the edges.
Remark 2 (Rooted subtrees).
Many authors have considered graph processing methods that are equivariant to local subtree structures [19], such as architectures whose computation resembles the 1-Weisfeiler-Lehman test [13]. In the proposed framework, these subtree structures are strictly “less expressive” than rooted ball structures, in the sense that the rooted subtree of depth $r$ centered at a given node can be determined completely by the rooted $r$-ball centered at that node. However, the rooted $r$-ball cannot be determined by the rooted subtree of depth $r$, as evidenced by the inability of the 1-Weisfeiler-Lehman test to distinguish certain structures [26]. With this in mind, the proposed framework could certainly be applied where the space $\Omega_r$ is constructed with rooted subtrees, rather than rooted balls. However, given that these rooted balls are more expressive than rooted subtrees [19], we state all proceeding results using the rooted ball construction.
4 Power Spectral Density
One of the key ideas in graph signal processing is that of the graph Fourier transform of a graph signal [2]. For a graph on $n$ nodes, consider the GSO to be the graph Laplacian $\mathbf{L}$, which is a positive semidefinite Hermitian matrix. Thus, it admits the following eigendecomposition

$$\mathbf{L} = \sum_{i=1}^{n} \lambda_i \mathbf{v}_i \mathbf{v}_i^\top \quad (10)$$

for eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ with corresponding pairwise orthogonal eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. One can check that for any of these eigenvectors, $\mathbf{v}_i^\top \mathbf{L} \mathbf{v}_i = \lambda_i$. The graph Laplacian induces a quadratic form on a graph signal that measures a useful notion of smoothness [2]. For a given graph signal $\mathbf{x}$ on $\mathcal{G}$, the quadratic form can be shown to be equal to

$$\mathbf{x}^\top \mathbf{L} \mathbf{x} = \sum_{\{u, v\} \in \mathcal{E}} \big( x(u) - x(v) \big)^2, \quad (11)$$

where the sum runs over unordered pairs of neighboring nodes. That is, the quadratic form of the graph Laplacian measures the sum of the squared differences of the graph signal at neighboring pairs of nodes. A signal is called smooth if the quadratic form takes a small value relative to its norm. For instance, the eigenvectors of $\mathbf{L}$ with small corresponding eigenvalues are smooth signals.
We note that this notion of smoothness can be extended to normalized versions of the graph Laplacian in a similar fashion. Furthermore, a notion of smoothness can also be defined using the adjacency matrix [27]. In any case, for ease of exposition and conceptual simplicity, we assume that the GSO is the graph Laplacian $\mathbf{L}$ throughout this section, but we remark that the results derived herein are also valid for any other GSO built from adjacency rules that rely on the combinatorial structure of the graph.
Since the eigenvectors form an orthonormal basis for $\mathbb{R}^n$, any graph signal admits a unique representation as a linear combination of the eigenvectors of $\mathbf{L}$. That is, for any graph signal $\mathbf{x}$, there exists a set of coefficients $\{\hat{x}_i\}_{i=1}^{n}$ such that the following holds:

$$\mathbf{x} = \sum_{i=1}^{n} \hat{x}_i \mathbf{v}_i, \qquad \hat{x}_i = \mathbf{v}_i^\top \mathbf{x}, \quad (12)$$

$$\mathbf{x}^\top \mathbf{L} \mathbf{x} = \sum_{i=1}^{n} \lambda_i \hat{x}_i^2. \quad (13)$$

Thus, the representation of a graph signal as a weighted sum of the eigenvectors can be used to conveniently compute the quadratic form in terms of the spectrum, as in (13). In an analogy to complex exponential functions being the eigenfunctions of the Laplace operator in discrete-time signal processing, we call the representation of the graph signal by the coefficients $\{\hat{x}_i\}_{i=1}^{n}$ the graph Fourier transform (GFT).
When coupled with their corresponding eigenvalues $\{\lambda_i\}_{i=1}^{n}$, the coefficients $\{\hat{x}_i\}_{i=1}^{n}$ can be used to describe the distribution of energy in a graph signal, in a way concordant with the notion of smoothness described by the Laplacian quadratic form (11). To capture this, we define the normalized power spectral distribution of a graph signal. Given a graph signal $\mathbf{x}$ with Fourier coefficients $\{\hat{x}_i\}_{i=1}^{n}$ and corresponding Laplacian eigenvalues $\{\lambda_i\}_{i=1}^{n}$, define

$$P_{\mathbf{x}}(\lambda) = \frac{1}{\|\mathbf{x}\|_2^2} \sum_{i \,:\, \lambda_i \le \lambda} \hat{x}_i^2. \quad (14)$$

One can easily see that $P_{\mathbf{x}}$ is a monotone, right-continuous function, with $P_{\mathbf{x}}(\lambda) = 1$ for $\lambda \ge \lambda_n$, and at most $n$ discontinuities (one at each eigenvalue $\lambda_i$). Moreover, there is a convenient expression for the moments of $P_{\mathbf{x}}$:

$$m_k(\mathbf{x}) = \int \lambda^k \, dP_{\mathbf{x}}(\lambda) = \frac{\mathbf{x}^\top \mathbf{L}^k \mathbf{x}}{\|\mathbf{x}\|_2^2}. \quad (15)$$
That is to say, the moments of the normalized power spectral distribution are given by scaled quadratic forms of powers of the GSO. Observe that the quadratic form (11) can be expressed as $\|\mathbf{x}\|_2^2\, m_1(\mathbf{x})$. Indeed, the moments of the graph Fourier transform are an essential quantity in the design of low-order graph filters [28]. This points to a very useful idea: the powers of the GSO are the most primitive types of graph filters, so the proposed local framework provides a path for understanding these moments in terms of distributions of rooted balls.
To see this, let us examine (15). Notice that

$$\mathbf{x}^\top \mathbf{L}^k \mathbf{x} = \sum_{v \in \mathcal{V}} x(v)\, [\mathbf{L}^k \mathbf{x}](v). \quad (16)$$

As before, let $\mathbf{L}_{B_k(v)}$ be the GSO of the $k$-ball rooted at node $v$. By Proposition 1, the above equation can be written as

$$\mathbf{x}^\top \mathbf{L}^k \mathbf{x} = \sum_{v \in \mathcal{V}} x(v)\, \big[ \mathbf{L}_{B_k(v)}^k\, \mathbf{x}_{B_k(v)} \big](v). \quad (17)$$
That is, the moments of the normalized power spectral distribution of a graph signal can be written as an average of operations that resemble local filtering. In the proposed local framework, this has the following interpretation.
Proposition 2.
Let $\mathcal{G}$ be a graph and $\mathbf{x}$ be a graph signal. For a given $k \ge 0$, define the maps $g_k, g_0: \Omega_k \to \mathbb{R}$ by

$$g_k(B, \mathbf{x}_B) = x_B(b)\, \big[ \mathbf{L}_B^k\, \mathbf{x}_B \big](b), \quad (18)$$

$$g_0(B, \mathbf{x}_B) = x_B(b)^2, \quad (19)$$

where $\mathbf{L}_B$ denotes the Laplacian of the rooted $k$-ball and $b$ its root.

Letting $\mu_{\mathcal{G}}$ be the probability distribution on $\Omega_k$ determined by $(\mathcal{G}, \mathbf{x})$, the $k$th moment of the normalized power spectral distribution is given by the equation

$$m_k(\mathbf{x}) = \frac{\mathbb{E}_{\mu_{\mathcal{G}}}[g_k]}{\mathbb{E}_{\mu_{\mathcal{G}}}[g_0]}. \quad (20)$$
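A numerical sanity check of this identity, under our reconstruction of the local maps: the $k$th spectral moment computed globally from $\mathbf{x}^\top \mathbf{L}^k \mathbf{x}$ coincides with the ratio of averages of local quantities over rooted $k$-balls. The helper names are ours.

```python
def lap_shift(adj, x):
    """Graph Laplacian as a shift: [Lx](v) = deg(v) x(v) - sum of neighbor values."""
    return {v: len(adj[v]) * x[v] - sum(x[u] for u in adj[v]) for v in adj}

def moment_global(adj, x, k):
    """m_k = x^T L^k x / ||x||^2 computed on the whole graph."""
    z = dict(x)
    for _ in range(k):
        z = lap_shift(adj, z)
    return sum(x[v] * z[v] for v in adj) / sum(xv * xv for xv in x.values())

def moment_local(adj, x, k):
    """The same moment as a ratio of averages of local maps over rooted k-balls:
    g_k(B, x_B) = x_B(root) [L_B^k x_B](root) and g_0(B, x_B) = x_B(root)^2."""
    num, den = 0.0, 0.0
    for v in adj:
        keep = {v}
        for _ in range(k):
            keep = keep | {u for w in keep for u in adj[w]}
        adj_B = {w: [u for u in adj[w] if u in keep] for w in keep}
        z = {w: x[w] for w in keep}
        for _ in range(k):
            z = lap_shift(adj_B, z)
        num += x[v] * z[v]   # g_k evaluated at this rooted ball
        den += x[v] * x[v]   # g_0 evaluated at this rooted ball
    return num / den

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path on 4 nodes
x = {0: 1.0, 1: -1.0, 2: 2.0, 3: 0.5}
for k in range(3):
    assert abs(moment_global(adj, x, k) - moment_local(adj, x, k)) < 1e-12
```

As in Proposition 1, the truncated degrees on the boundary of each ball never propagate back to the root within $k$ applications of the Laplacian, so the local and global computations agree exactly.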
Having reduced the moments of the normalized power spectral distribution to an average over the distribution of rooted balls, we can now reason about the convergence of graph Fourier distributions in terms of weak convergence of measure for graphs with bounded degree. Although the limit of dense graphs has been considered before via graphon models [29, 30], these cannot capture the behavior of bounded-degree graphs, as all infinite sequences of growing graphs of bounded degree converge to the zero graphon [31]. However, in the case of sparse graphs, understanding the descension of maps to the space $\Omega_k$ is a feasible approach, with the following compactness property.
Lemma 1.
For given integers $r, d \ge 1$ and any collection of graphs with maximum degree at most $d$ and uniformly bounded signals, there exists a compact subspace $\Omega' \subset \Omega_r$ such that the support of the probability measures corresponding to the graphs/graph signals is contained in $\Omega'$.
The proof of Lemma 1 can be found in Appendix C. The constraint of a graph having bounded degree, for instance coming from physical constraints in a real-world system, corresponds to a compact description in the space $\Omega_r$. From this, the following convergence result holds.
Theorem 1.
Let integers $k \ge 0$ and $d \ge 1$ be given. Let $\{(\mathcal{G}_n, \mathbf{x}_n)\}_{n=1}^{\infty}$ be a sequence of graphs and graph signals such that each $\mathcal{G}_n$ has maximum degree at most $d$ and the collection of signals is uniformly bounded. Let $\mu_n$ be the probability measure on $\Omega_k$ associated with $(\mathcal{G}_n, \mathbf{x}_n)$ for each $n$, and $P_n$ the corresponding normalized power spectral distribution function.

If the measures $\mu_n$ converge weakly, then the $k$th moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all $k \ge 0$, then the normalized power spectral distribution functions $P_n$ converge weakly.
The proof of Theorem 1 can be found in Appendix B. Theorem 1 casts the power spectral density of a graph signal in terms of the distribution of local structures in the graph. In particular, we treat the power spectral density as a function from graphs with graph signals to Borel measures on a compact subset of the real line, then prove that this function is continuous with respect to the distribution of rooted balls in the graph, where continuity is understood to be with respect to the weak topology for both the distribution in $\Omega_k$ and the power distribution of the signal. Although the frequency domain representation of a graph signal is typically understood to be a representation in terms of the global graph structure, owing to the global support of the graph Laplacian’s eigenvectors, this result indicates that it is in essence still subject to fundamentally local phenomena, namely the rooted balls in the graph.
This characterization of the power spectral distribution in terms of the distribution of rooted balls also reflects the local nature of graph filtering. Indeed, as discussed in [28], the moments of the power spectral distribution are key in the understanding of graph filters. More broadly, when designing a graph filter, we often aim to construct a matched filter in the frequency domain, so that the response of the filter aligns with that of the signal, subject to some noise model (additive white Gaussian noise, for instance). Additionally, constraints such as a filter having $K$ taps reduce the space of possible frequency responses to degree-$K$ polynomials in the frequency domain, so that the moments become the basic values of interest. This is an immediate consequence of graph filters admitting a distributed implementation [28], so that the expression of a graph filter’s performance in terms of the power spectrum of the signal is actually a distillation of the distribution of rooted balls in $\mathcal{G}$.
We have shown how connecting the moments of the power spectral distribution to the space $\Omega_k$ yields insights about how features of graphs and graph signals are dependent only on localized information. In particular, a function was constructed whose expected value under the pushforward measure $\mu_{\mathcal{G}}$ of a graph computes the moments of the power spectral distribution. This machinery can be immediately extended to other functions on graphs that act locally, allowing us to compare the behavior of a given function on two graphs in terms of their distributions of rooted balls.
5 Transferability
Let $f$ be a function mapping a graph and its corresponding graph signal to a real number. This function is typically known as a graph summary, and examples include the graph power spectral distribution discussed in the previous section, as well as the average node degree or node centrality values [32], the clustering coefficient [33], or the conductance [34]. We can also consider cost functions for a given task as graph summaries, since they ultimately take a graph and its signal as input, and output a real number [35, 36]. In this context, we want to study how the function $f$ changes across two different graphs $\mathcal{G}_1, \mathcal{G}_2$ with different graph signals $\mathbf{x}_1$ and $\mathbf{x}_2$.
To study the transference of the graph summary $f$ across different graphs and graph signals, let us assume that there exists a function $g: \Omega_r \to \mathbb{R}$ such that for any pair $(\mathcal{G}, \mathbf{x})$ whose associated probability distribution on $\Omega_r$ is denoted by $\mu_{\mathcal{G}}$, we have $f(\mathcal{G}, \mathbf{x}) = \mathbb{E}_{\mu_{\mathcal{G}}}[g]$. This assumption is not too severe, essentially saying that the graph summary is an average over a function on the nodes, with the value at each node determined strictly by the structure of its local neighborhood. Then, consider the transfer equation

$$\big| f(\mathcal{G}_1, \mathbf{x}_1) - f(\mathcal{G}_2, \mathbf{x}_2) \big| = \big| \mathbb{E}_{\mu_1}[g] - \mathbb{E}_{\mu_2}[g] \big| \quad (21)$$

as a way of measuring the transferability of $f$, where $\mu_1$ and $\mu_2$ are the distributions of rooted balls on the space $\Omega_r$ of the graphs $\mathcal{G}_1$ and $\mathcal{G}_2$, respectively.
To understand (21) in a quantitative manner demands a stronger structure on $\Omega_r$ than the disjoint union topology. To this end, we then endow $\Omega_r$ with the structure of a metric space, in a way that preserves the original topology. For an arbitrary constant $\alpha > 0$, define a metric $d_\alpha$ on $\Omega_r$ as follows. For $\omega_1 = (B_1, \mathbf{x}_{B_1})$ and $\omega_2 = (B_2, \mathbf{x}_{B_2})$, put

$$d_\alpha(\omega_1, \omega_2) = \begin{cases} \min\{ \alpha,\, \| \mathbf{x}_{B_1} - \mathbf{x}_{B_2} \|_2 \} & \text{if } B_1 \cong B_2, \\ \alpha & \text{otherwise}, \end{cases} \quad (22)$$

where $\|\cdot\|_2$ is inherited from the identification of each signal space with Euclidean space.² Note that $\| \mathbf{x}_{B_1} - \mathbf{x}_{B_2} \|_2$ is well defined since both $\mathbf{x}_{B_1}$ and $\mathbf{x}_{B_2}$ are vectors in the same $|\mathcal{V}_{B_1}|$-dimensional space when $B_1 \cong B_2$. One can check that $d_\alpha$ is indeed a metric on $\Omega_r$ for any $\alpha > 0$. The metric is constructed in a way that preserves the local metric structure of each signal space $\mathbb{R}^{\mathcal{V}_B}$, while also endowing the whole space with a global metric structure, at the cost of some local distortion in order to maintain the triangle inequality, as dictated by the constant $\alpha$. When $\alpha$ is chosen to be very large, the local metric structure on each rooted ball of the space is well-preserved, but these rooted balls are otherwise kept “far apart” from each other. On the other hand, when $\alpha$ is very small, the rooted balls become “close,” but the local metric structure on each of them is distorted (i.e., signals belonging to the same rooted ball that are beyond a distance of $\alpha$ are indistinguishable from signals that belong to different rooted balls). Importantly, the metric topology on $\Omega_r$ induced by $d_\alpha$ is equal to the disjoint union topology (6), so that all notions of continuity and weak convergence of measure are preserved.

² This metric is again understood modulo the automorphism group of the rooted ball, discussed in Appendix A.
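The case analysis in the metric, as reconstructed here, is easy to state in code: signals on isomorphic balls are compared up to a cap of $\alpha$, and signals on non-isomorphic balls are exactly $\alpha$ apart. The hashable ball labels standing in for isomorphism classes are hypothetical.

```python
def d_alpha(omega1, omega2, alpha):
    """Sketch of the metric on the disjoint union of signal spaces.
    Each omega is a pair (ball_label, signal_tuple); the label stands in
    for the isomorphism class of the rooted ball."""
    ball1, x1 = omega1
    ball2, x2 = omega2
    if ball1 != ball2:
        return alpha  # different rooted balls: distance is exactly alpha
    gap = sum((a - b) ** 2 for a, b in zip(x1, x2)) ** 0.5
    return min(alpha, gap)  # same ball: Euclidean distance, capped at alpha

# Same ball, nearby signals: the Euclidean distance survives.
assert d_alpha(("P3", (0.0, 1.0)), ("P3", (0.0, 1.5)), alpha=10.0) == 0.5
# Different balls: distance is alpha, however similar the signals.
assert d_alpha(("P3", (0.0, 1.0)), ("K3", (0.0, 1.0)), alpha=10.0) == 10.0
# Small alpha distorts local structure: far-apart signals on the same ball
# look no closer than signals on different balls.
assert d_alpha(("P3", (0.0, 0.0)), ("P3", (5.0, 5.0)), alpha=0.1) == 0.1
```

The cap at $\alpha$ is what makes the triangle inequality hold across the disjoint components: any path through a different ball already costs at least $\alpha$.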
Given the additional metric structure on $\Omega_r$, we can quantitatively characterize the transfer equation (21) by comparing distributions of rooted balls on the metric space $(\Omega_r, d_\alpha)$. To do so, we make the following assumption on the function $g$.
Assumption 1.
For all rooted $r$-balls $B$, $g(B, \cdot)$ is $L$-Lipschitz continuous with respect to the second argument over the space of bounded signals $\mathbf{x}_B$.
This is not a difficult assumption to satisfy. For instance, it is sufficient for $g$ to be continuously differentiable on each signal space for this to hold. In fact, the set of Lipschitz continuous functions on a compact space is dense in the set of continuous functions with respect to the uniform norm [37].
Theorem 2.
Let graphs $\mathcal{G}_1, \mathcal{G}_2$ with corresponding bounded signals $\mathbf{x}_1, \mathbf{x}_2$ be given. Denote their corresponding measures in $\Omega_r$ by $\mu_1$ and $\mu_2$, respectively. For a function $g$ that satisfies Assumption 1, it holds that

$$\big| \mathbb{E}_{\mu_1}[g] - \mathbb{E}_{\mu_2}[g] \big| \le L\, W_1(\mu_1, \mu_2), \quad (23)$$

where $W_1(\mu_1, \mu_2)$ denotes the 1-Wasserstein distance between $\mu_1$ and $\mu_2$ under the metric $d_\alpha$.
The proof of Theorem 2 can be found in Appendix D. Theorem 2 bounds the transfer equation (21) in terms of the 1-Wasserstein distance between the two distributions of rooted balls. The appearance of the 1-Wasserstein distance in this setting is intuitive: the dual formulation of the 1-Wasserstein distance defines the distance between two distributions via an integral probability metric over 1-Lipschitz functions [38], yielding a transferability bound that holds for all Lipschitz functions. We illustrate the notion of the 1-Wasserstein distance between graphs in Fig. 2.
5.1 Examples and Discussion
Theorem 2 indicates that smoother functions generally transfer better than ones with large Lipschitz constant. Some examples of graph summaries that fulfill the conditions of Theorem 2 follow.
Example: Power spectral distribution. For any $k \ge 0$, the spectral moment function $m_k$ as defined in Section 4 descends to expectations of the functions $g_k$ and $g_0$. For any rooted $k$-ball $B$, it holds that $g_k(B, \cdot)$ is a polynomial of the graph signal in the second argument, and can thus be shown to satisfy Assumption 1. Thus, the difference in the spectral moments of two graphs can be bounded in terms of the Wasserstein distance of their respective distributions of rooted balls. Moreover, observe that for $j \le k$, the moment function $g_j$ can be defined on $\Omega_k$ in a way that preserves the identity (20), so that we can compare the similarity of spectral moments between graph signals by comparing the Lipschitz constants of the functions $g_j$ and $g_k$ on the same space $\Omega_k$. Indeed, one can see that $g_k$ has a larger Lipschitz constant than $g_j$ for $j < k$, so that the bound given by Theorem 2 is tighter for the lower moments of the power spectral distribution, contingent upon a relatively small 1-Wasserstein distance between the distributions of rooted balls.
Example: MSE of a graph filter. Let $\mathbf{H}$ be a fixed $K$-tap graph filter, and let $\sigma^2 > 0$ be given. For any given graph $\mathcal{G}$ on $n$ nodes with associated signal $\mathbf{x}$ and shift operator $\mathbf{S}$, put the function $f$ as

$$f(\mathcal{G}, \mathbf{x}) = \frac{1}{n}\, \mathbb{E}\Big[ \big\| \mathbf{H}(\mathbf{S})(\mathbf{x} + \mathbf{w}) - \mathbf{x} \big\|_2^2 \Big], \quad (24)$$

where the expectation is taken over the random vector $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. In words, $f$ is the mean squared error when applying $\mathbf{H}$ to $\mathbf{x}$ under additive white Gaussian noise. Define $g$ on rooted $K$-balls $(B, b)$ with shift operator $\mathbf{S}_B$ as

$$g(B, \mathbf{x}_B) = \mathbb{E}\Big[ \big( [\mathbf{H}(\mathbf{S}_B)(\mathbf{x}_B + \mathbf{w}_B)](b) - x_B(b) \big)^2 \Big]. \quad (25)$$

If we denote the measure on $\Omega_K$ associated with $(\mathcal{G}, \mathbf{x})$ by $\mu_{\mathcal{G}}$, one can show that

$$f(\mathcal{G}, \mathbf{x}) = \mathbb{E}_{\mu_{\mathcal{G}}}[g]. \quad (26)$$
Observe that $g$ is a polynomial of the values of $\mathbf{x}_B$ for each rooted $K$-ball $B$, so that it is Lipschitz continuous on each bounded signal space, thus satisfying Assumption 1. Moreover, the Lipschitz constant of $g$ over each signal space can be shown to be proportional to $\|\mathbf{H}(\mathbf{S}_B)\boldsymbol{\delta}_b\|_2$, where $\boldsymbol{\delta}_b$ is the signal taking value $1$ on the root node and $0$ elsewhere. That is to say, the Lipschitz constant is directly influenced by the “spread” of the impulse response of the filter. Intuitively, this indicates that while filters with large coefficients in their higher-order terms may suit a particular signal well, they may transfer across graphs poorly. This aligns with existing results on the stability of graph filters [7, 39].
Estimation of the transferability bound. The computation of the 1-Wasserstein distance in (23) may prove problematic for large graphs. However, given the definition of the 1-Wasserstein distance as an integral probability metric, it is not difficult to compute an empirical estimate of $W_1(\mu_1, \mu_2)$. By [38, Theorem 6], if $\{\omega_i\}_{i=1}^{m}$ and $\{\omega_i'\}_{i=1}^{m}$ are i.i.d. samples drawn from the respective distributions $\mu_1$ and $\mu_2$, then the 1-Wasserstein distance between the empirical distributions can be computed via a linear program. To draw i.i.d. samples from these distributions, one can choose a node uniformly at random (with replacement) from the node set of the underlying graph and then take the rooted $r$-ball via the map $\beta_r$. By [38, Corollary 10], the compact support of the distributions guarantees almost sure convergence of the empirical estimator to the true Wasserstein distance as $m \to \infty$.
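The sampling step described above can be sketched as follows; computing the empirical 1-Wasserstein distance between the two sample sets would then require a linear-program solver, which we do not reproduce here. The representation and function names are our own.

```python
import random

def sample_balls(adj, x, r, m, seed=0):
    """Draw m i.i.d. samples from the distribution of rooted r-balls:
    pick nodes uniformly with replacement and push each through the
    rooted-ball sampling map."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    samples = []
    for _ in range(m):
        v = rng.choice(nodes)
        keep = {v}
        for _ in range(r):
            keep = keep | {u for w in keep for u in adj[w]}
        adj_B = {w: tuple(sorted(u for u in adj[w] if u in keep)) for w in keep}
        x_B = {w: x[w] for w in keep}
        samples.append((v, adj_B, x_B))
    return samples

adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}  # 5-cycle
x = {i: float(i) for i in range(5)}
s = sample_balls(adj, x, r=1, m=100)
assert len(s) == 100
assert all(len(x_B) == 3 for _, _, x_B in s)  # every 1-ball on a cycle has 3 nodes
```

In practice one would canonicalize each sampled ball (modulo its automorphism group, as in Appendix A) before feeding the pairwise distances into the linear program.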
Tighter bounds. In Theorem 2, the difference in expectation between two distributions of rooted balls is characterized in terms of their Wasserstein distance, which is defined in terms of a metric imposed upon . Given that the metric is defined for essentially arbitrary , one can derive a tighter bound on the transfer equation, with less stringent assumptions on the function . First, note that for two graphs and bounded signals , the corresponding measures have support in the compact subspace , where is as described in Lemma 1. Because of this, any continuous function on is bounded. For a function satisfying 1, define the range of as
(27)
In the same vein as Theorem 2, it holds that
(28)
This bound illustrates an interplay between the smoothness of , the range of , and the regime of similarity between the two graphs. If two graphs have similar distributions of rooted balls, then the smoothness of plays a stronger role in bounding transferability. If they have highly distinct distributions of rooted balls, the range of , as described by the value , plays a stronger role in bounding transferability.
Quantifying weak convergence. It is well known that in cases of bounded support, the Wasserstein distance between probability measures metrizes the weak topology on the set of probability measures on that space, i.e., the topology induced by the definition of weak convergence [40]. Therefore, convergence in the -Wasserstein distance is equivalent to weak convergence of measures. For instance, this implies via Theorem 1 that the moments of the graph Fourier transform are continuous with respect to the -Wasserstein distance between distributions of rooted balls.
Extensions to graph neural networks. Although we have considered maps on graphs with real-valued signals, the analysis in this section is equally applicable when the nodes have other types of features such as categorical labeling. For example, one could consider to be the cross-entropy loss for a fixed graph neural network with layers, where the “graph signal” is taken to be a set of discrete input features and a ground-truth label against which the loss is evaluated [41].
Wasserstein graph kernels. Wasserstein distances between graphs have been considered in the graph kernel literature [16], where node embeddings in Euclidean space are computed from local structures about each node, followed by a computation of the Wasserstein distance in Euclidean space. Given that the proposed computation for nodes with continuous attributes in this paper is smooth with respect to the node attributes, one can show that the Wasserstein distance between distributions in our case descends to theirs via a continuous map.
By endowing the space with an appropriate metric structure, we have applied the proposed framework to studying transferability of graph summaries that descend to maps on rooted -balls. In analyzing the relationship between the Lipschitz constant of the graph summary and the distribution of rooted balls, insights have been gleaned as to what drives commonalities between graphs in this regard. Although our focus in this section has been on fixed graph summaries, such as the loss of a fixed graph filter in Section 5.1, these ideas can be readily adapted to problems of designing graph summaries. For instance, the presence of the Lipschitz constant in Theorem 2 indicates that a graph filter or graph neural network will transfer better if it is designed to be smooth. This can be cast in the light of stability, where certain conditions on filter banks in graph neural networks can guarantee a response that is stable to structural perturbations [7, 39].
6 Extension to Weighted Graphs
In this section, we consider how ideas from the unweighted case (Sections 4 and 5) can be extended to weighted graphs. We define a weighted graph as a graph coupled with a weight function . For a given graph , the set of all possible weight functions on it is denoted . We use the same notation for the set of weight functions on a rooted graph: .
We now expand our definition of a rooted -ball with a signal to include weight functions. Define
(29)
Similar to the definition of in (6), the space consists of all rooted -balls with both edge weights and graph signals on them. As before, a weighted graph with a signal yields elements of via a suitably defined sampling map , defined for nodes :
(30)
where denotes the restriction of to the edges of . As before, a given weighted graph with a signal defines a probability measure on via the pushforward of the uniform measure by , which we denote by . For a weighted graph , define the weighted degree of a node as follows:
(31)
The set of weighted graphs whose nodes have weighted degree at most is denoted by for real number .
We now develop a local description of the graph Fourier transform for weighted graphs, much like in Section 4. For a graph with weight function , define the weighted graph Laplacian as a shift operator such that, for any ,
(32)
The weighted Laplacian for a graph on nodes is positive semidefinite, and admits an eigendecomposition , with a Fourier representation for graph signals defined in a way analogous to (12). For a given graph signal , there is a power spectral distribution associated with via the weighted Laplacian, whose moments are given by (15). We characterize the properties of the power spectral distribution in terms of the distribution of rooted -balls in the following theorem.
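A small sanity check of the stated properties, assuming the usual convention L = D - W with D the diagonal matrix of weighted degrees (our reading of (32); all names below are illustrative):

```python
import numpy as np

# Assumed convention for (32): (L x)(v) = sum_u W[v, u] * (x[v] - x[u]),
# i.e., L = D - W with D the diagonal matrix of weighted degrees.
def weighted_laplacian(W):
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(2)
n = 7
W = rng.random((n, n))
W = np.triu(W, 1)
W = W + W.T                            # symmetric, nonnegative, zero diagonal
L = weighted_laplacian(W)

eigvals = np.linalg.eigvalsh(L)
assert eigvals.min() > -1e-9           # positive semidefinite
assert np.abs(L @ np.ones(n)).max() < 1e-9   # constants lie in the kernel
# Gershgorin: the spectrum is bounded by twice the maximum weighted degree,
# mirroring the role of the weighted-degree bound in Theorem 3.
assert eigvals.max() <= 2 * W.sum(axis=1).max() + 1e-9
```

The last assertion illustrates why bounding the weighted degree, rather than the combinatorial degree, suffices to confine the support of the power spectral distribution.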
Theorem 3.
Let an integer and a real number be given. Let be a sequence of graphs, weight functions, and graph signals such that each graph is contained in and the signals are uniformly bounded. Let be the probability measure on associated with for each , and the corresponding normalized power spectral distribution function.
If the measures converge weakly, then the moments of the normalized power spectral distribution functions converge. Moreover, if weak convergence of measure holds for all , then the normalized power spectral distribution functions converge weakly.
The proof of Theorem 3 can be found in Appendix E. Theorem 3 establishes that the power spectral distribution of a graph signal on a weighted graph is continuous with respect to the weak topology of distributions of rooted -balls, under boundedness assumptions. Unlike the regime of Theorem 1, the convergence of the power spectral distribution for weighted graphs here is not dependent on the combinatorial constraint of having bounded node degrees. Rather, any node may have an arbitrarily large number of neighbors, as long as the total influence remains bounded. This is a realistic assumption for many systems, where an agent may be allowed to exert influence on a large collection of other agents, but the total influence is bounded by some physical constraint, such as power consumption.
Remark 3 (Zero-weight edges).
In many applications, an edge with an assigned weight of zero is treated as a non-edge. For instance, the moment function satisfies this condition. This motivates a quotient map on that “glues” rooted balls with edge weights equal to zero to rooted balls where those edges are absent. Any function that is continuous on and also satisfies this condition will remain continuous under such a quotient map, so our results still hold. The construction in this case is somewhat technical, so we omit it for simplicity of exposition.
7 In the Limit: Graphing Signal Processing
In Sections 5 and 6, the (weak) continuity of the power spectral density with respect to the distribution of rooted balls was established. Specifically, it was shown that weakly convergent sequences of distributions of rooted balls yielded weakly convergent power spectra. Given that these distributions correspond to underlying graphs, it remains to be established precisely what objects these sequences are converging to. In general, a weakly convergent sequence of finite graphs does not necessarily converge to a finite graph. When studying graph limits using graphon models, the homomorphism density of motifs is of primary concern [42]. Our setting is slightly different, as it depends instead on the isomorphism density of rooted graphs. In this setting, a more appropriate model is that of a graphing. We show that the basic ideas in the discussion so far can be transferred directly to these limiting objects. Let us define the basic object of study for this section.
Definition 1.
A graphing of degree is a triplet such that
1. is a sample space with a -field on ;
2. is a probability measure on ;
3. is such that, for all , we have
(33)
where , with the condition that for all .
A graphing is a way to describe a graph with a potentially uncountable number of nodes (elements of ), yet with bounded node degrees. We tacitly assume that all graphings considered are of degree , for some . Unlike a graphon, which describes dense graphs with unbounded node degrees, the structures that arise from a graphing are typically sparse, and notions of graph filtering and graph shifts reduce to finite sums, rather than continuous integrals. Given a graphing , a graphing signal is a map from the nodes to the real numbers such that is a measurable function. Endowing this with the usual vector space structure, we define the space of graphing signals as
(34)
Much like how a graphon describes a model for dense random graphs, a graphing carries with it a distribution of random rooted graphs. As before, let be the neighborhood operator, returning all of the one-hop neighbors of a node , so that is the -hop neighborhood operator. For a graphing with associated signal , the rooted -ball at a node is defined in the same way as in Section 3, denoted by , with the same notation used to denote the corresponding graph signal . Due to the bounded degree condition on the graphing, is always a finite rooted graph of maximum degree at most . With this in mind, we define a sampling map for graphings much like the sampling map (7) for finite graphs:
(35)
The sampling map is illustrated in Fig. 3 for a simple graphing. Much like for finite graphs, we can push forward the measure to yield a probability measure on . That is to say, , in the same way as before, where we adopt as the probability measure over to be pushed forward. This differs slightly from the case of finite graphs, where we implicitly assume that the initial measure was the uniform probability measure on the finite node set.
To establish the basic building blocks for signal processing on graphings, we first construct a Laplacian for graphing signals. Unlike graph signals, graphing signals are not necessarily supported on a finite, or even countable space. We specify the graphing Laplacian as a linear operator on the space of graphing signals in the following way. For any and , put
(36)
By the degree boundedness condition from Definition 1, the sum in (36) is well-defined. Due to its singular nature, it is difficult to define a “graphing Fourier transform” directly from the Laplacian: for instance, even fairly tame graphing structures may have a Laplacian whose eigenvalues have uncountable multiplicity. This, for instance, makes the notion of the power spectral distribution of a graphing signal unwieldy.
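For reference, one standard convention for such a pointwise Laplacian, and a plausible reading of (36) given the finite sums just noted (this is an assumed form, mirroring the weighted case in (32)):

```latex
(L_{\mathbf{G}} f)(x) \;=\; \sum_{y \in N(x)} \bigl( f(x) - f(y) \bigr),
```

where the neighbor set of x is finite by the degree bound in Definition 1, so the sum is always well-defined.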
However, the spectral properties of a graphing signal with respect to the underlying graphing structure can be studied indirectly. Let be a graphing with graphing signal and Laplacian . Then, for integers , we tentatively define
(37)
where again indicates repeated applications of the Laplacian. Observe that this definition resembles that of the moments of the power spectral distribution of a graph signal in (15). We define the values tentatively, as it is not obvious when they are finite, or even well-defined. In the following discussion, we establish sufficient conditions under which the sequence exists and corresponds to the moments of a distribution function.
Much like in Sections 4 and 5, we control the behavior of the graphing by bounding its size. We first do this by determining a suitable notion of boundedness for a graphing signal.
Definition 2.
Let be a graphing and be a graphing signal. The signal is said to be locally essentially bounded if it is bounded on almost all of the connected components of . That is, there exists an such that for all ,
(38)
where denotes the set of nodes such that , and is the -hop neighborhood of a set. Denote the set of all such graphing signals by .
In words, local essential boundedness not only controls the size of the set of nodes with large signal value (essential boundedness), but also controls the size of the neighborhoods with which those nodes can interact. Local essential boundedness is an analog of boundedness for finite graph signals adapted to graphing signals. For instance, a graphing signal that is strictly bounded is locally essentially bounded. This condition, however, is stronger than being bounded almost everywhere (see Lemma 4), due to the neighborhood condition in (38). The necessity of this condition stems from the highly singular nature of the graphing Laplacian as an operator on signals.
We will also find it useful to consider graphings that “resemble” finite graphs in some sense.
Definition 3.
Let be a graphing with graphing signal , and a sequence of graphs with associated graph signals. For all , put , and . Then, we say that the sequence of graphs converges weakly to if for all , converges to as tends to infinity. We denote weak convergence of graphs to a graphing by .
Much like Theorem 1, we can characterize the properties of a graphing based on the limiting properties of a sequence of graphs that converges to it. We state this formally below.
Theorem 4.
Let a graphing of degree with signal be given. Suppose there exists a sequence of graphs and graph signals such that . Then, there exists a unique distribution function supported on such that the moments of are given by the sequence .
The proof of Theorem 4 can be found in Appendix F. Theorem 4 defines the power spectral distribution for sufficiently nice graphing signals. In particular, if the graphing and graphing signal are the limit of a sequence of graphs and graph signals, then asking that the power spectral distribution be continuous with respect to the weak topology on distributions forces a unique choice of distribution. Namely, the power spectral distribution of the limit of a sequence of graphs and graph signals is merely the weak limit of the power spectral distributions of the graphs as defined in (14). To show that this definition properly preserves the graph power spectral distribution of finite graphs, we illustrate with a simple example.
Example 1.
From a finite graph on nodes and a graph signal , a graphing with graphing signal can be constructed in the following way (cf. [42, Example 18.16]). Identify with the integers . Let with the usual Borel -field and the uniform (Lebesgue) probability measure . We construct a graphing in the following way. Partition into intervals indexed by . Then, for each with , and every point , put the points into the edge set . Additionally, assign .
Under this construction, one can show that for all , the distributions on corresponding to and are equal, i.e., . It holds, then, that for all , so that the power spectral distribution corresponding to the graphing signal is identical to that of the graph signal .
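The moment-preservation claim in Example 1 can be checked by Monte Carlo. Assuming the graphing moment averages the pointwise product of the signal and its k-fold Laplacian image over uniformly sampled points of the interval (our reading of (37)), and that the signal is constant on each interval of the partition, this matches the finite-graph moment (1/n) x^T L^k x. The path graph and signal below are illustrative.

```python
import numpy as np

# Assumed reading of Example 1: partition [0, 1) into n intervals I_v, make
# the signal constant on each interval, and wire the intervals according to
# the edges so that the graphing Laplacian acts exactly like the graph
# Laplacian on interval values.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0      # path graph (illustrative)
L = np.diag(A.sum(axis=1)) - A
x = np.array([0.3, -1.2, 0.7, 2.0, -0.4])

k = 3
Lkx = np.linalg.matrix_power(L, k) @ x
moment_graph = (x @ Lkx) / n             # finite-graph k-th moment

# Monte Carlo over uniformly sampled points of [0, 1): the graphing moment
# reduces to averaging the interval values.
rng = np.random.default_rng(3)
pts = rng.random(200_000)
idx = np.minimum((pts * n).astype(int), n - 1)
moment_graphing = np.mean(x[idx] * Lkx[idx])
assert abs(moment_graphing - moment_graph) < 0.5
```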
In Sections 4, 5 and 6, we considered how the distribution of rooted balls in graphs yields useful topological or metric structure with which graphs can be related to each other, particularly in light of graph summaries such as the power spectral distribution. In all of these cases, the topology of weak convergence was used to show that a convergent sequence of graphs in the sense of the distribution of rooted balls is also convergent in the sense of any appropriate graph summary. This left the question open: what do the graphs converge to? By considering graphings as limiting objects for sparse graphs, we have shown that the extension of graph summaries, such as the power spectral distribution, to graphings maintains continuity in the natural way.
Remark 4 (Hyperfinite graphings).
Theorem 4 hinges on the graphing and signal being the limit of a sequence of finite graphs and graph signals. As noted in [42, Conjecture 19.8], the question of whether all graphings are limits of sequences of graphs is still an open question. However, there is a class of graphings for which this is known to hold: namely, hyperfinite graphings. We refer the interested reader to [42, Chapter 21] for more details.
8 Conclusion
Taking into account the folklore knowledge that low-order graph filters only use local information on graphs, we have developed a local framework in which many aspects of graph signal processing are illuminated. In particular, by mapping a graph and graph signal to a probability distribution on a fixed space, we allow for the application of tools from probability theory to characterize similarities between graphs based on their local structures, even if they do not have a common node set. Using the topology of weak convergence of probability distributions, we define a topology on the space of graphs and graph signals, where notions of continuity can be understood: this was most obviously demonstrated in Section 4, where it was proved that the power spectral distribution of a graph signal is weakly continuous with respect to the distribution of rooted balls in a graph. This approach allows for relatively painless treatment of limiting objects, such as graphings, by simply decomposing them into their constituent distributions.
The analysis carried out in this paper provides proper justification for transferring graph filters across multiple graphs, given the assumption that they share local structural properties. With the particular suitability of this approach to highly sparse graphs, this complements existing work on transferability based on graphon models or assumptions of graphs as discretizing manifold structures. Leveraging the framework of this paper, it would be interesting to study more exotic types of functions, such as graph neural networks and graph kernels, or to incorporate the properties encoded by distributions of local structures into tasks such as network topology inference.
Appendix A Metric Topology of the Signal Space
Let a rooted ball be given. Denote by the automorphism group of maps , i.e., the set of all isomorphisms from to itself, with the group operation given by composition. The group acts on in a natural way. For any , define the action of on as the graph signal satisfying
(39)
for all . Next, define an equivalence relation on where for all , and take the quotient space , whose elements are the equivalence classes under this relation. Define the metric on so that for any ,
(40)
where is the norm on inherited from identification with . As is a finite group, this minimum is well-defined, and satisfies the triangle inequality, thus yielding a valid metric. When defining the space in Section 3.1 we treat with the structure of , from which the topology on is inherited. Similarly, we use the metric in Section 5 as defined on .
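The quotient metric in (40) can be sketched by brute force for a small rooted ball: minimize the signal distance over root-preserving automorphisms. The star graph, signals, and function names below are illustrative choices.

```python
import numpy as np
from itertools import permutations

# Sketch of the quotient metric: minimize the signal distance over
# root-preserving automorphisms of the rooted ball. Brute force over all
# permutations is fine for the tiny illustrative graph below.
def automorphisms(adj, root):
    n = len(adj)
    autos = []
    for perm in permutations(range(n)):
        if perm[root] != root:
            continue
        if all(adj[i][j] == adj[perm[i]][perm[j]]
               for i in range(n) for j in range(n)):
            autos.append(perm)
    return autos

def quotient_dist(adj, root, x, y):
    return min(np.linalg.norm(x - y[list(p)]) for p in automorphisms(adj, root))

adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # star: root 0 with leaves 1, 2
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 2.0, 1.0])             # same signal with the leaves swapped
assert len(automorphisms(adj, 0)) == 2    # identity and the leaf swap
assert quotient_dist(adj, 0, x, y) == 0.0 # equivalent up to automorphism
```

Since the automorphism group of a finite rooted ball is finite, the minimum is attained, which is exactly why the quotient metric is well-defined.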
Appendix B Proof of Theorem 1
By Lemma 1, the sequence has compact support. Since is a continuous function, this implies that is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments converge, as desired.
For the case where weak convergence holds for all , we complete the proof with the following lemma.
Lemma 2.
Let be a compact subset of the real line, for some . Let be an array of numbers such that is the sequence of moments of a finite measure supported on , and that for each , where is some sequence of real numbers.
Then, the sequence uniquely corresponds to a distribution supported on , such that the sequence of distributions converges weakly to .
Lemma 2 is a routine derivation following from the Hausdorff moment property [43, Theorem 3.15] and the Weierstrass approximation theorem [44], so we omit the proof. In particular, let . The moments correspond to the power spectral distributions of the graph signals , which are supported on by the bounded degree assumption. Since converges for all , we have that the sequence of limits is the sequence of moments of a distribution supported on . Moreover, we have that is the weak limit of the power spectral distributions of the graph signals , as desired.
Appendix C Proof of Lemma 1
Denote by the set of rooted graphs such that the underlying graph has node degree bounded by . For any and any graph contained in , note that for all , it holds that the rooted -ball centered at is contained in . Moreover, it is straightforward to show that there are only finitely many rooted -balls contained in , up to isomorphism. For any and any rooted graph , denote the set of graph signals taking values in by . It is clear that is compact as a subset of .
Using these constructions, it immediately follows that for any graph with graph signal for some , the associated distribution satisfies
(41)
Since there are only finitely many rooted -balls in , the above set is a disjoint union of finitely many compact spaces, and is thus a compact space itself, by the properties of the disjoint union of topological spaces. If a family of graphs in has uniformly bounded signals for some index set , this means that there is some such that for all , so that the condition in (41) holds for all graph/graph signals in the family, as desired.
Appendix D Proof of Theorem 2
We first establish that for a function satisfying 1, the metric preserves Lipschitz continuity over the subspace as defined in Lemma 1.
Lemma 3.
Let be a function satisfying 1. Then, for any , has a Lipschitz constant at most on the subspace with the metric , where
(42)
Lemma 3 can be shown by directly applying the definition of Lipschitz continuity, so we omit the proof. By the dual formulation of the -Wasserstein distance [38], Lemma 3 implies that, for ,
(43)
owing to the fact that , by assumption. Observing that is monotonically increasing with respect to , we can restrict our view to . Applying a simple change of variables yields the expression
(44)
Taking and noting that by assumption yields the bound (23), as desired.
Appendix E Proof of Theorem 3
One can check that is continuous and bounded on due to the weighted degree bound and the uniform boundedness of the signals, so that is a convergent sequence by the definition of weak convergence of probability measures. Applying Proposition 2, the moments converge, as desired.
Appendix F Proof of Theorem 4
We begin by establishing two results regarding locally essentially bounded signals and the graphing Laplacian .
Lemma 4.
For a graphing , the set is a subset of , where denotes the space of essentially bounded functions on with respect to the probability measure .
We omit the proof of Lemma 4, as it follows directly from Definition 2.
Lemma 5.
is closed under application of the Laplacian , i.e., for any , we have that .
The proof of Lemma 5 can be found in Appendix G.
For any , the graphing signal is contained in by Lemma 5. The product of two functions in is contained in , so that the graphing signal is contained in , by Lemma 4. Thus, has finite value for all .
Let be given, and let be the function as defined in Proposition 2. One can check that , where . By the bounded degree property of the graphing and the assumption that (as a consequence of Lemma 4), Lemma 1 implies that is contained in a compact subset of . For each finite graph and graph signal , denote the moment of the signal by , and the pushforward measure of on the space by , so that , by Proposition 2. The assumption that implies that weakly converges to . Since has compact support and is a continuous function, this implies that . In other words, .
Appendix G Proof of Lemma 5
Let a degree graphing with graphing signal be given. By definition, there exists an such that for all , we have
(45)
Put , and define the graphing signal , where denotes the graphing Laplacian. Observe, due to the bounded degree of the graphing, that
(46)
so that, for all ,
(47)
By the definition of local essential boundedness, the right hand side of (47) has measure zero (under the probability measure ), which implies that the left hand side also has measure zero. Since was arbitrary, this establishes that .
References
- [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013, doi:10.1109/MSP.2012.2235192.
- [2] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018, doi:10.1109/JPROC.2018.2820126.
- [3] D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” in IEEE Int. Conf. Acoust., Speech and Signal Process. IEEE, May 2020, pp. 5930–5934, doi:10.1109/ICASSP40776.2020.9053140.
- [4] F. Gama, Q. Li, E. Tolstaya, A. Prorok, and A. Ribeiro, “Synthesizing decentralized controllers with graph neural networks and imitation learning,” Oct. 2021, eprint:arXiv 2012.14906v3.
- [5] W. Huang, T. A. W. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, “A graph signal processing perspective on functional brain imaging,” Proc. IEEE, vol. 106, no. 5, pp. 868–885, 2018, doi:10.1109/JPROC.2018.2798928.
- [6] J. Ma, W. Huang, S. Segarra, and A. Ribeiro, “Diffusion filtering of graph signals and its use in recommendation systems,” in IEEE Int. Conf. Acoust., Speech and Signal Process. IEEE, 2016, pp. 4563–4567, doi:10.1109/ICASSP.2016.7472541.
- [7] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 5680–5695, 2020, doi:10.1109/TSP.2020.3026980.
- [8] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002, doi:10.1126/science.298.5594.824.
- [9] C. N. Tzouanas, S. Kim, K. N. Badhiwala, B. W. Avants, and J. T. Robinson, “Hydra vulgaris shows stable responses to thermal stimulation despite large changes in the number of neurons,” iScience, vol. 24, no. 6, p. 102490, 2021, doi:10.1016/j.isci.2021.102490.
- [10] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Appl. Comput. Harmonic Anal., vol. 30, no. 2, pp. 129–150, 2011, doi:10.1016/j.acha.2010.04.005.
- [11] D. I. Shuman, C. Wiesmeyr, N. Holighaus, and P. Vandergheynst, “Spectrum-adapted tight graph wavelet and vertex-frequency frames,” IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4223–4235, 2015, doi:10.1109/TSP.2015.2424203.
- [12] T. M. Roddenberry, F. Frantzen, M. T. Schaub, and S. Segarra, “Hodgelets: Localized spectral representations of flows on simplicial complexes,” 2021, eprint:arXiv 2109.08728.
- [13] N. M. Kriege, F. D. Johansson, and C. Morris, “A survey on graph kernels,” Appl. Netw. Sci., vol. 5, no. 1, pp. 1–42, 2020, doi:10.1007/s41109-019-0195-3.
- [14] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, “Efficient graphlet kernels for large graph comparison,” in Int. Conf. Artificial Intell., Statist. PMLR, 2009, pp. 488–495.
- [15] C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel, “Faster kernels for graphs with continuous attributes via hashing,” in Int. Conf. Data Mining. IEEE, 2016, pp. 1095–1100, doi:10.1109/ICDM.2016.0142.
- [16] M. Togninalli, E. Ghisu, F. Llinares-López, B. A. Rieck, and K. Borgwardt, “Wasserstein Weisfeiler-Lehman graph kernels,” Conf. Neural Inform. Process. Syst., vol. 9, 2020.
- [17] P. de Haan, T. Cohen, and M. Welling, “Natural graph networks,” 2020, eprint:arXiv 2007.08349.
- [18] B. Bevilacqua, F. Frasca, D. Lim, B. Srinivasan, C. Cai, G. Balamurugan, M. M. Bronstein, and H. Maron, “Equivariant subgraph aggregation networks,” 2021, eprint:arXiv 2110.02910.
- [19] M. Zhang and P. Li, “Nested graph neural networks,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
- [20] Q. Zhu, C. Yang, Y. Xu, H. Wang, C. Zhang, and J. Han, “Transfer learning of graph neural networks with ego-graph information maximization,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
- [21] T. Maehara and H. NT, “Learning on random balls is sufficient for estimating (some) graph parameters,” Conf. Neural Inform. Process. Syst., vol. 34, 2021.
- [22] J. Leskovec and J. Mcauley, “Learning to discover social circles in ego networks,” Conf. Neural Inform. Process. Syst., vol. 25, 2012.
- [23] T. Tao, An introduction to measure theory. Amer. Math. Soc., 2011, vol. 126, doi:10.1090/gsm/126.
- [24] J. M. Lee, An Introduction to Smooth Manifolds. Springer, 2012, doi:10.1007/978-1-4419-9982-5.
- [25] F. Ji, W. P. Tay, and A. Ortega, “Graph signal processing over a probability space of shift operators,” 2021, eprint:arXiv 2108.09192.
- [26] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” J. Mach. Learning Res., vol. 12, no. 9, 2011.
- [27] F. Gama and A. Ribeiro, “Ergodicity in stationary graph processes: A weak law of large numbers,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2761–2774, Apr. 2019, doi:10.1109/TSP.2019.2908909.
- [28] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, 2017, doi:10.1109/TSP.2017.2703660.
- [29] M. Avella-Medina, F. Parise, M. T. Schaub, and S. Segarra, “Centrality measures for graphons: Accounting for uncertainty in networks,” IEEE Trans. Network Sci. Eng., vol. 7, no. 1, pp. 520–537, 2018, doi:10.1109/TNSE.2018.2884235.
- [30] L. Ruiz, L. Chamon, and A. Ribeiro, “Graphon signal processing,” IEEE Trans. Signal Process., vol. 69, pp. 4961–4976, 2021, doi:10.1109/TSP.2021.3106857.
- [31] C. Borgs, J. T. Chayes, H. Cohn, and N. Holden, “Sparse exchangeable graphs and their limits via graphon processes,” J. Mach. Learning Res., vol. 18, no. 210, pp. 1–71, 2018. [Online]. Available: http://jmlr.org/papers/v18/16-421.html
- [32] S. Segarra and A. Ribeiro, “Stability and continuity of centrality measures in weighted graphs,” IEEE Trans. Signal Process., vol. 64, no. 3, pp. 543–555, 2015, doi:10.1109/TSP.2015.2486740.
- [33] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998, doi:10.1038/30918.
- [34] B. Bollobás, Modern graph theory. Springer Science & Business Media, 1998, vol. 184, doi:10.1007/978-1-4612-0619-4.
- [35] F. Gama, A. G. Marques, G. Mateos, and A. Ribeiro, “Rethinking sketching as sampling: A graph signal processing approach,” Signal Process., vol. 169, pp. 107404 (1–15), Apr. 2020, doi:10.1016/j.sigpro.2019.107404.
- [36] F. Gama and S. Sojoudi, “Distributed linear-quadratic control with graph neural networks,” Signal Process., Feb. 2022, doi:10.1016/j.sigpro.2022.108506.
- [37] L. De Branges, “The Stone-Weierstrass theorem,” Proc. Amer. Math. Soc., vol. 10, no. 5, pp. 822–824, 1959, doi:10.1090/S0002-9939-1959-0113131-7.
- [38] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. Lanckriet, “On integral probability metrics, -divergences and binary classification,” 2009, eprint:arXiv 0901.2698.
- [39] L. Ruiz, F. Gama, and A. Ribeiro, “Graph neural networks: Architectures, stability, and transferability,” Proc. IEEE, vol. 109, no. 5, pp. 660–682, 2021, doi:10.1109/JPROC.2021.3055400.
- [40] C. Villani, Optimal transport. Springer, 2009, vol. 338, doi:10.1007/978-3-540-71050-9.
- [41] F. Gama, E. Isufi, G. Leus, and A. Ribeiro, “Graphs, convolutions, and neural networks: From graph filters to graph neural networks,” IEEE Signal Process. Mag., vol. 37, no. 6, pp. 128–138, Nov. 2020, doi:10.1109/MSP.2020.3016143.
- [42] L. Lovász, Large networks and graph limits. Amer. Math. Soc., 2012, vol. 60, doi:10.1090/coll/060.
- [43] K. Schmüdgen, The moment problem. Springer, 2017, vol. 9, doi:10.1007/978-3-319-64546-9.
- [44] K. Weierstrass, “Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen veränderlichen,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, vol. 2, pp. 633–639, 1885.