A Distributed Algorithm for Spectral Sparsification of Graphs with Applications to Data Clustering
Abstract
Spectral sparsification is a technique used to reduce the number of non-zero entries in a positive semidefinite matrix with little change to its spectrum. In particular, the main application of spectral sparsification is to construct sparse graphs whose spectra are close to that of a given dense graph. We study spectral sparsification under the assumption that the edges of a graph are allocated among sites that can communicate with each other. In this work we show that if a graph is allocated among several sites, the union of the spectral sparsifiers of each induced subgraph gives us a spectral sparsifier of the original graph. In contrast to other works in the literature, we present a precise computation of the approximation factor of the union of spectral sparsifiers and give an explicit calculation of the edge weights. We then present an application of this result to data clustering in the Number-On-Forehead model of multiparty communication complexity when the input data is allocated as a sunflower among the sites of the party.
1 Introduction
Spectral sparsification is a technique introduced by Spielman and Teng spielman:2011 that is used to approximate a graph $G$ by a sparse graph $H$. The notion of approximation used by spectral sparsification is that the spectra of both $G$ and $H$ must be close up to a constant factor. Batson, Spielman and Srivastava batson:12 proved that every graph $G$ has a spectral sparsifier with a number of edges linear in the number of vertices of $G$ and provided an algorithm achieving such a bound. There are several algorithms in the literature that construct spectral sparsifiers of graphs with a trade-off between running time and the number of edges of $H$. To the best of our knowledge, Lee and Sun lee:2018 have the best probabilistic algorithm for spectral sparsification, with a running time that is almost linear; it constructs spectral sparsifiers with $O(qn/\epsilon^2)$ edges, where $n$ is the number of vertices of $G$, $\epsilon$ is an approximation factor and $q$ is a constant.
There are situations where algorithms need to work with data that is not centralized but allocated across different sites. One way to deal with decentralized data is to design communication protocols so that the sites can communicate among themselves. The efficiency of a communication protocol can be measured by the number of bits shared among the sites, and such a measure is known as the communication complexity of the protocol nisan:93 . When data comes in the form of a graph, the number of edges greatly affects the communication complexity, and hence, computing spectral sparsifiers of graphs in distributed systems is of great importance.
In this work we present a distributed algorithm for spectral sparsification of graphs in the communication complexity model. In this model, we are only interested in the communication costs among sites and we assume that each site has arbitrary computational power. The idea behind this protocol is that, given an input graph $G$, spectral sparsifiers of induced subgraphs of $G$ can be computed in each site first, and then any given site computes the union of such graphs, which results in a spectral sparsifier of $G$. Even though other works have used the idea of taking the union of spectral sparsifiers, like Chen et al. chen:16 , they have not shown a precise calculation of the approximation factor. The main contribution of this work, presented in Theorem 3.1, is an estimation of the approximation factor and an explicit calculation of the edge weights in the union of spectral sparsifiers. In order to compute the approximation factor we introduce an idea that we call “overlapping cardinality partition,” which is a way to partition the edge set of a graph with respect to the number of times each edge is allocated among the sites. The overlapping cardinality partition is a technical tool that allows us to express the Laplacian matrix of the union of induced subgraphs of $G$ as a linear combination of the Laplacian matrices of graphs induced from the partition.
In the second part of this paper, we present in Section 4 an application of Theorem 3.1 to distributed data clustering in the Number-On-Forehead model of communication complexity. In particular, if we assume the existence of a sunflower structure erdos:61 ; deza:74 ; kostochka:00 on the input data, we show how a communication protocol can detect the presence of the sunflower and take advantage of its kernel to reduce the communication costs.
2 Preliminaries and Notation
In this section we will introduce some definitions and notations that will be used throughout this paper.
2.1 Spectral Graph Theory
Let $G=(V,E,w)$ be an undirected and weighted graph with $n$ vertices and $m$ edges. Let $\mathcal{E}=\{E_1,\dots,E_s\}$ be a family of subsets of $E$. We denote by $G_i=(V,E_i,w_i)$ the subgraph induced by $E_i$, where $w_i(e)$ is defined as $w(e)$ for all $e\in E_i$ and 0 otherwise. Every graph $G$ has an associated matrix called its Laplacian matrix, or simply Laplacian, which is defined as
$$L_G = D_G - W_G,$$
where $W_G$ is the weighted adjacency matrix and $D_G$ is the weighted degree matrix. We will omit the subindex $G$ from $W_G$ and $D_G$ when it is clear from the context.
The normalized Laplacian is defined as $\mathcal{L}_G = D_G^{-1/2} L_G D_G^{-1/2}$. The Laplacian matrix (and the normalized Laplacian) is positive semidefinite (PSD), and its first eigenvalue always equals zero with multiplicity equal to the number of connected components of $G$ luxburg:07 . Indeed, if there exists a multicut of size $k$ in $G$ then the $k$-th smallest eigenvalue of $L_G$ gives useful information to find such a multicut.
One of the fastest methods to approximate an optimal multicut in a graph is the so-called spectral clustering algorithm. This technique uses the eigenvectors of $L_G$ or $\mathcal{L}_G$ associated to the $k$ smallest eigenvalues in order to construct a matrix with these eigenvectors as columns, and then it applies a simpler clustering algorithm (like $k$-means) to the rows of this matrix luxburg:07 . Lee, Gharan, and Trevisan lee2:2014 proved that the $k$-th smallest eigenvalue approximates the optimal value of a multicut of size $k$ in $G$ and that the eigenvectors give the corresponding partition over $V$.
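As an illustration of the procedure just described, the following minimal sketch builds the Laplacian of a weighted graph, takes the eigenvectors associated to the $k$ smallest eigenvalues, and runs $k$-means on the rows of the resulting matrix. The weight matrix W, the function name and the toy graph are illustrative assumptions, not objects from this paper.

```python
# A minimal sketch of spectral clustering as described above; W, the helper
# name and the toy graph are illustrative assumptions.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    """Cluster the vertices of the weighted graph given by W into k groups."""
    d = W.sum(axis=1)                       # weighted degrees
    L = np.diag(d) - W                      # Laplacian L = D - W
    # Eigenvectors associated to the k smallest eigenvalues, as columns.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Apply a simpler clustering algorithm (k-means) to the rows of U.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Example: two disjoint triangles are recovered as two clusters.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_clustering(W, k=2))          # e.g. [0 0 0 1 1 1] up to relabeling
```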
2.2 Spectral Sparsification
Spectral sparsification is a technique used to reduce the density of a given PSD matrix while changing its spectrum only by a constant approximation factor. Given a matrix $L_G$, spectral sparsification constructs another matrix $L_H$ which is “similar” to $L_G$ in some well-defined way. We will use the notion of similarity defined in spielman:2011 . A subgraph $H$ of $G$ is called an $\epsilon$-spectral sparsifier of $G$ if for any $x\in\mathbb{R}^{n}$ we have that
$$(1-\epsilon)\,x^{T}L_{G}\,x \;\leq\; x^{T}L_{H}\,x \;\leq\; (1+\epsilon)\,x^{T}L_{G}\,x.$$
The importance of a spectral sparsifier lies in the sparseness of $H$; for example, some computations are easier over a sparse matrix. There are deterministic and probabilistic algorithms to find spectral sparsifiers of a given graph. The algorithm of Batson, Spielman and Srivastava batson:12 is currently the best deterministic algorithm; it constructs a graph $H$ with $dn/\epsilon^2$ edges, where $\epsilon$ is the approximation factor and $d$ is a constant, in time polynomial in $n$ and $m$.
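The following sketch is a small numerical verifier for the definition above: it computes the extreme values of $x^{T}L_{H}x / x^{T}L_{G}x$ over all $x$ orthogonal to the all-ones vector and reports the smallest $\epsilon$ for which $H$ is an $\epsilon$-spectral sparsifier of $G$. It assumes $G$ is connected and is only illustrative; it is not one of the sparsification algorithms cited above.

```python
# Illustrative check of the eps-spectral-sparsifier condition; assumes G is
# connected so that x^T L_G x > 0 for every x orthogonal to the all-ones vector.
import numpy as np
from scipy.linalg import eigh, null_space

def sparsifier_quality(L_G: np.ndarray, L_H: np.ndarray) -> float:
    """Smallest eps with (1-eps) x^T L_G x <= x^T L_H x <= (1+eps) x^T L_G x."""
    n = L_G.shape[0]
    B = null_space(np.ones((1, n)))          # basis of the subspace orthogonal to 1
    # Generalized eigenvalues of (L_H, L_G) restricted to that subspace are the
    # possible Rayleigh ratios x^T L_H x / x^T L_G x.
    ratios = eigh(B.T @ L_H @ B, B.T @ L_G @ B, eigvals_only=True)
    return max(ratios.max() - 1.0, 1.0 - ratios.min())
```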
3 A Distributed Algorithm for Spectral Sparsification
In this section we present our main result. In particular, given a graph $G$ and a family of induced subgraphs of $G$, we show that the union of spectral sparsifiers of the induced subgraphs is a spectral sparsifier of $G$. In contrast to other work, however, we give explicit bounds on the approximation factor and a construction of the new weight function.
First we introduce some definitions which will help us understand the overlapping of data among the sites. We denote by $[n]$ the set $\{1,2,\dots,n\}$.
Definition 1 (Occurrence Number)
Let $\mathcal{A}=\{A_1,\dots,A_s\}$ be a family of subsets of $[n]$. For any $x\in[n]$, the occurrence number of $x$ in $\mathcal{A}$, denoted $\#_{\mathcal{A}}(x)$, is the number of sets from $\mathcal{A}$ in which $x$ appears.
Example 1
Let and , , . Here we have that , , , and so on.∎
Definition 2 (Overlapping Cardinality)
Let $\mathcal{A}=\{A_1,\dots,A_s\}$ be a family of subsets of $[n]$ for some fixed $s$ and $n$. The overlapping cardinality of a subset $P\subseteq[n]$ in $\mathcal{A}$ is a positive integer $c$ such that for each $x\in P$ its occurrence number is $\#_{\mathcal{A}}(x)=c$; otherwise the overlapping cardinality of $P$ in $\mathcal{A}$ is $0$.
The overlapping cardinality identifies the number of times each element of a subset appears in a family of subsets.
Example 2
Let and be as in Example 1. Here we have that . Now consider the sets and .
• The overlapping cardinality of in is 4, because .
• The overlapping cardinality of in is $0$ because the occurrence number of one of the elements of the set is different from the others, namely, , , and .∎
Now we use the idea of overlapping cardinality to construct a partition of the union of a family of subsets of $[n]$.
Definition 3 (Overlapping Cardinality Partition)
Given a family $\mathcal{A}$ as in Definition 2, an overlapping cardinality partition over $\bigcup_{i=1}^{s}A_i$ on $\mathcal{A}$ is a partition $\{P_1,\dots,P_k\}$ of $\bigcup_{i=1}^{s}A_i$ where each $P_t$ has overlapping cardinality $c_t>0$ on $\mathcal{A}$. We call the sequence $c_1,\dots,c_k$, with $c_1<c_2<\cdots<c_k$, the overlapping cardinalities over the family $\mathcal{A}$.
Example 3
Take from examples 1 and 2. An overlapping cardinality partition is
Here, has overlapping cardinality equal to because . The subset has overlapping cardinality equal to because . In Example 2 we saw that the subset has overlapping cardinality . Finally, the subset has overlapping cardinality equal to because .∎
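The following sketch illustrates Definitions 1, 2 and 3 on a small hypothetical family of subsets of $[n]$; the family F and the resulting values are illustrative assumptions and not the values of the examples above.

```python
# A sketch of occurrence numbers (Def. 1), overlapping cardinality (Def. 2) and
# the overlapping cardinality partition (Def. 3) on a hypothetical family F.
from collections import Counter

F = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]            # illustrative family

# Occurrence number: in how many members of F each element appears.
occ = Counter(x for S in F for x in S)

def overlapping_cardinality(P, occ):
    """c > 0 if every element of P has occurrence number c, and 0 otherwise."""
    values = {occ[x] for x in P}
    return values.pop() if len(values) == 1 else 0

# Overlapping cardinality partition: group the elements of union(F) by their
# occurrence number; every class has a positive overlapping cardinality.
partition = {}
for x in set().union(*F):
    partition.setdefault(occ[x], set()).add(x)

print(occ)          # e.g. Counter({3: 3, 4: 3, 2: 2, 1: 1, 5: 1, 6: 1})
print(partition)    # e.g. {1: {1, 5, 6}, 2: {2}, 3: {3, 4}} -- classes keyed by c_t
```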
Our main technical lemma shows that the Laplacian of an input graph can be rewritten as a linear combination of Laplacians corresponding to induced subgraphs constructed from an overlapping cardinality partition of the set of edges.
For the rest of this section we make the following assumptions. Let $G=(V,E,w)$ be an undirected and weighted graph with a weight function $w:E\to\mathbb{R}^{+}$, let $\mathcal{E}=\{E_1,\dots,E_s\}$ be a collection of subsets of $E$ such that $E=\bigcup_{i=1}^{s}E_i$, and let $G_i=(V,E_i,w_i)$ be the induced subgraph of $G$ where $w_i(e)=w(e)$ for all $e\in E_i$ and 0 otherwise.
Lemma 1
If $c_1,\dots,c_k$ are the overlapping cardinalities over the family $\mathcal{E}$ with an overlapping cardinality partition $\{P_1,\dots,P_k\}$, then $\sum_{i=1}^{s}L_{G_i}=\sum_{t=1}^{k}c_t\,L_{G_{P_t}}$, where $L_{G_{P_t}}$ is the Laplacian of $G_{P_t}=(V,P_t,w)$.
Proof
First notice that, for all $e\in P_t$, there exists a subfamily of $\mathcal{E}$ with cardinality equal to $c_t$ such that $e$ belongs to every member of it and to its associated subgraph. Take any $e=(u,v)\in P_t$ for some $t\in[k]$. There exist $c_t$ induced subgraphs $G_{i_1},\dots,G_{i_{c_t}}$ of $G$ that have $e$ as an edge, and all other induced subgraphs do not have $e$ as an edge, where $i_1,\dots,i_{c_t}\in[s]$. This means that
$$\sum_{i=1}^{s}L_{G_i}(u,v)=-c_t\,w(u,v). \qquad (1)$$
Now, let $d_i(u)$ denote the degree of $u$ in $G_i$. We know that $d_i(u)=\sum_{v:(u,v)\in E_i}w_i(u,v)$ where $w_i(u,v)=w(u,v)$ for $(u,v)\in E_i$. Since $\{P_1,\dots,P_k\}$ is a partition of $\bigcup_{i=1}^{s}E_i$, we can rewrite the degree of $u$ in $G$ as
$$d(u)=\sum_{t=1}^{k}\;\sum_{v:(u,v)\in P_t}w(u,v).$$
Then, the degree of $u$ in the graph $G_i$ is
$$d_i(u)=\sum_{t=1}^{k}\;\sum_{v:(u,v)\in P_t\cap E_i}w(u,v).$$
If we take an edge $(u,v)\in P_t$, where $t$ is fixed, we know that $(u,v)$ appears only in the induced subgraphs $G_{i_1},\dots,G_{i_{c_t}}$, and hence, we obtain
$$\sum_{i=1}^{s}d_i(u)=\sum_{t=1}^{k}c_t\sum_{v:(u,v)\in P_t}w(u,v). \qquad (2)$$
If we take another class $P_r$, with $r\neq t$, note that $(u,v)$ does not belong to $G_{P_r}$ and each Laplacian matrix $L_{G_{P_r}}$ has 0 in its $(u,v)$-entry. Therefore, adding $\sum_{r\neq t}c_r\,L_{G_{P_r}}(u,v)=0$ to Eq.(1) we have that
$$\sum_{i=1}^{s}L_{G_i}(u,v)=\sum_{r=1}^{k}c_r\,L_{G_{P_r}}(u,v).$$
Extending this argument to all equivalence classes in $\{P_1,\dots,P_k\}$, for each non-diagonal entry $(u,v)$, with $(u,v)\in\bigcup_{i=1}^{s}E_i$, it holds that
$$\sum_{i=1}^{s}L_{G_i}(u,v)=\sum_{t=1}^{k}c_t\,L_{G_{P_t}}(u,v). \qquad (3)$$
A similar argument can be made for the diagonal entries with Eq.(2), thus obtaining
$$\sum_{i=1}^{s}L_{G_i}(u,u)=\sum_{t=1}^{k}c_t\,L_{G_{P_t}}(u,u). \qquad (4)$$
Equations (3) and (4) cover all entries of the matrices, and hence $\sum_{i=1}^{s}L_{G_i}=\sum_{t=1}^{k}c_t\,L_{G_{P_t}}$, as claimed.∎
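The following sketch is a small numerical sanity check of Lemma 1 as stated above: for a hypothetical 4-vertex graph and edge allocation, the sum of the Laplacians of the allocated subgraphs coincides with the linear combination $\sum_{t}c_t\,L_{G_{P_t}}$. The graph, the weights and the allocation are illustrative assumptions.

```python
# Numerical sanity check of Lemma 1 on an illustrative 4-vertex graph and
# allocation E1, E2, E3: sum_i L_{G_i} equals sum_t c_t * L_{G_{P_t}}.
import numpy as np
from collections import Counter

def laplacian(n, edges, w):
    L = np.zeros((n, n))
    for (u, v) in edges:
        L[u, u] += w[(u, v)]; L[v, v] += w[(u, v)]
        L[u, v] -= w[(u, v)]; L[v, u] -= w[(u, v)]
    return L

n = 4
w = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5}
family = [{(0, 1), (1, 2)}, {(1, 2), (2, 3)}, {(1, 2), (0, 3)}]

occ = Counter(e for Ei in family for e in Ei)      # occurrence numbers
partition = {}                                     # overlapping cardinality partition
for e in set().union(*family):
    partition.setdefault(occ[e], set()).add(e)

lhs = sum(laplacian(n, Ei, w) for Ei in family)
rhs = sum(c * laplacian(n, P, w) for c, P in partition.items())
print(np.allclose(lhs, rhs))                       # True
```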
Now we will use Lemma 1 to show that the union of the spectral sparsifiers of the subgraphs $G_i$ is a spectral sparsifier of the input graph $G$.
Theorem 3.1
Let $c_1<\cdots<c_k$ be the overlapping cardinalities over the family $\mathcal{E}$ with its associated overlapping cardinality partition $\{P_1,\dots,P_k\}$, and let $L_{G_{P_1}},\dots,L_{G_{P_k}}$ be the Laplacians of $G_{P_1},\dots,G_{P_k}$. If each $H_i$ is an $\epsilon$-spectral sparsifier of $G_i$, then $H=\bigcup_{i=1}^{s}H_i$, with its edge weights scaled by $1/\alpha$, is an $\epsilon'$-spectral sparsifier of $G$, where $\epsilon'=\frac{(1+\epsilon)c_k-(1-\epsilon)c_1}{(1+\epsilon)c_k+(1-\epsilon)c_1}$ and $\alpha=\frac{(1-\epsilon)c_1+(1+\epsilon)c_k}{2}$.
Proof
Let $L_H=\sum_{i=1}^{s}L_{H_i}$ be the Laplacian of $H=\bigcup_{i=1}^{s}H_i$. By hypothesis we have that for every $i\in[s]$ and every $x\in\mathbb{R}^{n}$,
$$(1-\epsilon)\,x^{T}L_{G_i}x\;\leq\;x^{T}L_{H_i}x\;\leq\;(1+\epsilon)\,x^{T}L_{G_i}x.$$
Then we may take the summation over all $i$ to get
$$(1-\epsilon)\,x^{T}\Big(\sum_{i=1}^{s}L_{G_i}\Big)x\;\leq\;x^{T}L_{H}x\;\leq\;(1+\epsilon)\,x^{T}\Big(\sum_{i=1}^{s}L_{G_i}\Big)x. \qquad (5)$$
Now, let us consider the left hand side of Equation (5). Using Lemma 1 we get
$$(1-\epsilon)\,x^{T}\Big(\sum_{i=1}^{s}L_{G_i}\Big)x=(1-\epsilon)\sum_{t=1}^{k}c_t\,x^{T}L_{G_{P_t}}x\;\geq\;(1-\epsilon)\,c_1\sum_{t=1}^{k}x^{T}L_{G_{P_t}}x=(1-\epsilon)\,c_1\,x^{T}L_{G}x, \qquad (6)$$
where the last equality follows from the fact that $\{P_1,\dots,P_k\}$ is a partition of $E$. Similarly, for the right hand side of Equation (5) we have that
$$(1+\epsilon)\,x^{T}\Big(\sum_{i=1}^{s}L_{G_i}\Big)x=(1+\epsilon)\sum_{t=1}^{k}c_t\,x^{T}L_{G_{P_t}}x\;\leq\;(1+\epsilon)\,c_k\,x^{T}L_{G}x. \qquad (7)$$
To finish the proof, note that we want $(1-\epsilon')\,x^{T}L_{G}x\leq\frac{1}{\alpha}\,x^{T}L_{H}x$ and $\frac{1}{\alpha}\,x^{T}L_{H}x\leq(1+\epsilon')\,x^{T}L_{G}x$ with $\epsilon'<1$. In order to solve this, we choose an $\alpha$ such that $(1-\epsilon')\alpha=(1-\epsilon)c_1$ and $(1+\epsilon')\alpha=(1+\epsilon)c_k$. First notice that adding these two equations gives $\alpha=\frac{(1-\epsilon)c_1+(1+\epsilon)c_k}{2}$. Then we have that $\epsilon'=\frac{(1+\epsilon)c_k-(1-\epsilon)c_1}{(1+\epsilon)c_k+(1-\epsilon)c_1}<1$. ∎
From Theorem 3.1, a distributed algorithm for computing spectral sparsifiers is natural: every site computes a spectral sparsifier of its own input and sends the result to a coordinator, which constructs the union of all the spectral sparsifiers and rescales the weights as in Theorem 3.1.
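The sketch below outlines this coordinator-based protocol. The routine sparsify is a placeholder for any $\epsilon$-spectral-sparsification algorithm (for instance batson:12 or lee:2018), and the rescaling by $1/\alpha$ and the value of $\epsilon'$ follow Theorem 3.1; the data layout and the names are illustrative assumptions.

```python
# Schematic sketch of the distributed protocol: sites sparsify locally, a
# coordinator takes the (weight-summed) union and rescales it by 1/alpha as in
# Theorem 3.1. `sparsify` is a placeholder for a real sparsification routine.
from collections import Counter

def sparsify(edges, weights, eps):
    """Placeholder: return an eps-spectral sparsifier of (edges, weights)."""
    return dict(weights)                    # identity sparsifier, trivially valid

def coordinator(sites, eps):
    """sites: one dict per site, mapping edge -> weight."""
    sparsifiers = [sparsify(Ei.keys(), Ei, eps) for Ei in sites]   # local step
    union = Counter()
    for Hi in sparsifiers:                  # union of the sparsifiers (weights add up)
        union.update(Hi)
    occ = Counter(e for Ei in sites for e in Ei)       # overlapping cardinalities
    c1, ck = min(occ.values()), max(occ.values())
    alpha = ((1 - eps) * c1 + (1 + eps) * ck) / 2
    eps_prime = ((1 + eps) * ck - (1 - eps) * c1) / ((1 + eps) * ck + (1 - eps) * c1)
    return {e: wt / alpha for e, wt in union.items()}, eps_prime
```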
4 Data Clustering in the Number-On-Forehead Model
In this section we will show an application of Theorem 3.1 to distributed data clustering in the Number-On-Forehead model of communication complexity for the case when the input data is allocated as a sunflower among sites.
Clustering is an unsupervised machine learning task that involves finding a partition over a given set of points. Such a partition must fulfill two conditions: (i) every two points in the same set must be “similar” in some way, and (ii) every two points in different sets must be far from being similar. Each equivalence class of the partition is also called a cluster. Clustering can be accomplished by different kinds of techniques, where spectral clustering luxburg:07 is one of the fastest methods.
It is easy to see clustering as a graph problem, where each point corresponds to a vertex in a complete graph and the cost of each edge is interpreted as a similarity between points. Thus, finding a set of optimal clusters in data is equivalent to finding an optimal multicut in a graph. Since the optimal multicut depends on the spectrum of the graph’s Laplacian lee2:2014 and we want to keep the communication costs low, each site must be capable of constructing sparse induced subgraphs of its own data while preserving the spectrum of its graph Laplacian.
In our communication protocol, each site is assigned an induced subgraph of $G$, and we want each site to be aware of all clusters in the data. Consequently, each site must be capable of running a clustering algorithm on its own data, communicating its results to the other sites, and then using the exchanged messages to construct an approximation of the original graph $G$. This is where the distributed spectral sparsification algorithm is relevant.
First, we will construct a protocol to verify if the input data in every site is a sunflower. If the input is indeed allocated in a sunflower structure, then a party can take advantage of the sunflower to find an approximation of clusters in the data.
4.1 Models of Communication and their Complexity Measure
We will introduce some standard notation from communication complexity; we refer the interested reader to the textbook by Kushilevitz and Nisan kushilevitz:97 for more details. Let $\{p_1,\dots,p_s\}$ be a set of sites where a site $p_i$ has an input $X_i\subseteq\{0,1\}^{n}$, with $n$ a positive integer. In a multiparty communication protocol with $s$ sites, the sites want to jointly compute a function $f:(\{0,1\}^{n})^{s}\to R$ for some finite codomain $R$. In the Number-On-Forehead model of communication, or NOF model, each site only has access to the other sites’ inputs but not its own, that is, a site $p_i$ has access to $(X_1,\dots,X_{i-1},X_{i+1},\dots,X_s)$. In order to compute $f$ the sites must communicate, and they do so by writing bits on a blackboard which can be accessed by all sites in the party. This is the so-called blackboard model of communication.
The maximum number of bits exchanged in the protocol over the worst-case input is the cost of the protocol. The deterministic communication complexity of the function $f$ is the minimum cost over all protocols which compute $f$.
Let $G=(V,E,w)$ be an input graph and $\mathcal{E}=\{E_1,\dots,E_s\}$ be a family of subsets of $E$. In order to study communication protocols for graph problems we assume that $E_i$ is the input data to site $p_i$. In the NOF model, we let $\hat{\mathcal{E}}_i=\mathcal{E}\setminus\{E_i\}$ be the family of edge sets which $p_i$ can access. Given a site $p_i$, the symmetric difference on $\hat{\mathcal{E}}_i$, denoted $\triangle\hat{\mathcal{E}}_i$, is defined as the symmetric difference among all sets $p_i$ has access to, that is, $\triangle\hat{\mathcal{E}}_i=E_1\triangle\cdots\triangle E_{i-1}\triangle E_{i+1}\triangle\cdots\triangle E_s$.
For the rest of this paper, we use $\mathcal{E}=\{E_1,\dots,E_s\}$ as a shorthand for the set of subsets of the set of edges of an input graph $G$ with $E=\bigcup_{i=1}^{s}E_i$, and $\hat{\mathcal{E}}_i$ for the set $\mathcal{E}\setminus\{E_i\}$ where $i\in[s]$. Here $\hat{\mathcal{E}}_i$ captures the idea of the NOF model where every site has access to the other sites’ inputs but not its own.
4.2 Sunflowers and NOF Communication
A sunflower or $\Delta$-system is a family of sets $\mathcal{F}=\{S_1,\dots,S_s\}$ where $S_i\cap S_j=K$ for all $i\neq j$. We call $K$ the kernel of $\mathcal{F}$. The family $\mathcal{F}$ is a weak $\Delta$-system if $|S_i\cap S_j|=\lambda$ for all $i\neq j$, for some constant $\lambda$ kostochka:00 . It is known that if $\mathcal{F}$ is a weak $\Delta$-system and $|\mathcal{F}|\geq n^2-n+2$, where $n=\max_i|S_i|$, then $\mathcal{F}$ is a $\Delta$-system deza:74 .
We start with a simple fact that ensures the existence of -Systems with the same kernel in the NOF model if input data in a communication protocol is allocated as a sunflower among sites.
Lemma 2
If $s\geq 3$ and $\mathcal{E}$ is a $\Delta$-system with kernel $K$, then any $\hat{\mathcal{E}}_i$ is a $\Delta$-system with kernel $K$.
The following lemma states a sufficient condition for the existence of a $\Delta$-system in the input data in the NOF model, with the requirement, however, that we need at least four sites.
Lemma 3
Let $s\geq 4$. If, for all $i\in[s]$, we have that $\hat{\mathcal{E}}_i$ is a $\Delta$-system, then $\mathcal{E}$ is a $\Delta$-system.
Proof
Suppose that $\mathcal{E}$ is not a $\Delta$-system; we want to prove that, for some $i$, $\hat{\mathcal{E}}_i$ is not a $\Delta$-system.
With no loss of generality, suppose that there exist exactly two sets $E_j$ and $E_k$ that certify that $\mathcal{E}$ is not a $\Delta$-system; that is, there exist $E_j$ and $E_k$ such that $E_j\cap E_k\neq K$, and, for any other pair $E_a$ and $E_b$, it holds that $E_a\cap E_b=K$, with $\{a,b\}\neq\{j,k\}$. Now take any $\hat{\mathcal{E}}_i$, with $i$ different from $j$ and $k$. Then $\hat{\mathcal{E}}_i$ cannot be a $\Delta$-system because $E_j$ and $E_k$ belong to $\hat{\mathcal{E}}_i$ and there is at least one other set in $\hat{\mathcal{E}}_i$ because $s\geq 4$.∎
Lemma 3 implies that we only need to know whether all sites in a communication protocol have access to a $\Delta$-system to ensure that the entire family of input sets is a $\Delta$-system, provided there are at least 4 sites.
Proposition 1
There exists a protocol that verifies whether $\mathcal{E}$, with $s\geq 4$, is a $\Delta$-system or not with $s$ bits of communication exchanged.
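The sketch below illustrates the verification idea behind Lemma 3 and Proposition 1: every site locally checks whether the family it can see is a $\Delta$-system and writes a single bit on the blackboard. The helper names and the example allocation are illustrative assumptions.

```python
# Sketch of the bit-per-site verification protocol: site i checks locally
# whether the family it sees (all sets except its own) is a Delta-system and
# writes one bit on the blackboard.
from itertools import combinations

def is_delta_system(family):
    """True if every pair of sets in `family` has the same intersection (the kernel)."""
    intersections = {frozenset(A & B) for A, B in combinations(family, 2)}
    return len(intersections) <= 1

def nof_sunflower_check(sets):
    """With at least 4 sites, all bits equal to True certify a Delta-system (Lemma 3)."""
    blackboard = [is_delta_system(sets[:i] + sets[i + 1:]) for i in range(len(sets))]
    return all(blackboard)

kernel = {(0, 1), (1, 2)}
petals = [{(2, 3)}, {(3, 4)}, {(4, 5)}, {(5, 6)}]
sets = [kernel | p for p in petals]
print(nof_sunflower_check(sets))            # True: the allocation is a sunflower
```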
4.3 Data Clustering with Sunflowers
In this section, we present an NOF communication protocol that exploits the sunflower structure in the input data. First, we define an overlapping coefficient of the edges of $G$, which can be seen as a measure of how well spread out the edges are among the sites.
Definition 4
The overlapping coefficient on site $p_i$ is defined as $\delta_i=\frac{|E_i\cap\bigcup_{j\neq i}E_j|}{|E_i|}$, and the greatest overlapping coefficient is defined as $\delta=\max_{i\in[s]}\delta_i$.
The following proposition presents a simple protocol that makes every site aware of the entire input graph.
Proposition 2
Let $p_i$ be a site and let $\mathcal{E}$ be a weak $\Delta$-system with $|E_j|=n'$ for each $j\in[s]$, with a kernel of size $\delta n'$. Suppose that $s\geq n'^2-n'+2$. If site $p_i$ sends all the edges in $\triangle\hat{\mathcal{E}}_i$, then every other site will know the entire graph $G$. The number of edges this communication protocol sends is at most $|E|+\delta n'$.
Proof
We will prove this proposition by showing how each site constructs the graph $G$. First, site $p_i$ computes $\triangle\hat{\mathcal{E}}_i$ and writes it on the blackboard. Since $s\geq n'^2-n'+2$, by the result of Deza deza:74 , we know that $\mathcal{E}$ is a sunflower with kernel $K$, and by Lemma 2 this kernel is the same in all sites. At this point all sites know $\triangle\hat{\mathcal{E}}_i$, therefore, they can construct $\bigcup_{j\neq i}E_j$ on their own using the kernel of $\hat{\mathcal{E}}_i$. In one more round, one of the sites writes $E_i$ so that site $p_i$ can also construct $G$.
In order to compute the communication cost of the protocol, first notice that $|\triangle\hat{\mathcal{E}}_i|\leq|\bigcup_{j\neq i}E_j|$, where we used the fact that the union of all edges in every site equals the union of the symmetric difference and the kernel $K$. Then we have that $|\bigcup_{j\neq i}E_j|=(s-1)(1-\delta)n'+\delta n'$, which implies $|\triangle\hat{\mathcal{E}}_i|\leq|E|-(1-\delta)n'$, where the last step follows from the fact that $|E|=s(1-\delta)n'+\delta n'$. Finally, after $E_i$ was sent to the blackboard the communication cost is at most $|E|-(1-\delta)n'+n'=|E|+\delta n'$ edges.∎
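The sketch below illustrates the reconstruction step used in the proof above: for a sunflower allocation, the union of the sets a site can see equals the symmetric difference of those sets together with the kernel, so broadcasting the symmetric difference lets every other site rebuild that union. The example family is an illustrative assumption.

```python
# For a sunflower, union(visible sets) == symmetric difference(visible sets) | kernel,
# so writing the symmetric difference on the blackboard suffices for reconstruction.
from functools import reduce

kernel = {(0, 1), (1, 2)}
petals = [{(2, 3)}, {(3, 4)}, {(4, 5)}, {(5, 6)}, {(6, 7)}]
E = [kernel | p for p in petals]                  # sunflower allocation E_1, ..., E_s

i = 0                                             # site p_i does not see E[i]
visible = E[:i] + E[i + 1:]
sym_diff = reduce(lambda A, B: A ^ B, visible)    # what p_i writes on the blackboard
rebuilt = sym_diff | kernel                       # every other site knows the kernel

print(rebuilt == set().union(*visible))           # True
```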
Theorem 4.1
Let $\mathcal{E}$ be a weak $\Delta$-system with $|E_i|=n'$ for each $i\in[s]$, and suppose that $s\geq n'^2-n'+2$. There exists a communication protocol such that, after two rounds of communication, every site knows an $\epsilon'$-spectral sparsifier of the entire graph $G$, with a communication cost of $O(qn/\epsilon^{2})$ edges.
Proof
From deza:74 we know that $\mathcal{E}$ is a sunflower with a kernel of size $\delta n'$ and, by Lemma 2, the kernel is the same in all sites. First, site $p_i$ computes a spectral sparsifier $\hat{H}_i$ of the induced subgraph $\hat{G}_i=(V,\bigcup_{j\neq i}E_j)$ using the spectral sparsification algorithm of lee:2018 . This way we have that $\hat{H}_i$ has $O(qn/\epsilon^{2})$ edges, where $q$ is a constant. Then site $p_i$ writes $\hat{H}_i$ on the blackboard. Any other site constructs an $\epsilon$-spectral sparsifier $H_i$ of $G_i=(V,E_i,w_i)$. By Theorem 3.1, the graph $\hat{H}_i\cup H_i$ is an $\epsilon'$-spectral sparsifier of $G$. In a second round, a given site writes $H_i$ on the blackboard. Finally, site $p_i$ receives $H_i$ and by Theorem 3.1 it can also construct an $\epsilon'$-spectral sparsifier for $G$. Finally, the communication complexity is upper bounded by the sizes of $\hat{H}_i$ and $H_i$, which is $O(qn/\epsilon^{2})$ edges.∎
Acknowledgement. We give our thanks to the reviewers of CTW 2020 for their comments that helped improve this paper. This work is supported by Conacyt research grants POSG17-62 and PINV15-208.
References
- (1) Batson, J.D., Spielman, D.A., Srivastava, N.: Twice-ramanujan sparsifiers. In: Proceedings of the 41st annual ACM symposium on Theory of computing (STOC), pp. 255–262 (2009)
- (2) Chen, J., Sun, H., Woodruff, D., Zhang, Q.: Communication-optimal distributed clustering. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), pp. 3727–3735 (2016)
- (3) Deza, M.: Solution d’un problème de Erdös-Lovász. Journal of Combinatorial Theory, Series B 16(2), 166–167 (1974)
- (4) Erdös, P., Ko, C., Rado, R.: Intersection theorems for systems of finite sets. Quarterly Journal of Mathematics 12(1), 313–320 (1961)
- (5) Kostochka, A.: Extremal problems on Δ-systems. Numbers, Information and Complexity, pp. 143–150 (2000)
- (6) Kushilevitz, E., Nisan, N.: Communication complexity. Cambridge University Press (2006)
- (7) Lee, J.R., Gharan, S.O., Trevisan, L.: Multiway spectral partitioning and higher-order cheeger inequalities. Journal of the ACM (JACM) 61(6), 37 (2014)
- (8) Lee, Y.T., Sun, H.: Constructing linear-sized spectral sparsification in almost-linear time. SIAM Journal on Computing 47(6), 2315–2336 (2018)
- (9) von Luxburg, U.: A tutorial on spectral clustering. Statistics and computing 17(4), 395–416 (2007)
- (10) Nisan, N.: The communication complexity of threshold gates. Combinatorics, Paul Erdös is Eighty 1, 301–315 (1993)
- (11) Spielman, D.A., Teng, S.H.: Spectral sparsification of graphs. SIAM Journal On Computing 40(4), 981–1025 (2011)