Improving Spectral Clustering Using Spectrum-Preserving Node Aggregation
Abstract
Spectral clustering is an important graph clustering technique. However, it suffers from a scalability problem due to its computationally expensive eigen-decomposition step, and it is highly sensitive to noisy nodes in the graph. In this work, we address both problems simultaneously by using spectrum-preserving node aggregation to generate a nearly-noiseless, concise representation of the data set. Specifically, we create a small set of spectral-representative pseudo-nodes based on spectral analysis. Then, the standard spectral clustering algorithm is performed on this much smaller node set. Finally, we map the clusters of pseudo-nodes back to the original data points to obtain their cluster memberships. The proposed framework has a nearly-linear time complexity, while the clustering accuracy can be significantly improved. The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
Index Terms— Spectral Clustering, Node Reduction
1 Introduction
Clustering is one of the most fundamental machine learning problems [1]. It aims to assign the samples of a data set to different clusters in such a way that samples in the same cluster are more similar to each other than to samples in different clusters.
In the past decades, many clustering techniques have been proposed. Among them, spectral clustering has drawn considerable attention and has been widely applied to computer vision, image processing and speech processing [2, 3]. Although spectral clustering has superior performance, the involved eigen-decomposition procedure has a time complexity of \(O(n^3)\), where \(n\) is the number of samples in the data set. This high computational cost can easily become an obstacle in large-scale applications [4, 5]. To solve this problem, considerable effort has been devoted to approximate spectral clustering: [6] proposed a k-means-based hierarchical spectral clustering framework (KASP); inspired by sparse coding theory, [5] proposed a landmark-based representation method (LSC); [7] leveraged the Nyström method to approximate the affinity matrix. However, none of these methods preserves the spectrum of the original graph, which results in a significant degradation of clustering accuracy. For example, on the Covtype data set, the clustering accuracy drops by more than 20% when using the LSC method or the KASP framework, and by more than 15% when using the Nyström method (see Table 1).
[8] attempted to apply a graph coarsening method for dividing a graph topology into a random number of partitions. However, its claims about spectral clustering have the following problems:
1) Methodologically, the algorithm flow in [8] only partitions a small number of pseudo (aggregated) nodes. Without mapping the pseudo clusters back to the original nodes via the mapping operators, [8] can only provide an auxiliary intermediate result for data clustering.
2) Partitioning pure graph topologies into small parts and finding clusters of data samples are two different tasks. Clustering is the task of grouping similar samples into the same cluster. According to the definition of spectral clustering [9], the input of spectral clustering is the similarity matrix of the data samples, calculated from their feature vectors. However, in the spectral clustering experiments of [8], no data samples are used; only pure graph topologies are taken as input. Without "data" points represented by features, there is no "similarity" among nodes.
3) To convincingly evaluate clustering performance, each input sample should have a ground-truth label (cluster membership) so that the correctness of the clustering result can be checked [5, 4]. However, in [8], clustering correctness cannot be measured because no labels are available. Therefore, the experiments in [8] cannot demonstrate the effectiveness of spectrum-preserving node aggregation for spectral clustering.
4) In clustering, the number of clusters is meaningful: it is either specified by the task or naturally formed by the data density. For example, handwritten digit tasks have 10 clusters corresponding to the digits 0-9, and the ModelNet40 data set (https://modelnet.cs.princeton.edu/) has 40 clusters corresponding to its 40 kinds of 3D objects (tables, planes, chairs, etc.). However, [8] chose an arbitrary number, 30, as the number of subsets for all graphs, which does not make sense for spectral clustering experiments and evaluations.
In contrast to [8], in this paper we integrate spectrum-preserving node aggregation into the general approximate spectral clustering framework [6] and construct a correspondence table based on the mapping operators. We then use this table to map the clusters of aggregated nodes back to the original samples in the data set, yielding the cluster memberships of the real data points. Based on this complete framework, we use real-world data sets to demonstrate the effectiveness of the aggregation scheme for data clustering. The major contributions of this work are summarized as follows:
1. We propose a complete framework for improving spectral clustering by integrating spectrum-preserving node aggregation into the general approximate spectral clustering framework [6], and we demonstrate its effectiveness on real-world data sets.
2. In contrast to other acceleration methods based on representative nodes, such as KASP, Nyström and LSC, our method uses spectral-representative nodes that directly preserve the spectrum of the graph Laplacian, which is key to clustering accuracy. As a result, for a very large data set such as Covtype, which has 581,012 instances, only our method can accelerate clustering without loss of accuracy; all other representative-node-based methods suffer from accuracy degradation.
3. Our method enables the construction of a nearly-noiseless manifold representation of the data set, which leads to dramatic improvements in clustering accuracy. For example, by removing more than 98% of the nodes of the MNIST data set, its noisy nodes are removed as well, and the clustering accuracy improves from 64.20% to 82.18%.
2 Preliminaries
k-means is the most fundamental clustering method [9]. It discovers clusters by minimizing the following objective function:
\[
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} d(x_i, \mu_j), \tag{1}
\]

where:

\[
d(x_i, \mu_j) = \left\| x_i - \mu_j \right\|^2, \tag{2}
\]

and \(\mu_j\) is the centroid of the cluster \(C_j\).
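For concreteness, the objective in Eqs. (1)-(2) can be evaluated for a given partition as in the short sketch below (our own illustrative NumPy code, not part of the proposed method).

```python
import numpy as np

def kmeans_objective(X, labels, k):
    """Evaluate the k-means objective of Eqs. (1)-(2): the total squared
    distance of every sample to the centroid of its assigned cluster."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]                # samples assigned to cluster C_j
        if len(members) == 0:
            continue
        mu_j = members.mean(axis=0)             # centroid of cluster C_j
        total += ((members - mu_j) ** 2).sum()  # sum of d(x_i, mu_j)
    return total
```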
However, k-means often fails to handle complicated geometric shapes, such as non-convex shapes [10]. In contrast, spectral clustering is good at detecting non-convex and linearly non-separable patterns [11]. Spectral clustering consists of the following three steps: 1) construct a graph Laplacian matrix \(L\) according to the similarities between data points; 2) embed the nodes into a \(k\)-dimensional space using the first \(k\) nontrivial eigenvectors of \(L\); 3) apply k-means to partition the embedded data points into \(k\) clusters. As shown in Figure 1, the two moons data set contains two slightly entangled non-convex shapes, where each cluster corresponds to a moon. In the two circles data set, the samples are arranged in two concentric circles, where each cluster corresponds to a circle. k-means gives incorrect clustering results on both data sets, while spectral clustering produces the ideal clustering. This is because k-means only considers distances, whereas spectral clustering considers the connectivity between nodes. The connectivity information is embedded in the spectrum of the underlying graph, so it is critical to preserve the spectrum when manipulating graphs [12].
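The three steps can be summarized by the following sketch (a minimal illustration assuming a dense symmetric similarity matrix W and using scikit-learn's k-means; the concrete choices here are ours, not a prescribed implementation).

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def standard_spectral_clustering(W, k):
    """Standard spectral clustering on a symmetric similarity matrix W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # step 1: unnormalized graph Laplacian
    # step 2: first k nontrivial eigenvectors (index 0 is the trivial one)
    _, U = eigh(L, subset_by_index=[1, k])
    # step 3: k-means on the k-dimensional spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```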
Fig. 1. Clustering results of k-means and spectral clustering on the two moons and two circles data sets.
3 Methods
3.1 Algorithmic Framework
We first construct a standard k-NN graph \(G\) from the data set. Then, based on the node proximity measure proposed in [13], the spectral similarity of two nodes \(p\) and \(q\) can be calculated as:

\[
c_{pq} = \frac{\left(X_p \cdot X_q\right)^2}{\left(X_p \cdot X_p\right)\left(X_q \cdot X_q\right)}, \tag{3}
\]

where \(X_p\) collects the values at node \(p\) of a few test vectors obtained by applying Gauss-Seidel relaxation to solve \(Lx = 0\), starting with random vectors. We aggregate nodes with high spectral similarity to form a small number of spectral-representative nodes, reducing the graph size while preserving the spectrum of the original graph. By aggregating nodes based on spectral similarity, the reduced graph best represents the original data set in the sense of minimizing the structural distortion [12], as shown in Figure 2.
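A simplified sketch of this similarity computation is shown below, assuming a dense Laplacian and a handful of test vectors; the function and variable names are ours, and a practical implementation would operate on sparse matrices as in [13].

```python
import numpy as np

def smooth_test_vectors(L, num_vectors=5, sweeps=4, seed=0):
    """Run a few Gauss-Seidel sweeps on L x = 0 starting from random vectors.
    The smoothed vectors expose the low-frequency (spectral) structure."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    X = rng.standard_normal((n, num_vectors))
    for _ in range(sweeps):
        for i in range(n):
            # Gauss-Seidel update: choose X[i] so that the i-th residual is zero
            X[i] -= (L[i] @ X) / L[i, i]
    return X

def spectral_similarity(X, p, q):
    """Spectral similarity of Eq. (3): normalized inner product of the
    smoothed test-vector values at nodes p and q."""
    return np.dot(X[p], X[q]) ** 2 / (np.dot(X[p], X[p]) * np.dot(X[q], X[q]))
```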
Fig. 2. Spectrum-preserving node aggregation: the original graph and the reduced graph formed by the aggregated (pseudo) nodes.
Then the standard spectral clustering algorithm is performed on the aggregated nodes. Spectral clustering requires a very computationally expensive eigen-decomposition procedure with \(O(n^3)\) time complexity to calculate the first \(k\) non-trivial eigenvectors of the graph Laplacian, where \(n\) is the number of nodes. By using spectral-representative nodes to represent the global structure of the original data set, the number of nodes can be drastically reduced, which leads to a significant speedup of the eigen-decomposition. If the desired node-set size has not been reached, the nodes can be further aggregated layer by layer using a multi-level aggregation scheme [8]: given the graph Laplacian \(L_i\) at level \(i\) and the mapping operator \(H_i^{i+1}\) from level \(i\) to level \(i+1\), the Laplacian of the next (coarser) level is computed as \(L_{i+1} = H_i^{i+1} L_i \left(H_i^{i+1}\right)^{\top}\), where level 1 is the finest level and level \(r\) is the coarsest level.
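As an illustration of one aggregation level, the sketch below builds a 0/1 mapping operator from an aggregation assignment and forms the coarser Laplacian by the product above (a sketch under our own data-structure assumptions; the assignment itself follows the spectral-similarity criterion of Eq. (3)).

```python
import numpy as np
from scipy.sparse import csr_matrix

def coarsen_one_level(L, node_to_pseudo):
    """Given a fine-level Laplacian L and an array mapping each fine node to
    its pseudo-node index, build the mapping operator H and return the
    coarser Laplacian H L H^T together with H."""
    n = L.shape[0]
    n_coarse = int(node_to_pseudo.max()) + 1
    # H[c, i] = 1 if fine node i is aggregated into pseudo-node c
    H = csr_matrix((np.ones(n), (node_to_pseudo, np.arange(n))),
                   shape=(n_coarse, n))
    return H @ L @ H.T, H
```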
However, the goal of data clustering is to find the cluster memberships of the original samples in the data set rather than the clusters of the pseudo (aggregated) nodes. In this paper, we propose to map the cluster memberships back from the pseudo aggregated nodes to the original data points. Specifically, we build a correspondence table using the mapping operators of each layer to associate each original sample with its aggregated node in the final layer. Each original data point is then assigned to the same cluster as its corresponding aggregated node, as sketched after Algorithm 1. The complete algorithm flow is shown in Algorithm 1.
Input: A data set with \(n\) samples \(x_1, \ldots, x_n\); the number of clusters \(k\).
Output: Clusters \(C_1, \ldots, C_k\).
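The mapping-back step can be sketched as follows. It assumes that each level's mapping operator is stored as an integer array sending every node of that level to its aggregated node one level coarser; these data-structure choices are ours, made purely for illustration.

```python
import numpy as np

def map_clusters_back(level_maps, pseudo_labels):
    """Compose the per-level mapping operators into a correspondence table,
    then copy each pseudo-node's cluster label to all original samples that
    were aggregated into it.

    level_maps[l][i] : index of the level-(l+1) node that level-l node i
                       was aggregated into (level 0 = original samples).
    pseudo_labels[c] : cluster label of coarsest-level pseudo-node c.
    """
    table = np.arange(len(level_maps[0]))   # identity at the finest level
    for level_map in level_maps:
        table = level_map[table]            # follow aggregation one level coarser
    return pseudo_labels[table]             # label of every original sample
```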
3.2 Nearly-Noiseless Representation
One drawback of spectral clustering is that it is highly sensitive to noisy nodes in the graph, which limits its capability to accurately discover clusters: if a few noisy nodes form a narrow bridge between two densely connected communities, the algorithm tends to report the two communities plus the noisy nodes as a single community.
Owing to the strong capability of spectrum-preserving pseudo-nodes to retain the spectrum of the graph Laplacian and the global manifold of the data set, almost all of the original nodes, including the noisy ones, can be removed. For example, for the MNIST data set with 70,000 nodes, only 1,182 spectral-representative nodes are enough to represent the global structure of the data set, which means more than 98% of the nodes can be removed so that the noisy nodes no longer exist in the aggregated graph. As a result, the clustering accuracy improves from 64.20% to 82.18% on the MNIST data set.
Although other approximate spectral clustering methods can also significantly reduce the node-set size to a small number of representative nodes, without using spectral analysis to choose spectrally-critical nodes, their representative nodes cannot faithfully encode the spectrum of the original graph Laplacian. Consequently, their clustering accuracy gains from denoising are counteracted by the loss of spectral information.
3.3 Algorithm Complexity
The computational complexity of the spectrum-preserving node reduction is nearly linear in the size of the graph, i.e., in the number of edges \(|E|\) and nodes \(|V|\) of the original graph. The computational complexity of the standard spectral clustering applied to the reduced graph is \(O(m^3)\), where \(m\) is the number of aggregated nodes. The complexity of retrieving the cluster memberships is \(O(n)\), where \(n\) is the number of samples in the original data set.
4 Experiment
Experiments are performed using MATLAB R2020b running on a laptop. The reported results are averaged over multiple runs.
4.1 Experiment Setup
One mid-sized data set (USPS), one large data set (MNIST) and one very large data set (Covtype) are used in our experiments. They can be downloaded from the UCI machine learning repository (https://archive.ics.uci.edu/ml/) and LibSVM Data (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/): USPS includes 9,298 images of handwritten digits, each with 256 attributes; MNIST is a data set from Yann LeCun's website (http://yann.lecun.com/exdb/mnist/), which includes 70,000 images, each represented by 784 attributes; Covtype includes 581,012 instances for predicting forest cover type from cartographic variables, and each instance, with 54 attributes, belongs to one of seven classes.
We compare the proposed method against both the baseline and state-of-the-art fast spectral clustering methods, including: (1) the standard spectral clustering algorithm [9], (2) the Nyström method [7], (3) the landmark-based spectral clustering (LSC) method that uses random sampling for landmark selection [5], and (4) the KASP method that uses k-means for centroid selection [6]. For a fair comparison, we use the same parameter setting as in [5] for the compared algorithms: the number of sampled points in the Nyström method (or the number of landmarks in LSC, or the number of centroids in KASP) is set to 500.
4.2 Evaluation Metrics
Clustering accuracy (ACC) is the most widely used measurement of clustering quality [5, 4]. It is defined as follows:

\[
ACC = \frac{\sum_{i=1}^{n} \delta\left(y_i, \mathrm{map}(c_i)\right)}{n}, \tag{4}
\]

where \(n\) is the number of data samples, \(y_i\) is the ground-truth label, and \(c_i\) is the label generated by the algorithm. \(\delta(x, y)\) is a delta function defined as \(\delta(x, y) = 1\) for \(x = y\) and \(\delta(x, y) = 0\) otherwise. \(\mathrm{map}(\cdot)\) is a permutation function that can be realized using the Hungarian algorithm [14]. A higher value of ACC indicates better clustering quality.
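For reference, ACC as defined in Eq. (4) can be computed as in the sketch below, which assumes that both ground-truth and predicted labels are integers in \(\{0, \ldots, k-1\}\) and uses SciPy's Hungarian-algorithm solver for the permutation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy (ACC): best agreement between predicted clusters
    and ground-truth classes over all label permutations, Eq. (4)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    # count[p, t] = number of samples predicted as cluster p with true class t
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    # Hungarian algorithm on negated counts finds the best cluster-to-class map
    rows, cols = linear_sum_assignment(-count)
    return count[rows, cols].sum() / len(y_true)
```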
4.3 Experimental Results
Table 1: Clustering accuracy (ACC, %) comparison.

Data Set | Standard SC | Nyström | KASP | LSC | Ours (5X) | Ours (10X) | Ours (50X)
---|---|---|---|---|---|---|---
USPS | 64.31 | 69.31 | 70.62 | 66.28 | 70.65 | 72.25 | 81.39
MNIST | 64.20 | 55.86 | 71.23 | 58.94 | 70.99 | 72.42 | 82.18
Covtype | 48.81 | 33.24 | 27.56 | 22.60 | 48.77 | 44.50 | 48.28
Table 2: Runtime of the eigen-decomposition and k-means steps.

Data Set | Standard SC | Nyström | KASP | LSC | Ours (5X) | Ours (10X) | Ours (50X)
---|---|---|---|---|---|---|---
USPS | 0.72 | 0.29 | 0.16 | 0.22 | 0.21 | 0.18 | 0.17
MNIST | 252.59 | 0.95 | 0.18 | 0.49 | 0.59 | 0.35 | 0.21
Covtype | 128.70 | 5.48 | 0.70 | 3.77 | 0.92 | 0.50 | 0.49
Table 3: Number of nodes before and after node reduction.

Data Set | Original | 5X | 10X | 50X
---|---|---|---|---
USPS | 9,298 | 1,692 | 767 | 138
MNIST | 70,000 | 13,658 | 6,368 | 1,182
Covtype | 581,012 | 104,260 | 44,188 | 8,192
Fig. 3 and Fig. 4. Clustering accuracy of the proposed method under increasing node reduction ratios.
Table 1 shows the clustering accuracy of the compared methods and of our method with node reduction ratios of 5X, 10X and 50X. The runtimes of the eigen-decomposition and k-means steps are reported in Table 2. As observed, the proposed method consistently leads to dramatic performance improvements, beating all competing fast methods in clustering accuracy across all three data sets: our method achieves more than 10% accuracy gain on USPS, more than 10% on MNIST, and more than 15% on Covtype over the second-best fast methods; moreover, for the USPS and MNIST data sets, our method achieves over 17% accuracy gain over the standard spectral clustering method. The superior clustering results of our method clearly illustrate that a spectrum-preserving concise representation can improve clustering accuracy by removing redundant or false relations among data samples.
As shown in Table 3, the reduced graphs generated by our framework at the 50X reduction ratio have only 138, 1,182 and 8,192 nodes for the USPS, MNIST and Covtype data sets, respectively, thereby allowing much faster eigen-decompositions. As shown in Fig. 3 and Fig. 4, with increasing node reduction ratio, the proposed method consistently produces high clustering accuracy. For the very large Covtype data set, our method is the only one that accelerates spectral clustering without loss of clustering accuracy.
5 Conclusion
In this paper, a complete framework for improving spectral clustering is proposed. We use real-world data sets to demonstrate the effectiveness of the method. Our method outperforms state-of-the-art methods by a large margin.
References
- [1] Feiping Nie, Dong Xu, Ivor Wai-Hung Tsang, and Changshui Zhang, “Spectral embedded clustering,” in Twenty-First International Joint Conference on Artificial Intelligence, 2009.
- [2] Francis Bach and Michael Jordan, “Learning spectral clustering,” Advances in neural information processing systems, vol. 16, no. 2, pp. 305–312, 2004.
- [3] Xia Wan and C-CJ Kuo, “A new approach to image retrieval with hierarchical color clustering,” IEEE transactions on circuits and systems for video technology, vol. 8, no. 5, pp. 628–643, 1998.
- [4] Jialu Liu, Chi Wang, Marina Danilevsky, and Jiawei Han, “Large-scale spectral clustering on graphs,” in Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press, 2013, pp. 1486–1492.
- [5] Xinlei Chen and Deng Cai, “Large scale spectral clustering with landmark-based representation.,” in AAAI, 2011.
- [6] Donghui Yan, Ling Huang, and Michael I Jordan, “Fast approximate spectral clustering,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009, pp. 907–916.
- [7] Charless Fowlkes, Serge Belongie, Fan Chung, and Jitendra Malik, “Spectral grouping using the nystrom method,” IEEE transactions on pattern analysis and machine intelligence, vol. 26, no. 2, pp. 214–225, 2004.
- [8] Zhiqiang Zhao, Ying Zhang, and Zhuo Feng, “Towards scalable spectral embedding and data visualization via spectral coarsening,” in Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 869–877.
- [9] Ulrike Von Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.
- [10] Feiping Nie, Cheng-Long Wang, and Xuelong Li, “K-multiple-means: A multiple-means clustering method with specified k clusters,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 959–967.
- [11] Andrew Y Ng, Michael I Jordan, Yair Weiss, et al., “On spectral clustering: Analysis and an algorithm,” Advances in neural information processing systems, vol. 2, pp. 849–856, 2002.
- [12] Yongyu Wang, High Performance Spectral Methods for Graph-Based Machine Learning, Ph.D. thesis, Michigan Technological University, 2021.
- [13] Oren E Livne and Achi Brandt, “Lean algebraic multigrid (lamg): Fast graph laplacian linear solver,” SIAM Journal on Scientific Computing, vol. 34, no. 4, pp. B499–B522, 2012.
- [14] Christos H Papadimitriou and Kenneth Steiglitz, Combinatorial optimization: algorithms and complexity, Courier Corporation, 1982.