Discovering Complex Patterns in High Dimensional Data

1 Spectral Clustering

1.1 Ng, Jordan, Weiss Paper, 2002 [ng2002spectral]

Famous paper cited over 7000 times. On second page, ”Since the majority of analyses in spectral graph partitioning apprea to deal with paritioning the graph into exactly two parts, these methods are then typically applied recursively to find $k$ clusters (e.g. [9]). Experimentally it has been observed that using more eigenvectors and directly computing $k$ way partitioning is better (e.g. [5, 1]).

Papers referenced should be viewed to see how recursive partitioning on the second eigenvector has been done and why $k$ way partitioning is seen as a better method.

1.2 Von Luxburg Paper, 2007 [von2007tutorial]

Famous paper cited over 6000 times. Explains in detail many aspects of the ”best” way to do spectral clustering… Note this paper is written after [ng2002spectral] which has stated that using more eigenvectors is better than bi-partitioning, so it does not mention a recursive version of the algorithm.

1.3 Riolo and Newman Paper, 2014 [riolo2014first]

More recent paper that derives spectral clustering in different way than previous papers.

2 Accelerated Spectral Clustering

2.1 Fowlkes et. al. Paper, 2004 [fowlkes2004spectral]

This method computes the spectral clustering on a small number of samples and then uses matrix completion to infer the clusters of the remaining points belong to.

2.2 Tremblay et. al. Paper, 2016 [tremblay2016accelerated]

Relatively recent paper which uses ”graph signal processing” to filter the graph down prior to running spectral clustering.

3 Modularity

3.1 Newman Paper, 2006 [newman2006modularity]

Paper cited over 6000 times. Newman points to reasons why traditional spectral clustering may fall short in discovering communities… Normalized minimum cut tends to assume two communities are roughly the same size. Modularity is (up to a multiplicative constant), the number of edgest falling within groups minus the expected number in an equivalent network with edges placed at random.

4 Accelerated Modularity

4.1 Blondel et. al. Paper, 2008 [blondel2008fast]

Paper cited over 7000 times. Very popular algorithm that’s actively being researched on finding communities in extremely large graphs. Non-deterministic, but ”identifying communities in a 118 million nodes network took only 152 minutes”.

5 Partitioning Stability

5.1 Ben-Hur et. al. Paper, 2001 [ben2001stability]

Very interesting paper cited over 600 times, discussing the use of sub-sampling to measure stability in clustering.

Paper states, ”It can be combined with any clustering algorithm, but proves to be particularly useful in conjunction with hierarchical clustering algorithms.”

Provides several clustering similarity measures.

Paper should be looked at not just for the clustering similarity measures, but look at Figure 2 which outlines the algorithm to determine stability and how it relates to our graph matching problem.

5.2 Ben-David, Von Luxburg, Pal Paper, 2006 [ben2006sober]

This paper is cited over 200 times and questions the Ben-Hur et. al. Paper’s approach. It states that instability may be due to the fact that there are competing partitions of a certain size that has nothing to do with the sampling process. Note that if we’re using a partitioning to be fed into graph matching, we want to know that the partition is stable either way.