
Functorial Manifold Learning

Dan Shiebler
Abstract

We adapt previous research on category theory and topological unsupervised learning to develop a functorial perspective on manifold learning, also known as nonlinear dimensionality reduction. We first characterize manifold learning algorithms as functors that map pseudometric spaces to optimization objectives and that factor through hierarchical clustering functors. We then use this characterization to prove refinement bounds on manifold learning loss functions and construct a hierarchy of manifold learning algorithms based on their equivariants. We express several popular manifold learning algorithms as functors at different levels of this hierarchy, including Metric Multidimensional Scaling, IsoMap, and UMAP. Next, we use interleaving distance to study the stability of a broad class of manifold learning algorithms. We present bounds on how closely the embeddings these algorithms produce from noisy data approximate the embeddings they would learn from noiseless data. Finally, we use our framework to derive a set of novel manifold learning algorithms, which we experimentally demonstrate are competitive with the state of the art.

1 Introduction

Suppose we have a finite pseudometric space $(X, d_X)$ that we assume has been sampled from some larger space $\mathbf{X}$ according to some probability measure $\mu_{\mathbf{X}}$ over $\mathbf{X}$. Manifold learning algorithms like IsoMap [23], Metric Multidimensional Scaling [2], and UMAP [19] construct $\mathbb{R}^m$-embeddings for the points in $X$, which we interpret as coordinates for the support of $\mu_{\mathbf{X}}$. These techniques are based on the assumption that this support can be well-approximated by a manifold. In this paper we use functoriality, a basic concept from category theory, to explore two aspects of manifold learning algorithms:

  • Equivariance: A manifold learning algorithm is equivariant to a transformation if applying that transformation to its inputs results in a corresponding transformation of its outputs. Understanding the equivariance of a transform lets us understand how it will behave on new types of data.

  • Stability: The stability of a manifold learning algorithm captures how well the embeddings it generates on noisy data approximate the embeddings it would generate on noiseless data. Understanding the stability of a transform helps us predict its performance on real-world applications.

1.1 Functoriality

In order for a manifold learning algorithm to be useful, the properties of the embeddings that the algorithm derives from $(X, d_X)$ must be somewhat in line with the structure of $(X, d_X)$. One way to formalize this is to cast the algorithm as a functor between categories. A category is a collection of objects and morphisms between them. Morphisms are closed under an associative composition operation, and each object is equipped with an identity morphism. An example of a category is the collection of sets (objects) and functions (morphisms) between them.

A functor is a mapping between categories that preserves identity morphisms and morphism composition. Underlying this straightforward definition is a powerful concept: the functoriality of a transformation is a blueprint for its structure, expressed in terms of the equivariants it preserves. If a given transformation is functorial over some pair of categories, then the transformation preserves the structure represented by those categories' morphisms. By identifying the settings under which an algorithm is functorial, we can derive extensions of the algorithm that preserve functoriality and identify modifications that break it. See "Basic Category Theory" [18] or "Seven Sketches in Compositionality" [14] for more details on categories and functoriality.

1.2 Summary of Contributions

In Section 2, we demonstrate that manifold learning algorithms can be expressed as optimization problems defined on top of hierarchical overlapping clusterings. That is, we can express these algorithms in terms of the composition of:

  • a strategy for clustering points at different distance scales;

  • an order-preserving transformation from a clustering of points to a loss function.

We formalize this relationship in terms of the composition of functors between categories of pseudometric spaces, hierarchical overlapping clusterings, and optimization problems. This allows us to formally extend clustering theory into manifold learning theory.

In Section 2.1 we build on clustering theory to demonstrate that every manifold learning objective lies on a spectrum based on the criterion by which embeddings are penalized for being too close together or too far apart. In Section 2.2 we introduce a hierarchy of manifold learning algorithms and categorize algorithms based on the dataset transformations over which they are equivariant. In Section 2.3 we provide several examples of this categorization. We show that UMAP is equivariant to isometries and that both IsoMap and Metric Multidimensional Scaling are equivariant to surjective non-expansive maps. Identifying these equivariants helps us understand the circumstances under which we would expect our algorithms to generalize. For example, while adding points to a dataset will modify the IsoMap objective in a predictable way, it will completely disrupt the UMAP objective. This is because UMAP uses a local distance rescaling procedure that is density-sensitive, and is therefore not equivariant to injective or surjective non-expansive maps.

In Section 3 we use interleaving distance to study the stability of a broad class of manifold learning algorithms. We present novel bounds on how well the embeddings that algorithms in this class learn on noisy data approximate the embeddings they learn on noiseless data.

In Section 4 we build on this theory to describe a strategy for deriving novel algorithms from existing manifold learning algorithms. As an example we derive the Single Linkage Scaling algorithm, which projects samples in the same connected component of the Rips complex as close together as possible. We also present experimental results demonstrating the efficacy of this algorithm.

1.3 Related Work

Several authors have explored functorial perspectives on clustering algorithms. Carlsson et al. [9] introduce clustering functors that map metric spaces to partitionings, whereas Culbertson et al. [12] take a slightly broader scope and also explore overlapping clustering functors that map metric spaces to coverings. Both approaches demonstrate that metric space categories with fewer morphisms permit a richer class of clustering functors. For example, while the single linkage clustering algorithm is functorial over the full category of metric spaces and non-expansive maps, density-sensitive clustering algorithms like robust single linkage are only functorial over the subcategory of metric spaces and injective non-expansive maps. In order to get around the Kleinberg Impossibility Theorem [17], which states that any scale invariant flat clustering must sacrifice either surjectivity or a notion of consistency, several authors [9, 13, 20] also explore hierarchical clustering functors that map metric spaces to multi-scale dataset partitionings or covers. Shiebler [21] builds on this perspective to factor clustering functors through a category of simplicial complexes.

Manifold learning shares structure with hierarchical clustering, and some authors have begun applying categorical ideas to manifold learning. For example, McInnes et al. [19] introduce the UMAP manifold learning algorithm in terms of Spivak's fuzzy simplicial sets [22], which are a categorical analog of simplicial filtrations.

In Section 3 we study the stability of manifold learning algorithms to dataset noise. Due to the importance of this topic, many other authors have researched the stability properties of manifold learning algorithms. For example, Bailey [4] explores adaptations of PCA to noisy data, and Gerber et al. [16] demonstrate that Laplacian Eigenmaps has better stability properties than IsoMap. However, we believe that ours is the first work that uses interleaving distance to formalize a stability property.

1.4 Preliminaries on Functorial Hierarchical Overlapping Clustering

We briefly review some definitions from the functorial perspective on hierarchical overlapping clustering. For more details, see Shiebler [21]. Given a set $X$, a non-nested flag cover $\mathcal{C}_X$ of $X$ is a cover of $X$ such that: (1) if $S_1, S_2 \in \mathcal{C}_X$ and $S_1 \subseteq S_2$, then $S_1 = S_2$; (2) the simplicial complex whose vertices correspond to the elements of $X$ and whose faces correspond to all finite subsets of the sets in $\mathcal{C}_X$ is a flag complex, or a simplicial complex that can be expressed as the clique complex of its $1$-skeleton. Intuitively, $\mathcal{C}_X$ is a non-nested flag cover if there exists a graph whose maximal cliques are the sets in $\mathcal{C}_X$. The category $\mathbf{Cov}$ has tuples $(X, \mathcal{C}_X)$ as objects, where $\mathcal{C}_X$ is a non-nested flag cover of the finite set $X$. The morphisms between $(X, \mathcal{C}_X)$ and $(Y, \mathcal{C}_Y)$ are functions $f : X \rightarrow Y$ such that for any set $S \in \mathcal{C}_X$ there exists some set $S' \in \mathcal{C}_Y$ with $f(S) \subseteq S'$.

Next, we will represent datasets with the category $\mathbf{PMet}$ of finite pseudometric spaces and non-expansive maps between them. Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a flat $\mathbf{D}$-clustering functor is a functor $C : \mathbf{D} \rightarrow \mathbf{Cov}$ that is the identity on the underlying set. Intuitively, a flat $\mathbf{D}$-clustering functor maps a dataset $(X, d_X)$ to a cover of the set $X$ in such a way that increasing the distances between points in $X$ may cause clusters to separate.

A fuzzy non-nested flag cover is a functor $F_X : (0,1]^{op} \rightarrow \mathbf{Cov}$ such that for any morphism $a \leq a'$ in $(0,1]$, the morphism $F_X(a \leq a')$ is the identity on the underlying set. In the category $\mathbf{FCov}$, objects are fuzzy non-nested flag covers and morphisms are natural transformations between them. Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a hierarchical $\mathbf{D}$-clustering functor is a functor $H : \mathbf{D} \rightarrow \mathbf{FCov}$ such that for each $a \in (0,1]$, $H(-)(a) : \mathbf{D} \rightarrow \mathbf{Cov}$ is a flat $\mathbf{D}$-clustering functor. Intuitively, a hierarchical $\mathbf{D}$-clustering functor maps a pair of a dataset $(X, d_X)$ and a strength $a \in (0,1]$ to a cover of the set $X$ in such a way that increasing the distances between points in $X$ or increasing the strength $a$ may cause clusters to separate.
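To make this concrete, the following is a minimal sketch (ours, not code from [21]) of the object-level action of one such hierarchical clustering functor: at strength $a$ it returns the connected components of the $-\log(a)$-Vietoris-Rips complex of a finite pseudometric space given by its distance matrix. This is the single linkage functor $\mathcal{SL}$ that reappears in Section 2.1.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def single_linkage_cover(D, a):
    """Cover of {0, ..., n-1} by the connected components of the
    (-log a)-Vietoris-Rips complex of the pseudometric matrix D."""
    adjacency = csr_matrix((D <= -np.log(a)).astype(int))
    n_components, labels = connected_components(adjacency, directed=False)
    return [np.flatnonzero(labels == c).tolist() for c in range(n_components)]

# Three points on a line: the two nearby points merge first as the strength a decreases.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])
print(single_linkage_cover(D, np.exp(-1.5)))   # [[0, 1], [2]]
print(single_linkage_cover(D, np.exp(-5.0)))   # [[0, 1, 2]]
```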

2 Manifold Learning

Given a choice of $m \in \mathbb{N}$, a manifold learning algorithm constructs an $n \times m$ real-valued matrix of embeddings in $\mathbf{Mat}_{n,m} = \mathbb{R}^{n \times m}$ from a finite pseudometric space with $n$ points. In this work we focus on algorithms that operate by solving embedding optimization problems, or tuples $(n, m, l)$ where $l : \mathbf{Mat}_{n,m} \rightarrow \mathbb{R}$ is a loss function, or a penalty term. We call the set of all $A \in \mathbf{Mat}_{n,m}$ that minimize $l(A)$ the solution set of the embedding optimization problem. In particular, we focus on pairwise embedding optimization problems, or embedding optimization problems where $l$ can be expressed as a sum of pairwise terms $l_{ij} : \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ such that $l(A) = \sum_{i,j \in 1 \ldots n} l_{ij}(\|A_i - A_j\|)$. Intuitively the terms $l_{ij}$ act as a penalty on bad embeddings. We express such a pairwise embedding optimization problem with the tuple $(n, m, \{l_{ij}\})$.

Formally, we define a manifold learning problem to be a function that maps the pseudometric space $(X, d_X)$ to a pairwise embedding optimization problem of the form $(|X|, m, \{l_{ij}\})$. Note that this definition does not include any specification of how the optimization problem is solved (such as gradient descent or reduction to an eigenproblem). For example, the Metric Multidimensional Scaling manifold learning problem maps the pseudometric space $(X, d_X)$ to $(|X|, m, \{l_{ij}\})$ where $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$. Optimizing this objective involves finding a matrix $A$ that minimizes $\sum_{i,j \in 1 \ldots |X|} (d_X(x_i, x_j) - \|A_i - A_j\|)^2$. That is, the Euclidean distance matrix of the rows of the optimal $A$ is as close as possible to the $d_X$ distance matrix.
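As a concrete reference point, here is a minimal sketch (ours, not from the paper) of evaluating a pairwise embedding optimization problem, instantiated with the Metric Multidimensional Scaling terms $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$ described above:

```python
import numpy as np

def pairwise_loss(A, l_ij):
    """Evaluate l(A) = sum_{i,j} l_ij(i, j, ||A_i - A_j||) for an n x m embedding matrix A."""
    n = A.shape[0]
    return sum(l_ij(i, j, np.linalg.norm(A[i] - A[j]))
               for i in range(n) for j in range(n))

def mds_terms(D):
    """Metric Multidimensional Scaling terms l_ij(delta) = (D[i, j] - delta)^2."""
    return lambda i, j, delta: (D[i, j] - delta) ** 2

D = np.array([[0.0, 1.0], [1.0, 0.0]])        # two points at distance 1
A_good = np.array([[0.0], [1.0]])             # embedding that preserves the distance
A_bad = np.array([[0.0], [3.0]])              # embedding that stretches it
print(pairwise_loss(A_good, mds_terms(D)))    # 0.0
print(pairwise_loss(A_bad, mds_terms(D)))     # 8.0, i.e. two symmetric terms of (1 - 3)^2
```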

If a manifold learning problem maps isometric pseudometric spaces to embedding optimization problems with the same solution set, we call it isometry-invariant. Intuitively, isometry-invariant manifold learning algorithms do not change their output based on the ordering of $X$. A particularly useful property of isometry-invariant manifold learning problems is that they factor through hierarchical clustering algorithms.

Proposition 1.

Given any isometry-invariant manifold learning problem $M$, there exists a manifold learning problem $L \circ H$, where $H$ is a hierarchical overlapping clustering algorithm (as defined by Shiebler [21]) and $L$ is a function that maps the output of $H$ to an embedding optimization problem, such that the solution sets of the images of $M$ and $L \circ H$ on any pseudometric space $(X, d_X)$ are identical.

Intuitively, Proposition 1 holds because manifold learning problems generate loss functions by grouping points in the finite pseudometric space together. In order to use this property to adapt clustering theorems into manifold learning theorems, we will introduce a target category of pairwise embedding optimization problems and replace functions with functors from $\mathbf{PMet}$ into this category. To start, consider the category $\mathbf{L}$:

Definition 2.1.

The objects in $\mathbf{L}$ are tuples $(n, \{l_{ij}\})$ where $n$ is a natural number and each $l_{ij} : \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ is a real-valued function that satisfies $l_{i'j'}(x) = 0$ for $i' > n$ or $j' > n$. $\mathbf{L}$ is a preorder where $(n, \{l_{ij}\}) \leq (n', \{l'_{ij}\})$ iff for any $x \in \mathbb{R}_{\geq 0}$ and $i, j \in \mathbb{N}$ we have $l_{ij}(x) \leq l'_{ij}(x)$.

Given a choice of $m$ we can view the object $(n, \{l_{ij}\})$ in $\mathbf{L}$ as a pairwise embedding optimization problem. However, $\mathbf{L}$ is not quite expressive enough to serve as our target category. Recall the Metric Multidimensional Scaling manifold learning problem, which maps the pseudometric space $(X, d_X)$ to the pairwise embedding optimization problem $(|X|, m, \{l_{ij}\})$ where $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$. Since $l_{ij}$ does not vary monotonically with $d_X$, it is clear that this manifold learning problem is not a functor from $\mathbf{PMet}$ to $\mathbf{L}$. However, note that we can express $l_{ij}(\delta)$ as the sum of a term that increases monotonically in $d_X(x_i, x_j)$ and a term that decreases monotonically in $d_X(x_i, x_j)$:

\[
l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2 = \left(\delta^2 + d_X(x_i, x_j)^2\right) - \left(2\delta\, d_X(x_i, x_j)\right)
\]

We will see in Section 2.3 that the embedding optimization problems associated with many common manifold learning algorithms decompose similarly. We can build on this insight to create a new category $\mathbf{Loss}$ with the following pullback:

๐‹๐จ๐ฌ๐ฌ{\mathbf{Loss}}๐‹{\mathbf{L}}๐‹oโ€‹p{\mathbf{L}^{op}}โ„•{\mathbb{N}}U\scriptstyle{U}U\scriptstyle{U}

where $U$ is the forgetful functor that maps $(n, \{l_{ij}\})$ to $n$. Intuitively $\mathbf{Loss}$ is a subcategory of $\mathbf{L}^{op} \times \mathbf{L}$, and we can write the objects in $\mathbf{Loss}$ as tuples $(n, \{c_{ij}, e_{ij}\})$ where $(n, \{c_{ij}, e_{ij}\}) \leq (n', \{c'_{ij}, e'_{ij}\})$ iff for any $x \in \mathbb{R}_{\geq 0}$ and $i, j \in \mathbb{N}$ we have $c'_{ij}(x) \leq c_{ij}(x)$ and $e_{ij}(x) \leq e'_{ij}(x)$. Given a choice of $m$, each object $(n, \{c_{ij}, e_{ij}\})$ in $\mathbf{Loss}$ corresponds to the pairwise embedding optimization problem $(n, m, \{l_{ij}\})$ where $l_{ij}(\delta) = c_{ij}(\delta) + e_{ij}(\delta)$.
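For intuition, a small numeric sketch (ours) of the decomposition above for Metric Multidimensional Scaling: $c_{ij}(\delta) = \delta^2 + d_X(x_i, x_j)^2$ increases monotonically in $d_X(x_i, x_j)$, $e_{ij}(\delta) = -2\delta\, d_X(x_i, x_j)$ decreases monotonically in it, and their sum recovers $(d_X(x_i, x_j) - \delta)^2$:

```python
import numpy as np

def mds_c(d_ij, delta):
    """Contractive part: increases monotonically in d_ij."""
    return delta ** 2 + d_ij ** 2

def mds_e(d_ij, delta):
    """Expansive part: decreases monotonically in d_ij."""
    return -2.0 * delta * d_ij

d_ij, delta = 2.0, 0.5
assert np.isclose(mds_c(d_ij, delta) + mds_e(d_ij, delta), (d_ij - delta) ** 2)
```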

Similarly to the representation of hierarchical clustering algorithms as maps into a category $\mathbf{FCov}$ of functors $(0,1]^{op} \rightarrow \mathbf{Cov}$, we will represent manifold learning algorithms as maps into a category $\mathbf{FLoss}$ of functors $(0,1]^{op} \rightarrow \mathbf{Loss}$. The objects in $\mathbf{FLoss}$ are functors $F : (0,1]^{op} \rightarrow \mathbf{Loss}$ that commute with the forgetful functor that maps $(n, \{c_{ij}, e_{ij}\})$ to $n$, and the morphisms are natural transformations. We call $n$ the cardinality of $F$. We can define a functor $Flatten : \mathbf{FLoss} \rightarrow \mathbf{Loss}$ which maps the functor $F$ with $F(a) = (n, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$ to the tuple $(n, \{c_{ij}, e_{ij}\})$ where:

\[
c_{ij}(x) = \int_{a \in (0,1]} c_{F(a)_{ij}}(x)\, da \qquad e_{ij}(x) = \int_{a \in (0,1]} e_{F(a)_{ij}}(x)\, da
\]

Therefore, each functor $F \in \mathbf{FLoss}$ with cardinality $n$ corresponds to the pairwise embedding optimization problem $(n, m, \{l_{F_{ij}}\})$ where $l_{F_{ij}}(\delta) = \int_{a \in (0,1]} c_{F(a)_{ij}}(\delta) + e_{F(a)_{ij}}(\delta)\, da$. We will call the sum of these terms, $\mathbf{l}_F(A)$, the $F$-loss:

\[
\mathbf{l}_F(A) = \sum_{i,j \in 1 \ldots n} l_{F_{ij}}(\|A_i - A_j\|) = \sum_{i,j \in 1 \ldots n} \int_{a \in (0,1]} c_{F(a)_{ij}}(\|A_i - A_j\|) + e_{F(a)_{ij}}(\|A_i - A_j\|)\, da
\]

We can now give our definition of a manifold learning functor:

Definition 2.2.

Suppose $\mathbf{PMet}$ is the category of finite pseudometric spaces and non-expansive maps and $\mathbf{FCov}$ is the category of fuzzy non-nested flag covers and natural transformations (see Section 1.4). Then given subcategories $\mathbf{D} \subseteq \mathbf{PMet}$ and $\mathbf{D}' \subseteq \mathbf{FCov}$, the composition $L \circ H : \mathbf{D} \rightarrow \mathbf{FLoss}$ forms a $\mathbf{D}$-manifold learning functor if $H : \mathbf{D} \rightarrow \mathbf{D}'$ is a hierarchical $\mathbf{D}$-clustering functor and $L : \mathbf{D}' \rightarrow \mathbf{FLoss}$ is a functor that maps a fuzzy non-nested flag cover with vertex set $X$ to some $F_X \in \mathbf{FLoss}$ with cardinality $|X|$.

Intuitively, a manifold learning functor $\mathbf{D} \xrightarrow{H} \mathbf{D}' \xrightarrow{L} \mathbf{FLoss}$ factors through a hierarchical clustering functor and sends $(X, d_X)$ to $F$ where $F(a) = (|X|, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$. We will say that $M = L \circ H$ is in standard form if $M$ maps the one-point metric space $(\{*\}, 0)$ to some $F$ with $c_{F(a)_{ij}}(x) = e_{F(a)_{ij}}(x) = 0$ and $\forall \epsilon, \delta \in \mathbb{R}_{\geq 0}$, $H(X, d_X + \epsilon)(-\log(\delta)) \simeq H(X, d_X)(-\log(\delta + \epsilon))$. Each manifold learning functor corresponds to a manifold learning problem that maps $(X, d_X)$ to $(|X|, m, \mathbf{l}_{M(X,d_X)})$.

2.1 A Spectrum of Manifold Learning Functors

Recall the single linkage and maximal linkage hierarchical overlapping clustering algorithms $\mathcal{SL}$ and $\mathcal{ML}$. These algorithms map the pseudometric space $(X, d_X)$ to the fuzzy non-nested flag cover $(X, \mathcal{C}_{X_a})$ where $\mathcal{C}_{X_a}$ is, respectively, the set of connected components of the $-\log(a)$-Vietoris-Rips complex of $(X, d_X)$ and the set of maximally linked sets of the relation $R_a$ in which $x_1 R_a x_2$ if $d_X(x_1, x_2) \leq -\log(a)$ [21, 12]. If we apply functoriality to Proposition 6 in Shiebler [21] we see:

Proposition 2.

Suppose $\mathbf{D}$ is a subcategory of $\mathbf{PMet}$ such that $\mathbf{PMet}_{\mathit{bij}} \subseteq \mathbf{D}$, $L \circ H$ is a $\mathbf{D}$-manifold learning functor such that $H$ is non-trivial [21, 12], and for all $a \in (0,1]$ the functor $H(-)(a) : \mathbf{D} \rightarrow \mathbf{Cov}$ has clustering parameter $\delta_{H,a}$. Then for $a \in (0,1]$ and $(X, d_X) \in \mathbf{D}$ we have maps:

\[
(L \circ \mathcal{ML})(X, d_X)(e^{-\delta_{H,a}}) \leq (L \circ H)(X, d_X)(a) \leq (L \circ \mathcal{SL})(X, d_X)(e^{-\delta_{H,a}}) \tag{1}
\]

that are natural in $a$ and $(X, d_X)$.

Intuitively, this proposition states that every manifold learning functor maps $(X, d_X)$ to a loss that is both no more interconnected than the loss that does not distinguish points within the same connected component of the Vietoris-Rips complex and no less interconnected than the loss that treats each pair of points independently.

There are many manifold learning functors that lie between these extremes. In particular, for any functor $L : \mathbf{FCov}_{\mathit{inj}} \rightarrow \mathbf{FLoss}$ and any sequence of clustering functors $\mathcal{ML}, H_1, H_2, \ldots, H_n, \mathcal{SL}$ whose outputs refine each other, we can apply functoriality to form a sequence of manifold learning functors $L \circ \mathcal{ML} \leq L \circ H_1 \leq \ldots \leq L \circ H_n \leq L \circ \mathcal{SL}$. For example, consider the family $\mathcal{L}_k$ of hierarchical overlapping clustering functors from Culbertson et al. [12]: for $k \in \mathbb{N}$, the cover $\mathcal{L}_k(X, d_X)(a)$ consists of the maximally linked sets of the relation $R_a$ where $x R_a x'$ if there is a sequence $x = x_1, x_2, \ldots, x_{k-1}, x_k = x'$ in $X$ with $d_X(x_i, x_{i+1}) \leq -\log(a)$. The functor $L \circ \mathcal{L}_k$ therefore maps $(X, d_X)$ to a loss that distinguishes only between points whose shortest path in the Vietoris-Rips complex is longer than $k$. For $k > 1$ this loss is more interconnected than $L \circ \mathcal{ML}$ and less interconnected than $L \circ \mathcal{SL}$. This also suggests a recipe for generating new manifold learning algorithms (see Section 4): first express an existing manifold learning problem in the form $L \circ H$, and then form $L \circ \mathcal{SL}$, $L \circ \mathcal{ML}$, or any of the functors along the spectrum $L \circ \mathcal{L}_k$.
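A small sketch (ours) of the relation underlying $\mathcal{L}_k$: two points are related at strength $a$ exactly when they are joined by a chain of at most $k$ points whose consecutive distances are at most $-\log(a)$; the cover $\mathcal{L}_k(X, d_X)(a)$ then consists of the maximally linked sets of this relation, which we do not compute here.

```python
import numpy as np

def k_path_relation(D, a, k):
    """R[i, j] is True iff there is a sequence of at most k points from x_i to x_j
    whose consecutive distances are all <= -log(a)."""
    step = (D <= -np.log(a)).astype(int)     # one-edge reachability in the VR complex
    reach = np.eye(D.shape[0], dtype=int)    # the trivial sequence x = x'
    for _ in range(k - 1):                   # allow up to k - 1 edges (k points)
        reach = np.minimum(reach + reach @ step, 1)
    return reach.astype(bool)

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
print(k_path_relation(D, np.exp(-1.0), k=2))  # only points within distance 1 are related
print(k_path_relation(D, np.exp(-1.0), k=3))  # x_0 and x_2 become related through x_1
```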

2.2 Characterizing Manifold Learning Problems

Similarly to how Carlsson et al. [9] characterize clustering algorithms in terms of their functoriality over different subcategories of pseudometric spaces, we can characterize manifold learning algorithms based on the subcategory $\mathbf{D} \subseteq \mathbf{PMet}$ over which they are functorial.

We have already introduced isometry-invariant manifold learning problems. Any $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor is isometry-invariant. An isometry-invariant manifold learning problem is expansive-contractive if the loss that it aims to minimize decomposes into the sum of an expansive term $e$ that decreases as distances increase and a contractive term $c$ that increases as distances increase. Intuitively, expansive-contractive manifold learning problems use the term $e$ to push together points that are close in the original space and use the term $c$ to push apart points that are far in the original space. Any $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor is expansive-contractive.

An expansive-contractive manifold learning problem is positive extensible if $c$ increases and $e$ decreases when we add new points to increase $|X|$. If instead $c$ decreases and $e$ increases when we add new points to increase $|X|$, we say it is negative extensible. Intuitively, many positive extensible manifold learning problems are minmax problems that aim to simultaneously minimize $|c|$ and maximize $|e|$. Any $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor is positive extensible and any $\mathbf{PMet}_{\mathit{inj}}$-manifold learning functor is negative extensible.

Proposition 3.

Suppose $M$ is a standard form $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor and $M'$ is a standard form $\mathbf{PMet}_{\mathit{inj}}$-manifold learning functor. Then for any $(X, d_X)$ and $a \in (0,1]$, the terms $e_{M(X,d_X)(a)_{ij}}$ and $c_{M'(X,d_X)(a)_{ij}}$ are non-positive and the terms $c_{M(X,d_X)(a)_{ij}}$ and $e_{M'(X,d_X)(a)_{ij}}$ are non-negative.

In the next section we show examples of manifold learning algorithms in each of these categories.

2.3 Examples

2.3.1 Metric Multidimensional Scaling ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

The most straightforward strategy for learning embeddings is to minimize the difference between the pairwise distance matrix of the original space and the pairwise Euclidean distance matrix of the learned embeddings. The Metric Multidimensional Scaling algorithm [2] does exactly this. Given a finite pseudometric space $(X, d_X)$, the Metric Multidimensional Scaling embedding optimization problem is $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d_X(x_i, x_j) - \|A_i - A_j\|)^2$. When the distance matrix of the pseudometric space is double-centered (i.e. the mean values of its rows and columns are zero) this is the same objective that Principal Components Analysis (PCA) optimizes [3]. Now define $\mathit{MDS} : \mathbf{FCov}_{\mathit{sur}} \rightarrow \mathbf{FLoss}$ to map the fuzzy non-nested flag cover $H : (0,1]^{op} \rightarrow \mathbf{Cov}_{\mathit{sur}}$ with vertex set $X$ to the functor $F : (0,1]^{op} \rightarrow \mathbf{Loss}$ where $F(a) = (|X|, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$ and:

\[
c_{F(a)_{ij}}(x) =
\begin{cases}
x^2 & \exists S \in H(a),\ x_i, x_j \in S \\
x^2 + 2x^2\left(1/W_{ij} - 1/a\right) & \text{else}
\end{cases}
\]
\[
e_{F(a)_{ij}}(x) =
\begin{cases}
0 & \exists S \in H(a),\ x_i, x_j \in S \\
\dfrac{2\log(W_{ij})}{W_{ij}} - \dfrac{2\log(a)}{a} & \text{else}
\end{cases}
\]

where:

\[
W_{ij} = \sup\nolimits_{\geq 0}\{a \mid a \in (0,1],\ \exists S \in H(a),\ x_i, x_j \in S\}
\]
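A small numerical sketch (ours) of $W_{ij}$: we approximate the supremum over a grid of strengths, and for a two-point space under maximal linkage $\mathcal{ML}$ we recover $W_{ij} \approx e^{-d_X(x_i, x_j)}$, consistent with Proposition 4 below.

```python
import numpy as np

def W(H, i, j, grid=np.linspace(1e-3, 1.0, 1000)):
    """Approximate W_ij = sup{a in (0,1] : x_i and x_j share a cluster in H(a)} on a grid."""
    co_clustered = [a for a in grid if any(i in S and j in S for S in H(a))]
    return max(co_clustered, default=0.0)

def ML_two_points(d01):
    """Maximal linkage cover of a two-point space: one cluster iff d01 <= -log(a)."""
    return lambda a: [[0, 1]] if d01 <= -np.log(a) else [[0], [1]]

print(W(ML_two_points(0.7), 0, 1), np.exp(-0.7))   # both approximately 0.497
```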
Proposition 4.

๐‘€๐ท๐‘†\mathit{MDS} is a functor, and ๐‘€๐ท๐‘†โˆ˜โ„ณโ€‹โ„’\mathit{MDS}\circ\mathcal{ML} is a ๐๐Œ๐ž๐ญ๐‘ ๐‘ข๐‘Ÿ\mathcal{\mathbf{PMet}}_{\mathit{sur}}-manifold learning functor that maps the finite pseudometric space (X,dX)(X,d_{X}) to the Metric Multidimensional Scaling embedding optimization problem.

2.3.2 IsoMap ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

For many real-world datasets the distances between nearby points are more reliable and less noisy than the distances between far-away points. The IsoMap algorithm [23] redefines the distances between far-apart points in terms of the distances between nearby points. Given a finite pseudometric space $(X, d_X)$ and a threshold $\delta \in \mathbb{R}_{\geq 0}$, the IsoMap embedding optimization problem is $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$ and $d'_X(x_i, x_j)$ is the length of the shortest path between $x_i$ and $x_j$ in the graph that has an edge of length $d_X(x, x')$ between each pair of points $(x, x') \in X$ with $d_X(x, x') \leq \delta$.
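The graph distance $d'_X$ can be computed with an all-pairs shortest-path pass over the $\delta$-neighborhood graph. A minimal sketch (ours, not the reference IsoMap implementation; zero-distance pairs are treated as non-edges here for simplicity):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap_distances(D, delta):
    """Shortest-path distances in the graph with an edge of length D[i, j]
    whenever D[i, j] <= delta; disconnected pairs come out as inf."""
    graph = np.where(D <= delta, D, 0.0)          # zero entries are treated as missing edges
    return shortest_path(graph, method='D', directed=False)

# Three points on a line; with delta = 1.5 the long 0 <-> 2 distance is re-routed through 1.
D = np.array([[0.0, 1.0, 2.5],
              [1.0, 0.0, 1.0],
              [2.5, 1.0, 0.0]])
print(isomap_distances(D, delta=1.5))   # d'(0, 2) = 2.0 rather than 2.5
```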

Proposition 5.

For any $\delta \in \mathbb{R}_{\geq 0}$, there exists a hierarchical $\mathbf{PMet}$-clustering functor $IsoCluster_\delta$ such that the $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $\mathit{MDS} \circ IsoCluster_\delta$ maps the finite pseudometric space $(X, d_X)$ to the IsoMap embedding optimization problem.

2.3.3 $k$-Path Scaling ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

Given a finite pseudometric space $(X, d_X)$ and $k \in \mathbb{N}$, suppose $d'_X(x_i, x_j)$ is the smallest $\delta$ such that there exists a path of at most $k$ edges between $x_i$ and $x_j$ in the $\delta$-Vietoris-Rips complex of $(X, d_X)$. Then the $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $\mathit{MDS} \circ \mathcal{L}_k$ maps $(X, d_X)$ to the $k$-Path Scaling embedding optimization problem $(|X|, m, l)$, where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$.
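Equivalently, $d'_X$ here is a bounded-hop bottleneck distance: the smallest threshold at which $x_i$ and $x_j$ are joined by a path of at most $k$ edges, each of length at most that threshold. A small dynamic-programming sketch (ours):

```python
import numpy as np

def k_path_distance(D, k):
    """Smallest delta at which x_i and x_j are joined by a path of at most k edges,
    each of length <= delta (a bounded-hop bottleneck distance)."""
    best = D.copy()                        # paths with exactly one edge
    for _ in range(k - 1):                 # allow one additional edge per iteration
        # bottleneck relaxation: best(i, j) = min over m of max(best(i, m), D(m, j))
        best = np.minimum(best, np.min(np.maximum(best[:, :, None], D[None, :, :]), axis=1))
    np.fill_diagonal(best, 0.0)
    return best

D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])
print(k_path_distance(D, k=1))   # direct edges only: d'(0, 2) = 5
print(k_path_distance(D, k=2))   # routing through x_1: d'(0, 2) = 1
```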

2.3.4 $k$-Vertex-Connected Scaling ($\mathbf{PMet}_{\mathit{bij}}$-Manifold Learning Functor)

For $k \in \mathbb{N}$ the hierarchical overlapping clustering functor $\mathcal{VL}_k$ maps the finite pseudometric space $(X, d_X)$ to the fuzzy non-nested flag cover $(X, \mathcal{C}_{X_a})$ where $\mathcal{C}_{X_a}$ is the set of maximal $\min(|X|, k)$-vertex-connected subgraphs of the $-\log(a)$-Vietoris-Rips complex of $(X, d_X)$. Note that $\mathcal{VL}_1 = \mathcal{SL}$ and $\lim_{k \rightarrow \infty} \mathcal{VL}_k = \mathcal{ML}$. Note also that for $k > 1$ the map $\mathcal{VL}_k$ is functorial over $\mathbf{PMet}_{\mathit{inj}}$ but not over all of $\mathbf{PMet}$, since a non-injective map may split a $k$-vertex-connected subgraph [12].

Now given a finite pseudometric space $(X, d_X)$ and $k \in \mathbb{N}$, suppose $d'_X(x_i, x_j)$ is the smallest $\delta$ such that there exists a $\min(|X|, k)$-vertex-connected subgraph of the $\delta$-Vietoris-Rips complex of $(X, d_X)$ that contains both $x_i$ and $x_j$. Then the $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor $\mathit{MDS} \circ \mathcal{VL}_k$ maps $(X, d_X)$ to the $k$-Vertex-Connected Scaling embedding optimization problem $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$. Note that unlike $\mathit{MDS} \circ \mathcal{L}_k$, for $k > 1$ the map $\mathit{MDS} \circ \mathcal{VL}_k$ is not functorial over all of $\mathbf{PMet}_{\mathit{sur}}$ since $\mathcal{VL}_k$ is not functorial over $\mathbf{PMet}_{\mathit{sur}}$.

2.3.5 UMAP ($\mathbf{PMet}_{\mathit{isom}}$-Manifold Learning Functor)

The UMAP algorithm builds a local uber-metric space around each point in $X$, converts each local uber-metric space to a fuzzy simplicial complex, and minimizes a loss function based on a fuzzy union of these fuzzy simplicial complexes. Given a finite pseudometric space $(X, d_X)$, the UMAP embedding optimization problem is $(|X|, m, l)$ where $l$ is the fuzzy cross-entropy:

\[
l(A) = \sum_{i,j \in 1 \ldots |X|} W_{ij} \log\left(\frac{W_{ij}}{e^{-\|A_i - A_j\|}}\right) + (1 - W_{ij}) \log\left(\frac{1 - W_{ij}}{1 - e^{-\|A_i - A_j\|}}\right)
\]

and $W_{ij}$ is the weight of the fuzzy union of the $1$-simplices connecting $x_i$ and $x_j$ in the Vietoris-Rips complexes formed from the $|X|$ local uber-metric spaces $(X, d_{x_i})$, where:

\[
d_{x_i}(x_j, x_k) =
\begin{cases}
d_X(x_j, x_k) - \min_{l \neq i} d_X(x_i, x_l) & i = j \text{ or } i = k \\
\infty & \text{else}
\end{cases}
\]
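A simplified sketch (ours) of the weights $W_{ij}$, under two assumptions that match the loss above but are not spelled out in this section: the strength of a $1$-simplex at rescaled distance $t$ is taken to be $e^{-t}$, and the fuzzy union of the two directed strengths is the probabilistic sum $a \cup b = a + b - ab$. The full UMAP implementation additionally smooths each local metric with a per-point bandwidth, which we omit here.

```python
import numpy as np

def umap_weights(D):
    """Fuzzy 1-simplex weights W_ij under the simplifying assumptions described above."""
    n = D.shape[0]
    rho = np.array([np.min(np.delete(D[i], i)) for i in range(n)])  # nearest-neighbor distances
    w = np.exp(-(D - rho[:, None]))   # directed strengths from each local uber-metric space
    np.fill_diagonal(w, 0.0)
    return w + w.T - w * w.T          # probabilistic fuzzy union of the two directions

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
print(umap_weights(D))                # W_01 = 1 since x_0 and x_1 are mutual nearest neighbors
```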
Proposition 6.

There exists a hierarchical $\mathbf{PMet}_{\mathit{isom}}$-clustering functor $\mathit{FuzzySimplex}$ that decomposes into the composition of four functors that:

  1. build a local uber-metric space around each point in $X$;

  2. convert each local uber-metric space to a fuzzy simplicial complex;

  3. take a fuzzy union of these fuzzy simplicial complexes;

  4. convert the resulting fuzzy simplicial complex to a fuzzy non-nested flag cover.

Proposition 7.

There exists a functor $\mathit{FCE} : \mathbf{FCov}_{\mathit{bij}} \rightarrow \mathbf{FLoss}$ such that $\mathit{FCE} \circ \mathit{FuzzySimplex}$ is a $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor that maps the pseudometric space $(X, d_X)$ to the UMAP embedding optimization problem.

Since the UMAP distance rescaling procedure does not preserve non-expansive maps, even if a map from $(X, d_X)$ to $(X', d_{X'})$ is non-expansive, the corresponding maps between the local uber-metric spaces $(X, d_{x_i})$ built from $(X, d_X)$ and $(X', d_{X'})$ need not all be non-expansive. For this reason $\mathit{FCE} \circ \mathit{FuzzySimplex}$ is not functorial over $\mathbf{PMet}_{\mathit{bij}}$.

3 Stability of Manifold Learning Algorithms

We can use this functorial perspective on manifold learning to reason about the stability of manifold learning algorithms under dataset noise. An $\epsilon$-interleaving between functors $F, G : \mathbb{R}_{\geq 0} \rightarrow \mathbf{C}$ is a collection of commuting natural transformations $F(\delta) \rightarrow G(\delta + \epsilon)$ and $G(\delta) \rightarrow F(\delta + \epsilon)$ [10, 11]. The interleaving distance $d_I$ between such functors is the smallest $\epsilon$ such that an $\epsilon$-interleaving exists. In order to study interleavings between functors in $\mathbf{FCov}$ or $\mathbf{FLoss}$, whose domain is $(0,1]^{op}$ rather than $\mathbb{R}_{\geq 0}$, we say that the functors $F, G$ are $\epsilon_*$-interleaved when there is an $\epsilon$-interleaving between the functors $F \circ r$ and $G \circ r$ where $r(x) = e^{-x}$. We also write $d_{I_*}(F, G) = d_I(F \circ r, G \circ r)$.

Proposition 8.

Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a standard form $\mathbf{D}$-manifold learning functor $M = L \circ H$, and a pair of finite pseudometric spaces $(X, d_X), (Y, d_Y)$ such that there exists a pair of morphisms $f : (X, d_X) \rightarrow (Y, d_Y + \epsilon)$ and $g : (Y, d_Y) \rightarrow (X, d_X + \epsilon)$ in $\mathbf{D}$, we have $d_{I_*}(M(X, d_X), M(Y, d_Y)) \leq \epsilon$.

Proposition 8 is similar in spirit to previous results that use the Gromov-Hausdorff distance between metric spaces to bound the bottleneck or homotopy interleaving distances between their corresponding Vietoris-Rips complexes [11, 8, 20, 6]. As a special case, if $M$ is a $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor and there exists an $\epsilon$-isometry between $(X, d_X)$ and $(Y, d_Y)$, then $d_{I_*}(M(X, d_X), M(Y, d_Y)) \leq \epsilon$. We can use this to prove the following:

Proposition 9.

Suppose we have a standard form $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $M$, a pair of $\epsilon$-isometric finite pseudometric spaces $(X, d_X), (Y, d_Y)$ with $|X| = |Y| = n$, and matrices $A_X, A_Y$ that respectively minimize $\mathbf{l}_{M(X,d_X)}$ and $\mathbf{l}_{M(Y,d_Y)}$. Then if:

\[
|c_{M(X,d_X)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{c}}}{2}, \qquad |c_{M(Y,d_Y)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{c}}}{2}
\]

and:

\[
|e_{M(X,d_X)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{e}}}{2}, \qquad |e_{M(Y,d_Y)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{e}}}{2}
\]

we have:

\[
\mathbf{l}_{M(X,d_X)}(A_Y) \leq \mathbf{l}_{M(X,d_X)}(A_X) + K_{\mathbf{c}} n^2 (1 - e^{-\epsilon}) + K_{\mathbf{e}} n^2 (e^{\epsilon} - 1) \tag{2}
\]

If $e_{M(X,d_X)(a)_{ij}}(x)$ is constant in $x$ (as is the case for any $M$ that factors as $M = \mathit{MDS} \circ H$) we have:

\[
\mathbf{l}_{M(X,d_X)}(A_Y) \leq \mathbf{l}_{M(X,d_X)}(A_X) + K_{\mathbf{c}} n^2 (1 - e^{-\epsilon}) \tag{3}
\]

For a very simple example, consider Metric Multidimensional Scaling without dimensionality reduction. In this case $M = \mathit{MDS} \circ \mathcal{ML}$ and $(X, d_X), (Y, d_Y)$ are each finite ordered $n$-element subspaces of $\mathbb{R}^m$ with Euclidean distance. If we write the vectors in $X$ and $Y$ as matrices $A_X, A_Y \in \mathbf{Mat}_{n,m}$, then these matrices respectively minimize $\mathbf{l}_{M(X,d_X)}$ and $\mathbf{l}_{M(Y,d_Y)}$, and $\mathbf{l}_{M(X,d_X)}(A_X) = \mathbf{l}_{M(Y,d_Y)}(A_Y) = 0$. Since the function that sends the $i$th element of $X$ to the $i$th element of $Y$ is a $2\epsilon$-isometry whenever $\|A_{X_i} - A_{Y_i}\| \leq \epsilon$ for all $i$, Proposition 9 bounds the average of the squared distances between the paired rows of the two matrices in terms of the largest such distance.

These bounds apply to a very general class of manifold learning algorithms, including topologically unstable algorithms like IsoMap [5]. As an example, consider using IsoMap to project $n$ evenly spaced points that lie on a circle of radius $r$ in $\mathbb{R}^2$ onto $\mathbb{R}^1$. In this case $(X, d_X)$ is a finite ordered $n$-element subspace of $\mathbb{R}^2$ with Euclidean distance, $M = \mathit{MDS} \circ IsoCluster_\delta$, and for any matrix $A_X \in \mathbf{Mat}_{n,1}$ that consists of $n$ evenly spaced points along the real line with $A_{X_{i+1}} - A_{X_i} = 2r\sin(\frac{2\pi}{2n})$ we have $\mathbf{l}_{M(X,d_X)}(A_X) = 0$. Now suppose that we instead apply IsoMap to a noisy view of $(X, d_X)$: a finite ordered $n$-element subspace $(Y, d_Y)$ of $\mathbb{R}^2$ where $d_Y$ is Euclidean distance and $d_X(X_i, Y_i) = d_Y(X_i, Y_i) = \|X_i - Y_i\| \leq \epsilon$ for all $i = 1 \ldots n$. Then for any matrix $A_Y \in \mathbf{Mat}_{n,1}$ that minimizes $\mathbf{l}_{M(Y,d_Y)}$, Proposition 9 bounds the average squared difference between $|A_{Y_{i+1}} - A_{Y_i}|$ and $2r\sin(\frac{2\pi}{2n})$.

4 Experiments in Functorial Recombination

One benefit of the functorial perspective on manifold learning is that it provides a natural way to produce new manifold learning algorithms by recombining the components of existing algorithms. Suppose we can express two existing manifold learning algorithms $M_1, M_2$ in this framework as $M_1 = L_1 \circ H_1$ and $M_2 = L_2 \circ H_2$ where $H_1, H_2$ are hierarchical clustering functors. Then we can use the compositionality of functors to define the manifold learning algorithms $L_2 \circ H_1$ or $L_1 \circ H_2$. We can use this procedure to combine the strengths of multiple algorithms in a way that preserves functoriality (and therefore also stability in the sense of Proposition 9). For example, if we compose the $\mathit{FuzzySimplex}$ functor from Proposition 6 with $\mathit{MDS}$ we form the $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor $\mathit{MDS} \circ \mathit{FuzzySimplex}$, which maps $(X, d_X)$ to the embedding optimization problem $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (-\log(\alpha_{ij}) - \|A_i - A_j\|)^2$ and $\alpha_{ij}$ is the strength of the fuzzy simplex that UMAP forms between $x_i$ and $x_j$.
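Schematically, the recombination recipe is just function composition. In the sketch below (ours), the component names are hypothetical stand-ins for implementations of the object-level actions of the clustering functors $H$ and loss-building functors $L$:

```python
# Schematic only: a manifold learning algorithm is a loss-building step composed
# with a hierarchical clustering step, so the two can be swapped independently.
def make_manifold_learner(cluster_fn, loss_fn):
    """Compose (on objects) a hierarchical clustering functor with a loss-building functor."""
    return lambda D: loss_fn(cluster_fn(D))

# Hypothetical component names, e.g.:
#   metric_mds             = make_manifold_learner(maximal_linkage, mds_loss)
#   single_linkage_scaling = make_manifold_learner(single_linkage, mds_loss)
#   mds_fuzzy_simplex      = make_manifold_learner(fuzzy_simplex, mds_loss)
```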

For a more illustrative example, consider a DNA recombination task in which we attempt to match a string of DNA that has been repeatedly mutated back to the original string. One way to solve this task is to generate embeddings for each string of DNA and look at the nearest neighbors of the mutated string. We can simulate this task as follows:

  1. Generate $N$ original random sequences of DNA of length $L$ (strings of "A", "C", "G", "T").

  2. For each sequence, mutate the sequence $M$ times to produce a mutation list, or a list of sequences that starts with an original DNA sequence and ends with a final DNA sequence.

  3. Collect the $M$ sequences in each of these $N$ mutation lists into an $N * M$-element finite pseudometric space with Hamming distance.

  4. Build embeddings from this pseudometric space and compute the percentage of mutation lists for which the nearest neighbor of the last DNA sequence in that list, among the set of all original sequences, is the first sequence in that list (the accuracy).

A manifold learning algorithm that performs well on this task will need to take advantage of the intermediate mutation states to recognize that the first state and final state in a mutation list should be embedded as close together as possible. We can follow the procedure in Section 2.1 to adapt the Metric Multidimensional Scaling algorithm $\mathit{MDS} \circ \mathcal{ML}$ (Section 2.3.1) into such an algorithm by forming the maximally interconnected functor $\mathit{MDS} \circ \mathcal{SL}$. Intuitively, this functor maps $(X, d_X)$ to a loss function that corresponds to the Metric Multidimensional Scaling optimization objective in which the original distance $d_X$ is replaced with:

\[
d^*_X(x, x') = \inf\{\delta \mid \exists\, x = x_1, x_2, \ldots, x_n = x' \in X,\ d_X(x_i, x_{i+1}) \leq \delta\}
\]

We call this the Single Linkage Scaling algorithm (Algorithm 1). Since this algorithm embeds points that are connected by a sequence in the original space as close together as possible, we expect Single Linkage Scaling to outperform Metric Multidimensional Scaling on this task. This is exactly what we see in Table 1. We also show the embeddings for each sequence in each list in Figure 1.

Algorithm 1 Single Linkage Scaling
1: procedure SingleLinkageScaling($(X, d_X)$, $m$)
2:   Initialize the $|X| \times |X|$ matrix $B$ to all zeros
3:   for all $i, j \leq |X|$ do
4:     $B_{ij} = \inf\{\delta \mid \exists\, x_i = x_1, x_2, \ldots, x_n = x_j \in X,\ d_X(x_k, x_{k+1}) \leq \delta\}$
5:   $A \leftarrow \arg\min_{A \in \mathbf{Mat}_{|X|,m}} \sum_{i,j \in 1 \ldots |X|} (\|A_i - A_j\| - B_{ij})^2$
6:   return $A$
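A runnable sketch (ours) of Algorithm 1: the matrix $B$ of single linkage distances is computed with a Floyd-Warshall-style minimax relaxation, and the stress minimization on line 5 is approximated with scikit-learn's SMACOF-based MDS on precomputed dissimilarities rather than solved exactly.

```python
import numpy as np
from sklearn.manifold import MDS

def single_linkage_scaling(D, m, random_state=0):
    """Single Linkage Scaling: stress minimization on the minimax path distances of D."""
    B = D.astype(float).copy()
    n = B.shape[0]
    for k in range(n):   # minimax relaxation: B_ij = min(B_ij, max(B_ik, B_kj))
        B = np.minimum(B, np.maximum(B[:, k:k + 1], B[k:k + 1, :]))
    mds = MDS(n_components=m, dissimilarity='precomputed', random_state=random_state)
    return mds.fit_transform(B)   # approximate (SMACOF) minimizer of the stress objective

# Four points: a chain 0-1-2 plus an isolated point 3. Points in the chain end up
# much closer to each other in the embedding than to the isolated point.
D = np.array([[0.0, 1.0, 2.0, 9.0],
              [1.0, 0.0, 1.0, 9.0],
              [2.0, 1.0, 0.0, 9.0],
              [9.0, 9.0, 9.0, 0.0]])
print(single_linkage_scaling(D, m=2))
```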
Figure 1: Embeddings of DNA sequences from the DNA recombination task with $L = 1000$, $N = 100$, $M = 10$. Each color indicates a unique DNA sequence mutation list. Note that Single Linkage Scaling ($\mathit{MDS} \circ \mathcal{SL}$, right) embeds sequences in the same mutation list more closely together than Metric Multidimensional Scaling ($\mathit{MDS} \circ \mathcal{ML}$, left).
Algorithm | Accuracy (N=100, M=10) | Accuracy (N=100, M=20) | Accuracy (N=200, M=10) | Accuracy (N=200, M=20)
Metric Multidimensional Scaling, Embedding Size 2 | 0.21 (± 0.05) | 0.01 (± 0.02) | 0.29 (± 0.02) | 0.01 (± 0.00)
Single Linkage Scaling, Embedding Size 2 | 0.61 (± 0.02) | 0.68 (± 0.05) | 0.76 (± 0.01) | 0.32 (± 0.02)
Metric Multidimensional Scaling, Embedding Size 5 | 0.74 (± 0.01) | 0.13 (± 0.02) | 0.84 (± 0.01) | 0.04 (± 0.01)
Single Linkage Scaling, Embedding Size 5 | 0.93 (± 0.05) | 0.91 (± 0.02) | 0.96 (± 0.02) | 0.34 (± 0.02)
Table 1: Accuracy on the DNA recombination task of the Metric Multidimensional Scaling ($\mathit{MDS} \circ \mathcal{ML}$) and Single Linkage Scaling ($\mathit{MDS} \circ \mathcal{SL}$) algorithms (higher numbers are better). The accuracy is the percentage of the $N$ mutation lists of length $M$ for which the nearest neighbor of the last sequence in the list, among the set of all original DNA sequences, is the first sequence in that list. The reported numbers are means (and standard deviations) across $10$ simulations. All DNA sequences are of length $L = 1000$.

5 Discussion and Future Work

We have taken the first steps towards a categorical framework for manifold learning. By defining an algorithm as a functor from a category of metric spaces, we can explicitly express the kind of dataset transformations under which it is equivariant. We have shown that for many popular manifold learning algorithms, including Metric Multidimensional Scaling and IsoMap, the optimization objective changes in a predictable way as we modify the metric space.

The functorial perspective also suggests a new strategy for exploratory data analysis with manifold learning. Since we can decompose manifold learning algorithms into two components (clustering and loss), we can examine how slight variations of the clustering algorithm affect the learned embeddings. We have shown in Section 2.1 that every manifold learning functor $L \circ H$ lies on a spectrum of interconnectedness between $L \circ \mathcal{ML}$ and $L \circ \mathcal{SL}$, and we can form new algorithms by moving along this spectrum. For example, we saw in Section 4 that replacing the $\mathcal{ML}$ functor with $\mathcal{SL}$ in the Metric Multidimensional Scaling algorithm substantially changes the learned embeddings and improves performance on a DNA recombination task. There are also many algorithms that lie between these two extremes, including the $k$-Path Scaling and $k$-Vertex-Connected Scaling algorithms that we introduced in Section 2.3.

Another major benefit of expressing algorithms as functors is that functors preserve categorical properties like interleaving distance. This allows us to easily reason about the stability properties of both existing algorithms and new algorithms that we create by recombining functors. Other authors have used this strategy to prove stability properties of the homology of geometric filtered complexes [11]. In Section 3 we used this strategy to derive bounds on how dataset noise affects optimization quality. In future work we hope to use these techniques to derive more powerful theorems around the resilience of other kinds of unsupervised or supervised algorithms to noise. For example, we may be able to tighten our bounds by switching our perspective from finite metric spaces to distributions [7], or even use categorical probability [15] to replace interleaving distance with a probabilistic analog. Due to the simplicity and flexibility of this strategy, other researchers have begun to develop more flexible characterizations of interleaving distance that we can apply in even more situations [20].

References

[2] Hervé Abdi (2007): Metric multidimensional scaling (MDS): analyzing distance matrices. Encyclopedia of Measurement and Statistics. SAGE Publications.
[3] Hervé Abdi & Lynne J Williams (2010): Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2(4), pp. 433–459, 10.1002/wics.101.
[4] Stephen Bailey (2012): Principal Component Analysis with Noisy and/or Missing Data. Publications of the Astronomical Society of the Pacific 124(919), pp. 1015–1023, 10.1086/668105. Available at http://www.jstor.org/stable/10.1086/668105.
[5] Mukund Balasubramanian (2002): The isomap algorithm and topological stability. Science 295(5552), 10.1126/science.295.5552.7a.
[6] Andrew J Blumberg & Michael Lesnick (2017): Universality of the homotopy interleaving distance. arXiv preprint arXiv:1705.01690.
[7] Adam Brown, Omer Bobrowski, Elizabeth Munch & Bei Wang (2020): Probabilistic convergence and stability of random mapper graphs. Journal of Applied and Computational Topology, pp. 1–42, 10.1007/s41468-020-00063-x.
[8] Peter Bubenik & Jonathan A Scott (2014): Categorification of persistent homology. Discrete & Computational Geometry 51(3), pp. 600–627, 10.1007/s00454-014-9573-x.
[9] Gunnar Carlsson & Facundo Mémoli (2013): Classifying clustering schemes. Foundations of Computational Mathematics, 10.1007/s10208-012-9141-9.
[10] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas & Steve Y Oudot (2009): Proximity of persistence modules and their diagrams. In: Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, pp. 237–246, 10.1145/1542362.1542407.
[11] Frédéric Chazal, Vin De Silva & Steve Oudot (2014): Persistence stability for geometric complexes. Geometriae Dedicata 173(1), pp. 193–214, 10.1007/s10711-013-9937-z.
[12] Jared Culbertson, Dan P Guralnik, Jakob Hansen & Peter F Stiller (2016): Consistency constraints for overlapping data clustering. arXiv preprint arXiv:1608.04331.
[13] Jared Culbertson, Dan P Guralnik & Peter F Stiller (2018): Functorial hierarchical clustering with overlaps. Discrete Applied Mathematics 236, pp. 108–123, 10.1016/j.dam.2017.10.015.
[14] Brendan Fong & David I. Spivak (2019): An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press, 10.1017/9781108668804.
[15] Tobias Fritz (2020): A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics 370, p. 107239, 10.1016/j.aim.2020.107239.
[16] Samuel Gerber, Tolga Tasdizen & Ross Whitaker (2007): Robust non-linear dimensionality reduction using successive 1-dimensional Laplacian eigenmaps. In: Proceedings of the 24th International Conference on Machine Learning, pp. 281–288, 10.1145/1273496.1273532.
[17] Jon M Kleinberg (2003): An impossibility theorem for clustering. Advances in Neural Information Processing Systems, 10.5555/2968618.2968676.
[18] Tom Leinster (2016): Basic Category Theory. Cambridge University Press, 10.1017/CBO9780511525858.
[19] Leland McInnes, John Healy, Nathaniel Saul & Lukas Großberger (2018): UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software 3(29), p. 861, 10.21105/joss.00861.
[20] Luis N Scoccola (2020): Locally Persistent Categories and Metric Properties of Interleaving Distances. PhD thesis, Western University (Ontario). Available at https://ir.lib.uwo.ca/etd/7119/.
[21] Dan Shiebler (2020): Functorial Clustering via Simplicial Complexes. Topological Data Analysis and Beyond Workshop at NeurIPS 2020. Available at https://openreview.net/pdf?id=ZkDLcXCP5sV.
[22] David I Spivak (2012): Metric realization of fuzzy simplicial sets. Self-published notes. Available at http://math.mit.edu/~dspivak/files/metric_realization.pdf.
[23] Joshua B Tenenbaum, Vin De Silva & John C Langford (2000): A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), pp. 2319–2323, 10.1126/science.290.5500.2319.