
Functorial Manifold Learning

Dan Shiebler
Abstract

We adapt previous research on category theory and topological unsupervised learning to develop a functorial perspective on manifold learning, also known as nonlinear dimensionality reduction. We first characterize manifold learning algorithms as functors that map pseudometric spaces to optimization objectives and that factor through hierarchical clustering functors. We then use this characterization to prove refinement bounds on manifold learning loss functions and construct a hierarchy of manifold learning algorithms based on their equivariants. We express several popular manifold learning algorithms as functors at different levels of this hierarchy, including Metric Multidimensional Scaling, IsoMap, and UMAP. Next, we use interleaving distance to study the stability of a broad class of manifold learning algorithms. We present bounds on how closely the embeddings these algorithms produce from noisy data approximate the embeddings they would learn from noiseless data. Finally, we use our framework to derive a set of novel manifold learning algorithms, which we experimentally demonstrate are competitive with the state of the art.

1 Introduction

Suppose we have a finite pseudometric space $(X, d_X)$ that we assume has been sampled from some larger space $\mathbf{X}$ according to some probability measure $\mu_{\mathbf{X}}$ over $\mathbf{X}$. Manifold learning algorithms like IsoMap [23], Metric Multidimensional Scaling [2], and UMAP [19] construct $\mathbb{R}^m$-embeddings for the points in $X$, which we interpret as coordinates for the support of $\mu_{\mathbf{X}}$. These techniques are based on the assumption that this support can be well-approximated by a manifold. In this paper we use functoriality, a basic concept from category theory, to explore two aspects of manifold learning algorithms:

  • Equivariance: A manifold learning algorithm is equivariant to a transformation if applying that transformation to its inputs results in a corresponding transformation of its outputs. Understanding the equivariance of a transform lets us understand how it will behave on new types of data.

  • Stability: The stability of a manifold learning algorithm captures how well the embeddings it generates on noisy data approximate the embeddings it would generate on noiseless data. Understanding the stability of a transform helps us predict its performance on real-world applications.

1.1 Functoriality

In order for a manifold learning algorithm to be useful, the properties of the embeddings that the algorithm derives from $(X, d_X)$ must be somewhat in line with the structure of $(X, d_X)$. One way to formalize this is to cast the algorithm as a functor between categories. A category is a collection of objects and morphisms between them. Morphisms are closed under an associative composition operation, and each object is equipped with an identity morphism. An example of a category is the collection of sets (objects) and functions (morphisms) between them.

A functor is a mapping between categories that preserves identity morphisms and morphism composition. Underlying this straightforward definition is a powerful concept: the functoriality of a transformation is a blueprint for its structure, expressed in terms of the equivariants it preserves. If a given transformation is functorial over some pair of categories, then the transformation preserves the structure represented by those categories' morphisms. By identifying the settings under which an algorithm is functorial, we can derive extensions of the algorithm that preserve functoriality and identify modifications that break it. See "Basic Category Theory" [18] or "Seven Sketches in Compositionality" [14] for more details on categories and functoriality.

1.2 Summary of Contributions

In Section 2, we demonstrate that manifold learning algorithms can be expressed as optimization problems defined on top of hierarchical overlapping clusterings. That is, we can express these algorithms in terms of the composition of:

  • a strategy for clustering points at different distance scales;

  • an order-preserving transformation from a clustering of points to a loss function.

We formalize this relationship in terms of the composition of functors between categories of pseudometric spaces, hierarchical overlapping clusterings, and optimization problems. This allows us to formally extend clustering theory into manifold learning theory.

In Section 2.1 we build on clustering theory to demonstrate that every manifold learning objective lies on a spectrum based on the criterion by which embeddings are penalized for being too close together or too far apart. In Section 2.2 we introduce a hierarchy of manifold learning algorithms and categorize algorithms based on the dataset transformations over which they are equivariant. In Section 2.3 we provide several examples of this categorization. We show that UMAP is equivariant to isometries and that both IsoMap and Metric Multidimensional Scaling are equivariant to surjective non-expansive maps. Identifying these equivariants helps us understand the circumstances under which we would expect our algorithms to generalize. For example, while adding points to a dataset will modify the IsoMap objective in a predictable way, it will completely disrupt the UMAP objective. This is because UMAP uses a local distance rescaling procedure that is density-sensitive, and is therefore not equivariant to injective or surjective non-expansive maps.

In Section 3 we use interleaving distance to study the stability of a broad class of manifold learning algorithms. We present novel bounds on how well the embeddings that algorithms in this class learn on noisy data approximate the embeddings they learn on noiseless data.

In Section 4 we build on this theory to describe a strategy for deriving novel algorithms from existing manifold learning algorithms. As an example we derive the Single Linkage Scaling algorithm, which projects samples in the same connected component of the Rips complex as close together as possible. We also present experimental results demonstrating the efficacy of this algorithm.

1.3 Related Work

Several authors have explored functorial perspectives on clustering algorithms. Carlsson et al. [9] introduce clustering functors that map metric spaces to partitionings, whereas Culbertson et al. [12] take a slightly broader scope and also explore overlapping clustering functors that map metric spaces to coverings. Both approaches demonstrate that metric space categories with fewer morphisms permit a richer class of clustering functors. For example, while the single linkage clustering algorithm is functorial over the full category of metric spaces and non-expansive maps, density-sensitive clustering algorithms like robust single linkage are only functorial over the subcategory of metric spaces and injective non-expansive maps. In order to get around the Kleinberg Impossibility Theorem [17], which states that any scale invariant flat clustering must sacrifice either surjectivity or a notion of consistency, several authors [9, 13, 20] also explore hierarchical clustering functors that map metric spaces to multi-scale dataset partitionings or covers. Shiebler [21] builds on this perspective to factor clustering functors through a category of simplicial complexes.

Manifold learning shares structure with hierarchical clustering, and some authors have begun applying categorical ideas to manifold learning. For example, McInnes et al. [19] introduce the UMAP manifold learning algorithm in terms of Spivak's fuzzy simplicial sets [22], which are a categorical analog of simplicial filtrations.

In Section 3 we study the stability of manifold learning algorithms to dataset noise. Due to the importance of this topic, many other authors have researched the stability properties of manifold learning algorithms. For example, Bailey [4] explores adaptations of PCA to noisy data, and Gerber et al. [16] demonstrate that Laplacian Eigenmaps has better stability properties than IsoMap. However, we believe that ours is the first work that uses interleaving distance to formalize a stability property.

1.4 Preliminaries on Functorial Hierarchical Overlapping Clustering

We briefly review some definitions from the functorial perspective on hierarchical overlapping clustering. For more details, see Shiebler [21]. Given a set $X$, a non-nested flag cover $\mathcal{C}_X$ of $X$ is a cover of $X$ such that: (1) if $S_1, S_2 \in \mathcal{C}_X$ and $S_1 \subseteq S_2$, then $S_1 = S_2$; (2) the simplicial complex whose vertices correspond to the elements of $X$ and whose faces correspond to all finite subsets of the sets in $\mathcal{C}_X$ is a flag complex, or a simplicial complex that can be expressed as the clique complex of its $1$-skeleton. Intuitively, $\mathcal{C}_X$ is a non-nested flag cover if there exists a graph whose maximal cliques are the sets in $\mathcal{C}_X$. The category $\mathbf{Cov}$ has tuples $(X, \mathcal{C}_X)$ as objects, where $\mathcal{C}_X$ is a non-nested flag cover of the finite set $X$. The morphisms between $(X, \mathcal{C}_X)$ and $(Y, \mathcal{C}_Y)$ are functions $f : X \rightarrow Y$ such that for any set $S \in \mathcal{C}_X$ there exists some set $S' \in \mathcal{C}_Y$ with $f(S) \subseteq S'$.

Next, we will represent datasets with the category $\mathbf{PMet}$ of finite pseudometric spaces and non-expansive maps between them. Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a flat $\mathbf{D}$-clustering functor is a functor $C : \mathbf{D} \rightarrow \mathbf{Cov}$ that is the identity on the underlying set. Intuitively, a flat $\mathbf{D}$-clustering functor maps a dataset $(X, d_X)$ to a cover of the set $X$ in such a way that increasing the distances between points in $X$ may cause clusters to separate.

A fuzzy non-nested flag cover is a functor $F_X : (0,1]^{op} \rightarrow \mathbf{Cov}$ such that for any morphism $a \leq a'$ in $(0,1]$, the morphism $F_X(a \leq a')$ is the identity on the underlying set. In the category $\mathbf{FCov}$, objects are fuzzy non-nested flag covers and morphisms are natural transformations between them. Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a hierarchical $\mathbf{D}$-clustering functor is a functor $H : \mathbf{D} \rightarrow \mathbf{FCov}$ such that for each $a \in (0,1]$, $H(-)(a) : \mathbf{D} \rightarrow \mathbf{Cov}$ is a flat $\mathbf{D}$-clustering functor. Intuitively, a hierarchical $\mathbf{D}$-clustering functor maps a pair of a dataset $(X, d_X)$ and a strength $a \in (0,1]$ to a cover of the set $X$ in such a way that increasing the distances between points in $X$ or increasing the strength $a$ may cause clusters to separate.
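To make this concrete, the following is a minimal sketch (ours, not code from [21]) of the object-level action of one such hierarchical clustering functor: at strength $a$ it returns the connected components of the $-\log(a)$-Vietoris-Rips complex of a finite pseudometric space given by its distance matrix. This is the single linkage functor $\mathcal{SL}$ that reappears in Section 2.1.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def single_linkage_cover(D, a):
    """Cover of {0, ..., n-1} by the connected components of the
    (-log a)-Vietoris-Rips complex of the pseudometric matrix D."""
    adjacency = csr_matrix((D <= -np.log(a)).astype(int))
    n_components, labels = connected_components(adjacency, directed=False)
    return [np.flatnonzero(labels == c).tolist() for c in range(n_components)]

# Three points on a line: the two nearby points merge first as the strength a decreases.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])
print(single_linkage_cover(D, np.exp(-1.5)))   # [[0, 1], [2]]
print(single_linkage_cover(D, np.exp(-5.0)))   # [[0, 1, 2]]
```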

2 Manifold Learning

Given a choice of $m \in \mathbb{N}$, a manifold learning algorithm constructs an $n \times m$ real-valued matrix of embeddings in $\mathbf{Mat}_{n,m} = \mathbb{R}^{n \times m}$ from a finite pseudometric space with $n$ points. In this work we focus on algorithms that operate by solving embedding optimization problems, or tuples $(n, m, l)$ where $l : \mathbf{Mat}_{n,m} \rightarrow \mathbb{R}$ is a loss function, or a penalty term. We call the set of all $A \in \mathbf{Mat}_{n,m}$ that minimize $l(A)$ the solution set of the embedding optimization problem. In particular, we focus on pairwise embedding optimization problems, or embedding optimization problems where $l$ can be expressed as a sum of pairwise terms $l_{ij} : \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ such that $l(A) = \sum_{i,j \in 1 \ldots n} l_{ij}(\|A_i - A_j\|)$. Intuitively the terms $l_{ij}$ act as a penalty on bad embeddings. We express such a pairwise embedding optimization problem with the tuple $(n, m, \{l_{ij}\})$.

Formally, we define a manifold learning problem to be a function that maps the pseudometric space $(X, d_X)$ to a pairwise embedding optimization problem of the form $(|X|, m, \{l_{ij}\})$. Note that this definition does not include any specification of how the optimization problem is solved (such as gradient descent or reduction to an eigenproblem). For example, the Metric Multidimensional Scaling manifold learning problem maps the pseudometric space $(X, d_X)$ to $(|X|, m, \{l_{ij}\})$ where $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$. Optimizing this objective involves finding a matrix $A$ that minimizes $\sum_{i,j \in 1 \ldots |X|} (d_X(x_i, x_j) - \|A_i - A_j\|)^2$. That is, the Euclidean distance matrix of the rows of the optimal $A$ is as close as possible to the $d_X$ distance matrix.
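As a concrete reference point, here is a minimal sketch (ours, not from the paper) of evaluating a pairwise embedding optimization problem, instantiated with the Metric Multidimensional Scaling terms $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$ described above:

```python
import numpy as np

def pairwise_loss(A, l_ij):
    """Evaluate l(A) = sum_{i,j} l_ij(i, j, ||A_i - A_j||) for an n x m embedding matrix A."""
    n = A.shape[0]
    return sum(l_ij(i, j, np.linalg.norm(A[i] - A[j]))
               for i in range(n) for j in range(n))

def mds_terms(D):
    """Metric Multidimensional Scaling terms l_ij(delta) = (D[i, j] - delta)^2."""
    return lambda i, j, delta: (D[i, j] - delta) ** 2

D = np.array([[0.0, 1.0], [1.0, 0.0]])        # two points at distance 1
A_good = np.array([[0.0], [1.0]])             # embedding that preserves the distance
A_bad = np.array([[0.0], [3.0]])              # embedding that stretches it
print(pairwise_loss(A_good, mds_terms(D)))    # 0.0
print(pairwise_loss(A_bad, mds_terms(D)))     # 8.0, i.e. two symmetric terms of (1 - 3)^2
```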

If a manifold learning problem maps isometric pseudometric spaces to embedding optimization problems with the same solution set, we call it isometry-invariant. Intuitively, isometry-invariant manifold learning algorithms do not change their output based on the ordering of $X$. A particularly useful property of isometry-invariant manifold learning problems is that they factor through hierarchical clustering algorithms.

Proposition 1.

Given any isometry-invariant manifold learning problem $M$, there exists a manifold learning problem $L \circ H$, where $H$ is a hierarchical overlapping clustering algorithm (as defined by Shiebler [21]) and $L$ is a function that maps the output of $H$ to an embedding optimization problem, such that the solution sets of the images of $M$ and $L \circ H$ on any pseudometric space $(X, d_X)$ are identical.

Intuitively, Proposition 1 holds because manifold learning problems generate loss functions by grouping points in the finite pseudometric space together. In order to use this property to adapt clustering theorems into manifold learning theorems, we will introduce a target category of pairwise embedding optimization problems and replace functions with functors from $\mathbf{PMet}$ into this category. To start, consider the category $\mathbf{L}$:

Definition 2.1.

The objects in $\mathbf{L}$ are tuples $(n, \{l_{ij}\})$ where $n$ is a natural number and each $l_{ij} : \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ is a real-valued function that satisfies $l_{i'j'}(x) = 0$ for $i' > n$ or $j' > n$. $\mathbf{L}$ is a preorder where $(n, \{l_{ij}\}) \leq (n', \{l'_{ij}\})$ iff for any $x \in \mathbb{R}_{\geq 0}$ and $i, j \in \mathbb{N}$ we have $l_{ij}(x) \leq l'_{ij}(x)$.

Given a choice of $m$ we can view the object $(n, \{l_{ij}\})$ in $\mathbf{L}$ as a pairwise embedding optimization problem. However, $\mathbf{L}$ is not quite expressive enough to serve as our target category. Recall the Metric Multidimensional Scaling manifold learning problem, which maps the pseudometric space $(X, d_X)$ to the pairwise embedding optimization problem $(|X|, m, \{l_{ij}\})$ where $l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2$. Since $l_{ij}$ does not vary monotonically with $d_X$, it is clear that this manifold learning problem is not a functor from $\mathbf{PMet}$ to $\mathbf{L}$. However, note that we can express $l_{ij}(\delta)$ as the sum of a term that increases monotonically in $d_X(x_i, x_j)$ and a term that decreases monotonically in $d_X(x_i, x_j)$:

\[
l_{ij}(\delta) = (d_X(x_i, x_j) - \delta)^2 = \left(\delta^2 + d_X(x_i, x_j)^2\right) - \left(2\delta\, d_X(x_i, x_j)\right)
\]

We will see in Section 2.3 that the embedding optimization problems associated with many common manifold learning algorithms decompose similarly. We can build on this insight to create a new category $\mathbf{Loss}$ with the following pullback:

๐‹๐จ๐ฌ๐ฌ{\mathbf{Loss}}๐‹{\mathbf{L}}๐‹oโ€‹p{\mathbf{L}^{op}}โ„•{\mathbb{N}}U\scriptstyle{U}U\scriptstyle{U}

where $U$ is the forgetful functor that maps $(n, \{l_{ij}\})$ to $n$. Intuitively $\mathbf{Loss}$ is a subcategory of $\mathbf{L}^{op} \times \mathbf{L}$, and we can write the objects in $\mathbf{Loss}$ as tuples $(n, \{c_{ij}, e_{ij}\})$ where $(n, \{c_{ij}, e_{ij}\}) \leq (n', \{c'_{ij}, e'_{ij}\})$ iff for any $x \in \mathbb{R}_{\geq 0}$ and $i, j \in \mathbb{N}$ we have $c'_{ij}(x) \leq c_{ij}(x)$ and $e_{ij}(x) \leq e'_{ij}(x)$. Given a choice of $m$, each object $(n, \{c_{ij}, e_{ij}\})$ in $\mathbf{Loss}$ corresponds to the pairwise embedding optimization problem $(n, m, \{l_{ij}\})$ where $l_{ij}(\delta) = c_{ij}(\delta) + e_{ij}(\delta)$.
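For intuition, a small numeric sketch (ours) of the decomposition above for Metric Multidimensional Scaling: $c_{ij}(\delta) = \delta^2 + d_X(x_i, x_j)^2$ increases monotonically in $d_X(x_i, x_j)$, $e_{ij}(\delta) = -2\delta\, d_X(x_i, x_j)$ decreases monotonically in it, and their sum recovers $(d_X(x_i, x_j) - \delta)^2$:

```python
import numpy as np

def mds_c(d_ij, delta):
    """Contractive part: increases monotonically in d_ij."""
    return delta ** 2 + d_ij ** 2

def mds_e(d_ij, delta):
    """Expansive part: decreases monotonically in d_ij."""
    return -2.0 * delta * d_ij

d_ij, delta = 2.0, 0.5
assert np.isclose(mds_c(d_ij, delta) + mds_e(d_ij, delta), (d_ij - delta) ** 2)
```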

Similarly to the representation of hierarchical clustering algorithms as maps into a category $\mathbf{FCov}$ of functors $(0,1]^{op} \rightarrow \mathbf{Cov}$, we will represent manifold learning algorithms as maps into a category $\mathbf{FLoss}$ of functors $(0,1]^{op} \rightarrow \mathbf{Loss}$. The objects in $\mathbf{FLoss}$ are functors $F : (0,1]^{op} \rightarrow \mathbf{Loss}$ that commute with the forgetful functor that maps $(n, \{c_{ij}, e_{ij}\})$ to $n$, and the morphisms are natural transformations. We call $n$ the cardinality of $F$. We can define a functor $Flatten : \mathbf{FLoss} \rightarrow \mathbf{Loss}$ which maps the functor $F$ with $F(a) = (n, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$ to the tuple $(n, \{c_{ij}, e_{ij}\})$ where:

\[
c_{ij}(x) = \int_{a \in (0,1]} c_{F(a)_{ij}}(x)\, da \qquad e_{ij}(x) = \int_{a \in (0,1]} e_{F(a)_{ij}}(x)\, da
\]

Therefore, each functor $F \in \mathbf{FLoss}$ with cardinality $n$ corresponds to the pairwise embedding optimization problem $(n, m, \{l_{F_{ij}}\})$ where $l_{F_{ij}}(\delta) = \int_{a \in (0,1]} c_{F(a)_{ij}}(\delta) + e_{F(a)_{ij}}(\delta)\, da$. We will call the sum of these terms, $\mathbf{l}_F(A)$, the $F$-loss:

\[
\mathbf{l}_F(A) = \sum_{i,j \in 1 \ldots n} l_{F_{ij}}(\|A_i - A_j\|) = \sum_{i,j \in 1 \ldots n} \int_{a \in (0,1]} c_{F(a)_{ij}}(\|A_i - A_j\|) + e_{F(a)_{ij}}(\|A_i - A_j\|)\, da
\]

We can now give our definition of a manifold learning functor:

Definition 2.2.

Suppose $\mathbf{PMet}$ is the category of finite pseudometric spaces and non-expansive maps and $\mathbf{FCov}$ is the category of fuzzy non-nested flag covers and natural transformations (see Section 1.4). Then given subcategories $\mathbf{D} \subseteq \mathbf{PMet}$ and $\mathbf{D}' \subseteq \mathbf{FCov}$, the composition $L \circ H : \mathbf{D} \rightarrow \mathbf{FLoss}$ forms a $\mathbf{D}$-manifold learning functor if $H : \mathbf{D} \rightarrow \mathbf{D}'$ is a hierarchical $\mathbf{D}$-clustering functor and $L : \mathbf{D}' \rightarrow \mathbf{FLoss}$ is a functor that maps a fuzzy non-nested flag cover with vertex set $X$ to some $F_X \in \mathbf{FLoss}$ with cardinality $|X|$.

Intuitively, a manifold learning functor $\mathbf{D} \xrightarrow{H} \mathbf{D}' \xrightarrow{L} \mathbf{FLoss}$ factors through a hierarchical clustering functor and sends $(X, d_X)$ to $F$ where $F(a) = (|X|, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$. We will say that $M = L \circ H$ is in standard form if $M$ maps the one-point metric space $(\{*\}, 0)$ to some $F$ with $c_{F(a)_{ij}}(x) = e_{F(a)_{ij}}(x) = 0$ and $\forall \epsilon, \delta \in \mathbb{R}_{\geq 0}$, $H(X, d_X + \epsilon)(-\log(\delta)) \simeq H(X, d_X)(-\log(\delta + \epsilon))$. Each manifold learning functor corresponds to a manifold learning problem that maps $(X, d_X)$ to $(|X|, m, \mathbf{l}_{M(X,d_X)})$.

2.1 A Spectrum of Manifold Learning Functors

Recall the single linkage and maximal linkage hierarchical overlapping clustering algorithms $\mathcal{SL}$ and $\mathcal{ML}$. These algorithms map the pseudometric space $(X, d_X)$ to the fuzzy non-nested flag cover $(X, \mathcal{C}_{X_a})$ where $\mathcal{C}_{X_a}$ is, respectively, the set of connected components of the $-\log(a)$-Vietoris-Rips complex of $(X, d_X)$ and the set of maximally linked sets of the relation $R_a$ in which $x_1 R_a x_2$ if $d_X(x_1, x_2) \leq -\log(a)$ [21, 12]. If we apply functoriality to Proposition 6 in Shiebler [21] we see:

Proposition 2.

Suppose $\mathbf{D}$ is a subcategory of $\mathbf{PMet}$ such that $\mathbf{PMet}_{\mathit{bij}} \subseteq \mathbf{D}$, $L \circ H$ is a $\mathbf{D}$-manifold learning functor such that $H$ is non-trivial [21, 12], and for all $a \in (0,1]$ the functor $H(-)(a) : \mathbf{D} \rightarrow \mathbf{Cov}$ has clustering parameter $\delta_{H,a}$. Then for $a \in (0,1]$ and $(X, d_X) \in \mathbf{D}$ we have maps:

\[
(L \circ \mathcal{ML})(X, d_X)(e^{-\delta_{H,a}}) \leq (L \circ H)(X, d_X)(a) \leq (L \circ \mathcal{SL})(X, d_X)(e^{-\delta_{H,a}}) \tag{1}
\]

that are natural in $a$ and $(X, d_X)$.

Intuitively, this proposition states that every manifold learning functor maps $(X, d_X)$ to a loss that is both no more interconnected than the loss that does not distinguish points within the same connected component of the Vietoris-Rips complex and no less interconnected than the loss that treats each pair of points independently.

There are many manifold learning functors that lie between these extremes. In particular, for any functor $L : \mathbf{FCov}_{\mathit{inj}} \rightarrow \mathbf{FLoss}$ and any sequence of clustering functors $\mathcal{ML}, H_1, H_2, \ldots, H_n, \mathcal{SL}$ whose outputs refine each other, we can apply functoriality to form a sequence of manifold learning functors $L \circ \mathcal{ML} \leq L \circ H_1 \leq \ldots \leq L \circ H_n \leq L \circ \mathcal{SL}$. For example, consider the family $\mathcal{L}_k$ of hierarchical overlapping clustering functors from Culbertson et al. [12]: for $k \in \mathbb{N}$, the cover $\mathcal{L}_k(X, d_X)(a)$ consists of the maximally linked sets of the relation $R_a$ where $x R_a x'$ if there is a sequence $x = x_1, x_2, \ldots, x_{k-1}, x_k = x'$ in $X$ with $d_X(x_i, x_{i+1}) \leq -\log(a)$. The functor $L \circ \mathcal{L}_k$ therefore maps $(X, d_X)$ to a loss that distinguishes only between points whose shortest path in the Vietoris-Rips complex is longer than $k$. For $k > 1$ this loss is more interconnected than $L \circ \mathcal{ML}$ and less interconnected than $L \circ \mathcal{SL}$. This also suggests a recipe for generating new manifold learning algorithms (see Section 4): first express an existing manifold learning problem in the form $L \circ H$, and then form $L \circ \mathcal{SL}$, $L \circ \mathcal{ML}$, or any of the functors along the spectrum $L \circ \mathcal{L}_k$.
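A small sketch (ours) of the relation underlying $\mathcal{L}_k$: two points are related at strength $a$ exactly when they are joined by a chain of at most $k$ points whose consecutive distances are at most $-\log(a)$; the cover $\mathcal{L}_k(X, d_X)(a)$ then consists of the maximally linked sets of this relation, which we do not compute here.

```python
import numpy as np

def k_path_relation(D, a, k):
    """R[i, j] is True iff there is a sequence of at most k points from x_i to x_j
    whose consecutive distances are all <= -log(a)."""
    step = (D <= -np.log(a)).astype(int)     # one-edge reachability in the VR complex
    reach = np.eye(D.shape[0], dtype=int)    # the trivial sequence x = x'
    for _ in range(k - 1):                   # allow up to k - 1 edges (k points)
        reach = np.minimum(reach + reach @ step, 1)
    return reach.astype(bool)

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
print(k_path_relation(D, np.exp(-1.0), k=2))  # only points within distance 1 are related
print(k_path_relation(D, np.exp(-1.0), k=3))  # x_0 and x_2 become related through x_1
```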

2.2 Characterizing Manifold Learning Problems

Similarly to how Carlsson et al. [9] characterize clustering algorithms in terms of their functoriality over different subcategories of pseudometric spaces, we can characterize manifold learning algorithms based on the subcategory $\mathbf{D} \subseteq \mathbf{PMet}$ over which they are functorial.

We have already introduced isometry-invariant manifold learning problems. Any $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor is isometry-invariant. An isometry-invariant manifold learning problem is expansive-contractive if the loss that it aims to minimize decomposes into the sum of an expansive term $e$ that decreases as distances increase and a contractive term $c$ that increases as distances increase. Intuitively, expansive-contractive manifold learning problems use the term $e$ to push together points that are close in the original space and use the term $c$ to push apart points that are far in the original space. Any $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor is expansive-contractive.

An expansive-contractive manifold learning problem is positive extensible if $c$ increases and $e$ decreases when we add new points to increase $|X|$. If instead $c$ decreases and $e$ increases when we add new points to increase $|X|$, we say it is negative extensible. Intuitively, many positive extensible manifold learning problems are minmax problems that aim to simultaneously minimize $|c|$ and maximize $|e|$. Any $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor is positive extensible and any $\mathbf{PMet}_{\mathit{inj}}$-manifold learning functor is negative extensible.

Proposition 3.

Suppose $M$ is a standard form $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor and $M'$ is a standard form $\mathbf{PMet}_{\mathit{inj}}$-manifold learning functor. Then for any $(X, d_X)$ and $a \in (0,1]$, the terms $e_{M(X,d_X)(a)_{ij}}$ and $c_{M'(X,d_X)(a)_{ij}}$ are non-positive and the terms $c_{M(X,d_X)(a)_{ij}}$ and $e_{M'(X,d_X)(a)_{ij}}$ are non-negative.

In the next section we show examples of manifold learning algorithms in each of these categories.

2.3 Examples

2.3.1 Metric Multidimensional Scaling ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

The most straightforward strategy for learning embeddings is to minimize the difference between the pairwise distance matrix of the original space and the pairwise Euclidean distance matrix of the learned embeddings. The Metric Multidimensional Scaling algorithm [2] does exactly this. Given a finite pseudometric space $(X, d_X)$, the Metric Multidimensional Scaling embedding optimization problem is $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d_X(x_i, x_j) - \|A_i - A_j\|)^2$. When the distance matrix of the pseudometric space is double-centered (i.e. the mean values of its rows and columns are zero) this is the same objective that Principal Components Analysis (PCA) optimizes [3]. Now define $\mathit{MDS} : \mathbf{FCov}_{\mathit{sur}} \rightarrow \mathbf{FLoss}$ to map the fuzzy non-nested flag cover $H : (0,1]^{op} \rightarrow \mathbf{Cov}_{\mathit{sur}}$ with vertex set $X$ to the functor $F : (0,1]^{op} \rightarrow \mathbf{Loss}$ where $F(a) = (|X|, \{c_{F(a)_{ij}}, e_{F(a)_{ij}}\})$ and:

\[
c_{F(a)_{ij}}(x) =
\begin{cases}
x^2 & \exists S \in H(a),\ x_i, x_j \in S \\
x^2 + 2x^2\left(1/W_{ij} - 1/a\right) & \text{else}
\end{cases}
\]
\[
e_{F(a)_{ij}}(x) =
\begin{cases}
0 & \exists S \in H(a),\ x_i, x_j \in S \\
\dfrac{2\log(W_{ij})}{W_{ij}} - \dfrac{2\log(a)}{a} & \text{else}
\end{cases}
\]

where:

\[
W_{ij} = \sup\nolimits_{\geq 0}\{a \mid a \in (0,1],\ \exists S \in H(a),\ x_i, x_j \in S\}
\]
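A small numerical sketch (ours) of $W_{ij}$: we approximate the supremum over a grid of strengths, and for a two-point space under maximal linkage $\mathcal{ML}$ we recover $W_{ij} \approx e^{-d_X(x_i, x_j)}$, consistent with Proposition 4 below.

```python
import numpy as np

def W(H, i, j, grid=np.linspace(1e-3, 1.0, 1000)):
    """Approximate W_ij = sup{a in (0,1] : x_i and x_j share a cluster in H(a)} on a grid."""
    co_clustered = [a for a in grid if any(i in S and j in S for S in H(a))]
    return max(co_clustered, default=0.0)

def ML_two_points(d01):
    """Maximal linkage cover of a two-point space: one cluster iff d01 <= -log(a)."""
    return lambda a: [[0, 1]] if d01 <= -np.log(a) else [[0], [1]]

print(W(ML_two_points(0.7), 0, 1), np.exp(-0.7))   # both approximately 0.497
```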
Proposition 4.

๐‘€๐ท๐‘†\mathit{MDS} is a functor, and ๐‘€๐ท๐‘†โˆ˜โ„ณโ€‹โ„’\mathit{MDS}\circ\mathcal{ML} is a ๐๐Œ๐ž๐ญ๐‘ ๐‘ข๐‘Ÿ\mathcal{\mathbf{PMet}}_{\mathit{sur}}-manifold learning functor that maps the finite pseudometric space (X,dX)(X,d_{X}) to the Metric Multidimensional Scaling embedding optimization problem.

2.3.2 IsoMap ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

For many real-world datasets the distances between nearby points are more reliable and less noisy than the distances between far-away points. The IsoMap algorithm [23] redefines the distances between far-apart points in terms of the distances between nearby points. Given a finite pseudometric space $(X, d_X)$ and a threshold $\delta \in \mathbb{R}_{\geq 0}$, the IsoMap embedding optimization problem is $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$ and $d'_X(x_i, x_j)$ is the length of the shortest path between $x_i$ and $x_j$ in the graph that has an edge of length $d_X(x, x')$ between each pair of points $(x, x') \in X$ with $d_X(x, x') \leq \delta$.
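The graph distance $d'_X$ can be computed with an all-pairs shortest-path pass over the $\delta$-neighborhood graph. A minimal sketch (ours, not the reference IsoMap implementation; zero-distance pairs are treated as non-edges here for simplicity):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap_distances(D, delta):
    """Shortest-path distances in the graph with an edge of length D[i, j]
    whenever D[i, j] <= delta; disconnected pairs come out as inf."""
    graph = np.where(D <= delta, D, 0.0)          # zero entries are treated as missing edges
    return shortest_path(graph, method='D', directed=False)

# Three points on a line; with delta = 1.5 the long 0 <-> 2 distance is re-routed through 1.
D = np.array([[0.0, 1.0, 2.5],
              [1.0, 0.0, 1.0],
              [2.5, 1.0, 0.0]])
print(isomap_distances(D, delta=1.5))   # d'(0, 2) = 2.0 rather than 2.5
```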

Proposition 5.

For any $\delta \in \mathbb{R}_{\geq 0}$, there exists a hierarchical $\mathbf{PMet}$-clustering functor $IsoCluster_\delta$ such that the $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $\mathit{MDS} \circ IsoCluster_\delta$ maps the finite pseudometric space $(X, d_X)$ to the IsoMap embedding optimization problem.

2.3.3 $k$-Path Scaling ($\mathbf{PMet}_{\mathit{sur}}$-Manifold Learning Functor)

Given a finite pseudometric space $(X, d_X)$ and $k \in \mathbb{N}$, suppose $d'_X(x_i, x_j)$ is the smallest $\delta$ such that there exists a path of at most $k$ edges between $x_i$ and $x_j$ in the $\delta$-Vietoris-Rips complex of $(X, d_X)$. Then the $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $\mathit{MDS} \circ \mathcal{L}_k$ maps $(X, d_X)$ to the $k$-Path Scaling embedding optimization problem $(|X|, m, l)$, where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$.
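Equivalently, $d'_X$ here is a bounded-hop bottleneck distance: the smallest threshold at which $x_i$ and $x_j$ are joined by a path of at most $k$ edges, each of length at most that threshold. A small dynamic-programming sketch (ours):

```python
import numpy as np

def k_path_distance(D, k):
    """Smallest delta at which x_i and x_j are joined by a path of at most k edges,
    each of length <= delta (a bounded-hop bottleneck distance)."""
    best = D.copy()                        # paths with exactly one edge
    for _ in range(k - 1):                 # allow one additional edge per iteration
        # bottleneck relaxation: best(i, j) = min over m of max(best(i, m), D(m, j))
        best = np.minimum(best, np.min(np.maximum(best[:, :, None], D[None, :, :]), axis=1))
    np.fill_diagonal(best, 0.0)
    return best

D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])
print(k_path_distance(D, k=1))   # direct edges only: d'(0, 2) = 5
print(k_path_distance(D, k=2))   # routing through x_1: d'(0, 2) = 1
```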

2.3.4 $k$-Vertex-Connected Scaling ($\mathbf{PMet}_{\mathit{bij}}$-Manifold Learning Functor)

For $k \in \mathbb{N}$ the hierarchical overlapping clustering functor $\mathcal{VL}_k$ maps the finite pseudometric space $(X, d_X)$ to the fuzzy non-nested flag cover $(X, \mathcal{C}_{X_a})$ where $\mathcal{C}_{X_a}$ is the set of maximal $\min(|X|, k)$-vertex-connected subgraphs of the $-\log(a)$-Vietoris-Rips complex of $(X, d_X)$. Note that $\mathcal{VL}_1 = \mathcal{SL}$ and $\lim_{k \rightarrow \infty} \mathcal{VL}_k = \mathcal{ML}$. Note also that for $k > 1$ the map $\mathcal{VL}_k$ is functorial over $\mathbf{PMet}_{\mathit{inj}}$ but not over all of $\mathbf{PMet}$, since a non-injective map may split a $k$-vertex-connected subgraph [12].

Now given a finite pseudometric space $(X, d_X)$ and $k \in \mathbb{N}$, suppose $d'_X(x_i, x_j)$ is the smallest $\delta$ such that there exists a $\min(|X|, k)$-vertex-connected subgraph of the $\delta$-Vietoris-Rips complex of $(X, d_X)$ that contains both $x_i$ and $x_j$. Then the $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor $\mathit{MDS} \circ \mathcal{VL}_k$ maps $(X, d_X)$ to the $k$-Vertex-Connected Scaling embedding optimization problem $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (d'_X(x_i, x_j) - \|A_i - A_j\|)^2$. Note that unlike $\mathit{MDS} \circ \mathcal{L}_k$, for $k > 1$ the map $\mathit{MDS} \circ \mathcal{VL}_k$ is not functorial over all of $\mathbf{PMet}_{\mathit{sur}}$ since $\mathcal{VL}_k$ is not functorial over $\mathbf{PMet}_{\mathit{sur}}$.

2.3.5 UMAP ($\mathbf{PMet}_{\mathit{isom}}$-Manifold Learning Functor)

The UMAP algorithm builds a local uber-metric space around each point in $X$, converts each local uber-metric space to a fuzzy simplicial complex, and minimizes a loss function based on a fuzzy union of these fuzzy simplicial complexes. Given a finite pseudometric space $(X, d_X)$, the UMAP embedding optimization problem is $(|X|, m, l)$ where $l$ is the fuzzy cross-entropy:

\[
l(A) = \sum_{i,j \in 1 \ldots |X|} W_{ij} \log\left(\frac{W_{ij}}{e^{-\|A_i - A_j\|}}\right) + (1 - W_{ij}) \log\left(\frac{1 - W_{ij}}{1 - e^{-\|A_i - A_j\|}}\right)
\]

and $W_{ij}$ is the weight of the fuzzy union of the $1$-simplices connecting $x_i$ and $x_j$ in the Vietoris-Rips complexes formed from the $|X|$ local uber-metric spaces $(X, d_{x_i})$, where:

\[
d_{x_i}(x_j, x_k) =
\begin{cases}
d_X(x_j, x_k) - \min_{l \neq i} d_X(x_i, x_l) & i = j \text{ or } i = k \\
\infty & \text{else}
\end{cases}
\]
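A simplified sketch (ours) of the weights $W_{ij}$, under two assumptions that match the loss above but are not spelled out in this section: the strength of a $1$-simplex at rescaled distance $t$ is taken to be $e^{-t}$, and the fuzzy union of the two directed strengths is the probabilistic sum $a \cup b = a + b - ab$. The full UMAP implementation additionally smooths each local metric with a per-point bandwidth, which we omit here.

```python
import numpy as np

def umap_weights(D):
    """Fuzzy 1-simplex weights W_ij under the simplifying assumptions described above."""
    n = D.shape[0]
    rho = np.array([np.min(np.delete(D[i], i)) for i in range(n)])  # nearest-neighbor distances
    w = np.exp(-(D - rho[:, None]))   # directed strengths from each local uber-metric space
    np.fill_diagonal(w, 0.0)
    return w + w.T - w * w.T          # probabilistic fuzzy union of the two directions

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
print(umap_weights(D))                # W_01 = 1 since x_0 and x_1 are mutual nearest neighbors
```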
Proposition 6.

There exists a hierarchical $\mathbf{PMet}_{\mathit{isom}}$-clustering functor $\mathit{FuzzySimplex}$ that decomposes into the composition of four functors that:

  1. build a local uber-metric space around each point in $X$;

  2. convert each local uber-metric space to a fuzzy simplicial complex;

  3. take a fuzzy union of these fuzzy simplicial complexes;

  4. convert the resulting fuzzy simplicial complex to a fuzzy non-nested flag cover.

Proposition 7.

There exists a functor $\mathit{FCE} : \mathbf{FCov}_{\mathit{bij}} \rightarrow \mathbf{FLoss}$ such that $\mathit{FCE} \circ \mathit{FuzzySimplex}$ is a $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor that maps the pseudometric space $(X, d_X)$ to the UMAP embedding optimization problem.

Since the UMAP distance rescaling procedure does not preserve non-expansive maps, even if a map from $(X, d_X)$ to $(X', d_{X'})$ is non-expansive, the corresponding maps between the local uber-metric spaces $(X, d_{x_i})$ built from $(X, d_X)$ and $(X', d_{X'})$ need not all be non-expansive. For this reason $\mathit{FCE} \circ \mathit{FuzzySimplex}$ is not functorial over $\mathbf{PMet}_{\mathit{bij}}$.

3 Stability of Manifold Learning Algorithms

We can use this functorial perspective on manifold learning to reason about the stability of manifold learning algorithms under dataset noise. An $\epsilon$-interleaving between functors $F, G : \mathbb{R}_{\geq 0} \rightarrow \mathbf{C}$ is a collection of commuting natural transformations $F(\delta) \rightarrow G(\delta + \epsilon)$ and $G(\delta) \rightarrow F(\delta + \epsilon)$ [10, 11]. The interleaving distance $d_I$ between such functors is the smallest $\epsilon$ such that an $\epsilon$-interleaving exists. In order to study interleavings between functors in $\mathbf{FCov}$ or $\mathbf{FLoss}$, whose domain is $(0,1]^{op}$ rather than $\mathbb{R}_{\geq 0}$, we say that the functors $F, G$ are $\epsilon_*$-interleaved when there is an $\epsilon$-interleaving between the functors $F \circ r$ and $G \circ r$ where $r(x) = e^{-x}$. We also write $d_{I_*}(F, G) = d_I(F \circ r, G \circ r)$.

Proposition 8.

Given a subcategory $\mathbf{D}$ of $\mathbf{PMet}$, a standard form $\mathbf{D}$-manifold learning functor $M = L \circ H$, and a pair of finite pseudometric spaces $(X, d_X), (Y, d_Y)$ such that there exists a pair of morphisms $f : (X, d_X) \rightarrow (Y, d_Y + \epsilon)$ and $g : (Y, d_Y) \rightarrow (X, d_X + \epsilon)$ in $\mathbf{D}$, we have $d_{I_*}(M(X, d_X), M(Y, d_Y)) \leq \epsilon$.

Proposition 8 is similar in spirit to previous results that use the Gromov-Hausdorff distance between metric spaces to bound the bottleneck or homotopy interleaving distances between their corresponding Vietoris-Rips complexes [11, 8, 20, 6]. As a special case, if $M$ is a $\mathbf{PMet}_{\mathit{bij}}$-manifold learning functor and there exists an $\epsilon$-isometry between $(X, d_X)$ and $(Y, d_Y)$, then $d_{I_*}(M(X, d_X), M(Y, d_Y)) \leq \epsilon$. We can use this to prove the following:

Proposition 9.

Suppose we have a standard form $\mathbf{PMet}_{\mathit{sur}}$-manifold learning functor $M$, a pair of $\epsilon$-isometric finite pseudometric spaces $(X, d_X), (Y, d_Y)$ with $|X| = |Y| = n$, and matrices $A_X, A_Y$ that respectively minimize $\mathbf{l}_{M(X,d_X)}$ and $\mathbf{l}_{M(Y,d_Y)}$. Then if:

\[
|c_{M(X,d_X)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{c}}}{2}, \qquad |c_{M(Y,d_Y)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{c}}}{2}
\]

and:

\[
|e_{M(X,d_X)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{e}}}{2}, \qquad |e_{M(Y,d_Y)(a)_{ij}}(x)| \leq \frac{K_{\mathbf{e}}}{2}
\]

we have:

\[
\mathbf{l}_{M(X,d_X)}(A_Y) \leq \mathbf{l}_{M(X,d_X)}(A_X) + K_{\mathbf{c}} n^2 (1 - e^{-\epsilon}) + K_{\mathbf{e}} n^2 (e^{\epsilon} - 1) \tag{2}
\]

If $e_{M(X,d_X)(a)_{ij}}(x)$ is constant in $x$ (as is the case for any $M$ that factors as $M = \mathit{MDS} \circ H$) we have:

\[
\mathbf{l}_{M(X,d_X)}(A_Y) \leq \mathbf{l}_{M(X,d_X)}(A_X) + K_{\mathbf{c}} n^2 (1 - e^{-\epsilon}) \tag{3}
\]

For a very simple example, consider Metric Multidimensional Scaling without dimensionality reduction. In this case $M = \mathit{MDS} \circ \mathcal{ML}$ and $(X, d_X), (Y, d_Y)$ are each finite ordered $n$-element subspaces of $\mathbb{R}^m$ with Euclidean distance. If we write the vectors in $X$ and $Y$ as matrices $A_X, A_Y \in \mathbf{Mat}_{n,m}$, then these matrices respectively minimize $\mathbf{l}_{M(X,d_X)}$ and $\mathbf{l}_{M(Y,d_Y)}$, and $\mathbf{l}_{M(X,d_X)}(A_X) = \mathbf{l}_{M(Y,d_Y)}(A_Y) = 0$. Since the function that sends the $i$th element of $X$ to the $i$th element of $Y$ is a $2\epsilon$-isometry whenever $\|A_{X_i} - A_{Y_i}\| \leq \epsilon$ for all $i$, Proposition 9 bounds the average of the squared distances between the paired rows of the two matrices in terms of the largest such distance.

These bounds apply to a very general class of manifold learning algorithms, including topologically unstable algorithms like IsoMap [5]. As an example, consider using IsoMap to project $n$ evenly spaced points that lie on a circle of radius $r$ in $\mathbb{R}^2$ onto $\mathbb{R}^1$. In this case $(X, d_X)$ is a finite ordered $n$-element subspace of $\mathbb{R}^2$ with Euclidean distance, $M = \mathit{MDS} \circ IsoCluster_\delta$, and for any matrix $A_X \in \mathbf{Mat}_{n,1}$ that consists of $n$ evenly spaced points along the real line with $A_{X_{i+1}} - A_{X_i} = 2r\sin(\frac{2\pi}{2n})$ we have $\mathbf{l}_{M(X,d_X)}(A_X) = 0$. Now suppose that we instead apply IsoMap to a noisy view of $(X, d_X)$: a finite ordered $n$-element subspace $(Y, d_Y)$ of $\mathbb{R}^2$ where $d_Y$ is Euclidean distance and $d_X(X_i, Y_i) = d_Y(X_i, Y_i) = \|X_i - Y_i\| \leq \epsilon$ for all $i = 1 \ldots n$. Then for any matrix $A_Y \in \mathbf{Mat}_{n,1}$ that minimizes $\mathbf{l}_{M(Y,d_Y)}$, Proposition 9 bounds the average squared difference between $|A_{Y_{i+1}} - A_{Y_i}|$ and $2r\sin(\frac{2\pi}{2n})$.

4 Experiments in Functorial Recombination

One benefit of the functorial perspective on manifold learning is that it provides a natural way to produce new manifold learning algorithms by recombining the components of existing algorithms. Suppose we can express two existing manifold learning algorithms $M_1, M_2$ in this framework as $M_1 = L_1 \circ H_1$ and $M_2 = L_2 \circ H_2$ where $H_1, H_2$ are hierarchical clustering functors. Then we can use the compositionality of functors to define the manifold learning algorithms $L_2 \circ H_1$ or $L_1 \circ H_2$. We can use this procedure to combine the strengths of multiple algorithms in a way that preserves functoriality (and therefore also stability in the sense of Proposition 9). For example, if we compose the $\mathit{FuzzySimplex}$ functor from Proposition 6 with $\mathit{MDS}$ we form the $\mathbf{PMet}_{\mathit{isom}}$-manifold learning functor $\mathit{MDS} \circ \mathit{FuzzySimplex}$, which maps $(X, d_X)$ to the embedding optimization problem $(|X|, m, l)$ where $l(A) = \sum_{i,j \in 1 \ldots |X|} (-\log(\alpha_{ij}) - \|A_i - A_j\|)^2$ and $\alpha_{ij}$ is the strength of the fuzzy simplex that UMAP forms between $x_i$ and $x_j$.
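Schematically, the recombination recipe is just function composition. In the sketch below (ours), the component names are hypothetical stand-ins for implementations of the object-level actions of the clustering functors $H$ and loss-building functors $L$:

```python
# Schematic only: a manifold learning algorithm is a loss-building step composed
# with a hierarchical clustering step, so the two can be swapped independently.
def make_manifold_learner(cluster_fn, loss_fn):
    """Compose (on objects) a hierarchical clustering functor with a loss-building functor."""
    return lambda D: loss_fn(cluster_fn(D))

# Hypothetical component names, e.g.:
#   metric_mds             = make_manifold_learner(maximal_linkage, mds_loss)
#   single_linkage_scaling = make_manifold_learner(single_linkage, mds_loss)
#   mds_fuzzy_simplex      = make_manifold_learner(fuzzy_simplex, mds_loss)
```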

For a more illustrative example, consider a DNA recombination task in which we attempt to match a string of DNA that has been repeatedly mutated back to the original string. One way to solve this task is to generate embeddings for each string of DNA and look at the nearest neighbors of the mutated string. We can simulate this task as follows:

  1. Generate $N$ original random sequences of DNA of length $L$ (strings of "A", "C", "G", "T").

  2. For each sequence, mutate the sequence $M$ times to produce a mutation list, or a list of sequences that starts with an original DNA sequence and ends with a final DNA sequence.

  3. Collect the $M$ sequences in each of these $N$ mutation lists into an $N * M$-element finite pseudometric space with Hamming distance.

  4. Build embeddings from this pseudometric space and compute the percentage of mutation lists for which the nearest neighbor of the last DNA sequence in that list, among the set of all original sequences, is the first sequence in that list (the accuracy).

A manifold learning algorithm that performs well on this task will need to take advantage of the intermediate mutation states to recognize that the first state and final state in a mutation list should be embedded as close together as possible. We can follow the procedure in Section 2.1 to adapt the Metric Multidimensional Scaling algorithm $\mathit{MDS} \circ \mathcal{ML}$ (Section 2.3.1) into such an algorithm by forming the maximally interconnected functor $\mathit{MDS} \circ \mathcal{SL}$. Intuitively, this functor maps $(X, d_X)$ to a loss function that corresponds to the Metric Multidimensional Scaling optimization objective in which the original distance $d_X$ is replaced with:

\[
d^*_X(x, x') = \inf\{\delta \mid \exists\, x = x_1, x_2, \ldots, x_n = x' \in X,\ d_X(x_i, x_{i+1}) \leq \delta\}
\]

We call this the Single Linkage Scaling algorithm (Algorithm 1). Since this algorithm embeds points that are connected by a sequence in the original space as close together as possible, we expect Single Linkage Scaling to outperform Metric Multidimensional Scaling on this task. This is exactly what we see in Table 1. We also show the embeddings for each sequence in each list in Figure 1.

Algorithm 1 Single Linkage Scaling
1: procedure SingleLinkageScaling($(X, d_X)$, $m$)
2:   Initialize the $|X| \times |X|$ matrix $B$ to all zeros
3:   for all $i, j \leq |X|$ do
4:     $B_{ij} = \inf\{\delta \mid \exists\, x_i = x_1, x_2, \ldots, x_n = x_j \in X,\ d_X(x_k, x_{k+1}) \leq \delta\}$
5:   $A \leftarrow \arg\min_{A \in \mathbf{Mat}_{|X|,m}} \sum_{i,j \in 1 \ldots |X|} (\|A_i - A_j\| - B_{ij})^2$
6:   return $A$
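A runnable sketch (ours) of Algorithm 1: the matrix $B$ of single linkage distances is computed with a Floyd-Warshall-style minimax relaxation, and the stress minimization on line 5 is approximated with scikit-learn's SMACOF-based MDS on precomputed dissimilarities rather than solved exactly.

```python
import numpy as np
from sklearn.manifold import MDS

def single_linkage_scaling(D, m, random_state=0):
    """Single Linkage Scaling: stress minimization on the minimax path distances of D."""
    B = D.astype(float).copy()
    n = B.shape[0]
    for k in range(n):   # minimax relaxation: B_ij = min(B_ij, max(B_ik, B_kj))
        B = np.minimum(B, np.maximum(B[:, k:k + 1], B[k:k + 1, :]))
    mds = MDS(n_components=m, dissimilarity='precomputed', random_state=random_state)
    return mds.fit_transform(B)   # approximate (SMACOF) minimizer of the stress objective

# Four points: a chain 0-1-2 plus an isolated point 3. Points in the chain end up
# much closer to each other in the embedding than to the isolated point.
D = np.array([[0.0, 1.0, 2.0, 9.0],
              [1.0, 0.0, 1.0, 9.0],
              [2.0, 1.0, 0.0, 9.0],
              [9.0, 9.0, 9.0, 0.0]])
print(single_linkage_scaling(D, m=2))
```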
Figure 1: Embeddings of DNA sequences from the DNA recombination task with $L = 1000$, $N = 100$, $M = 10$. Each color indicates a unique DNA sequence mutation list. Note that Single Linkage Scaling ($\mathit{MDS} \circ \mathcal{SL}$, right) embeds sequences in the same mutation list more closely together than Metric Multidimensional Scaling ($\mathit{MDS} \circ \mathcal{ML}$, left).
Algorithm | Accuracy (N=100, M=10) | Accuracy (N=100, M=20) | Accuracy (N=200, M=10) | Accuracy (N=200, M=20)
Metric Multidimensional Scaling, Embedding Size 2 | 0.21 (± 0.05) | 0.01 (± 0.02) | 0.29 (± 0.02) | 0.01 (± 0.00)
Single Linkage Scaling, Embedding Size 2 | 0.61 (± 0.02) | 0.68 (± 0.05) | 0.76 (± 0.01) | 0.32 (± 0.02)
Metric Multidimensional Scaling, Embedding Size 5 | 0.74 (± 0.01) | 0.13 (± 0.02) | 0.84 (± 0.01) | 0.04 (± 0.01)
Single Linkage Scaling, Embedding Size 5 | 0.93 (± 0.05) | 0.91 (± 0.02) | 0.96 (± 0.02) | 0.34 (± 0.02)
Table 1: Accuracy on the DNA recombination task of the Metric Multidimensional Scaling ($\mathit{MDS} \circ \mathcal{ML}$) and Single Linkage Scaling ($\mathit{MDS} \circ \mathcal{SL}$) algorithms (higher numbers are better). The accuracy is the percentage of the $N$ mutation lists of length $M$ for which the nearest neighbor of the last sequence in the list, among the set of all original DNA sequences, is the first sequence in that list. The reported numbers are means (and standard deviations) across $10$ simulations. All DNA sequences are of length $L = 1000$.

5 Discussion and Future Work

We have taken the first steps towards a categorical framework for manifold learning. By defining an algorithm as a functor from a category of metric spaces, we can explicitly express the kind of dataset transformations under which it is equivariant. We have shown that for many popular manifold learning algorithms, including Metric Multidimensional Scaling and IsoMap, the optimization objective changes in a predictable way as we modify the metric space.

The functorial perspective also suggests a new strategy for exploratory data analysis with manifold learning. Since we can decompose manifold learning algorithms into two components (clustering and loss), we can examine how slight variations of the clustering algorithm affect the learned embeddings. We have shown in Section 2.1 that every manifold learning functor $L \circ H$ lies on a spectrum of interconnectedness between $L \circ \mathcal{ML}$ and $L \circ \mathcal{SL}$, and we can form new algorithms by moving along this spectrum. For example, we saw in Section 4 that replacing the $\mathcal{ML}$ functor with $\mathcal{SL}$ in the Metric Multidimensional Scaling algorithm substantially changes the learned embeddings and improves performance on a DNA recombination task. There are also many algorithms that lie between these two extremes, including the $k$-Path Scaling and $k$-Vertex-Connected Scaling algorithms that we introduced in Section 2.3.

Another major benefit of expressing algorithms as functors is that functors preserve categorical properties like interleaving distance. This allows us to easily reason about the stability properties of both existing algorithms and new algorithms that we create by recombining functors. Other authors have used this strategy to prove stability properties of the homology of geometric filtered complexes [11]. In Section 3 we used this strategy to derive bounds on how dataset noise affects optimization quality. In future work we hope to use these techniques to derive more powerful theorems around the resilience of other kinds of unsupervised or supervised algorithms to noise. For example, we may be able to tighten our bounds by switching our perspective from finite metric spaces to distributions [7], or even use categorical probability [15] to replace interleaving distance with a probabilistic analog. Due to the simplicity and flexibility of this strategy, other researchers have begun to develop more flexible characterizations of interleaving distance that we can apply in even more situations [20].

References

[2] Hervé Abdi (2007): Metric multidimensional scaling (MDS): analyzing distance matrices. Encyclopedia of Measurement and Statistics. SAGE Publications.
[3] Hervé Abdi & Lynne J Williams (2010): Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2(4), pp. 433–459, 10.1002/wics.101.
[4] Stephen Bailey (2012): Principal Component Analysis with Noisy and/or Missing Data. Publications of the Astronomical Society of the Pacific 124(919), pp. 1015–1023, 10.1086/668105. Available at http://www.jstor.org/stable/10.1086/668105.
[5] Mukund Balasubramanian (2002): The isomap algorithm and topological stability. Science 295(5552), 10.1126/science.295.5552.7a.
[6] Andrew J Blumberg & Michael Lesnick (2017): Universality of the homotopy interleaving distance. arXiv preprint arXiv:1705.01690.
[7] Adam Brown, Omer Bobrowski, Elizabeth Munch & Bei Wang (2020): Probabilistic convergence and stability of random mapper graphs. Journal of Applied and Computational Topology, pp. 1–42, 10.1007/s41468-020-00063-x.
[8] Peter Bubenik & Jonathan A Scott (2014): Categorification of persistent homology. Discrete & Computational Geometry 51(3), pp. 600–627, 10.1007/s00454-014-9573-x.
[9] Gunnar Carlsson & Facundo Mémoli (2013): Classifying clustering schemes. Foundations of Computational Mathematics, 10.1007/s10208-012-9141-9.
[10] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas & Steve Y Oudot (2009): Proximity of persistence modules and their diagrams. In: Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, pp. 237–246, 10.1145/1542362.1542407.
[11] Frédéric Chazal, Vin De Silva & Steve Oudot (2014): Persistence stability for geometric complexes. Geometriae Dedicata 173(1), pp. 193–214, 10.1007/s10711-013-9937-z.
[12] Jared Culbertson, Dan P Guralnik, Jakob Hansen & Peter F Stiller (2016): Consistency constraints for overlapping data clustering. arXiv preprint arXiv:1608.04331.
[13] Jared Culbertson, Dan P Guralnik & Peter F Stiller (2018): Functorial hierarchical clustering with overlaps. Discrete Applied Mathematics 236, pp. 108–123, 10.1016/j.dam.2017.10.015.
[14] Brendan Fong & David I. Spivak (2019): An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press, 10.1017/9781108668804.
[15] Tobias Fritz (2020): A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics 370, p. 107239, 10.1016/j.aim.2020.107239.
[16] Samuel Gerber, Tolga Tasdizen & Ross Whitaker (2007): Robust non-linear dimensionality reduction using successive 1-dimensional Laplacian eigenmaps. In: Proceedings of the 24th International Conference on Machine Learning, pp. 281–288, 10.1145/1273496.1273532.
[17] Jon M Kleinberg (2003): An impossibility theorem for clustering. Advances in Neural Information Processing Systems, 10.5555/2968618.2968676.
[18] Tom Leinster (2016): Basic Category Theory. Cambridge University Press, 10.1017/CBO9780511525858.
[19] Leland McInnes, John Healy, Nathaniel Saul & Lukas Großberger (2018): UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software 3(29), p. 861, 10.21105/joss.00861.
[20] Luis N Scoccola (2020): Locally Persistent Categories and Metric Properties of Interleaving Distances. PhD thesis, Western University (Ontario). Available at https://ir.lib.uwo.ca/etd/7119/.
[21] Dan Shiebler (2020): Functorial Clustering via Simplicial Complexes. Topological Data Analysis and Beyond Workshop at NeurIPS 2020. Available at https://openreview.net/pdf?id=ZkDLcXCP5sV.
[22] David I Spivak (2012): Metric realization of fuzzy simplicial sets. Self-published notes. Available at http://math.mit.edu/~dspivak/files/metric_realization.pdf.
[23] Joshua B Tenenbaum, Vin De Silva & John C Langford (2000): A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), pp. 2319–2323, 10.1126/science.290.5500.2319.