
University of Oxford

Flattening Multiparameter Hierarchical Clustering Functors

Dan Shiebler
Abstract

We bring together topological data analysis, applied category theory, and machine learning to study multiparameter hierarchical clustering. We begin by introducing a procedure for flattening multiparameter hierarchical clusterings. We demonstrate that this procedure is a functor from a category of multiparameter hierarchical partitions to a category of binary integer programs. We also include empirical results demonstrating its effectiveness. Next, we introduce a Bayesian update algorithm for learning clustering parameters from data. We demonstrate that the composition of this algorithm with our flattening procedure satisfies a consistency property.

1 Introduction

One of the most useful ways to process a dataset represented as a finite metric space is to cluster the dataset, or break it into groups. An important step is choosing the desired granularity of the clustering, and multiparameter hierarchical clustering algorithms accept a hyperparameter vector to control this.

Since applying a multiparameter hierarchical clustering algorithm with different hyperparameter values may produce different partitions, we can view such algorithms as mapping a finite metric space $(X, d_X)$ to a function from the hyperparameter space to the space of partitions of $X$. A flattening procedure then maps this function to a single partition.

Many popular clustering algorithms, such as HDBSCAN [7] and ToMATo [3], include a flattening step that operates on the same intuition that drives the analysis of persistence diagrams in TDA. That is, the most important clusters (homological structures) of a dataset are those which exist at multiple scales (have large differences between their birth and death times).

In this paper we will characterize and study clustering algorithms as functors, similarly to [1, 4, 12]. We will particularly focus on multiparameter hierarchical clustering algorithms with partially ordered hyperparameter spaces. This perspective allows us to guarantee that the clustering algorithms we study preserve both non-expansive maps between metric spaces and the ordering of the hyperparameter space. Our contributions are:

  • We describe an algorithm for flattening multiparameter hierarchical clusterings, which we demonstrate is a functorial map from a category of multiparameter hierarchical partitions to a category of binary integer programs.

  • We introduce a Bayesian update algorithm for learning a distribution over clustering hyperparameters from data.

  • We prove that the composition of the Bayesian update algorithm and the flattening procedure is consistent.

2 Multiparameter Hierarchical Clustering

In this work we will define flat clustering algorithms to map a finite metric space $(X, d_X)$ to a partition of $X$. We will primarily work with the following categories:

Definition 1

In the category $\mathbf{Met}$, objects are finite metric spaces $(X, d_X)$ and morphisms are non-expansive maps, or functions $f : X \rightarrow Y$ such that $d_Y(f(x_1), f(x_2)) \leq d_X(x_1, x_2)$.

Definition 2

In the category $\mathbf{Part}$, objects are tuples $(X, \mathcal{P}_X)$ where $\mathcal{P}_X$ is a partition of the set $X$. Morphisms in $\mathbf{Part}$ are functions $f : X \rightarrow Y$ such that if $S \in \mathcal{P}_X$ then $\exists S' \in \mathcal{P}_Y,\ f(S) \subseteq S'$.

We will also work in the subcategories $\mathbf{Met}_{bij}, \mathbf{Part}_{bij}$ of $\mathbf{Met}, \mathbf{Part}$ respectively, in which morphisms are further restricted to be bijective.
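For intuition, the morphism condition in Definition 2 says that $f$ must map every block of $\mathcal{P}_X$ into a single block of $\mathcal{P}_Y$. The following is a minimal Python sketch of this check (ours, not the paper's; the name is_part_morphism is our own):

    def is_part_morphism(f, partition_X, partition_Y):
        """Check the Part morphism condition: for every block S of the
        source partition, f(S) must land inside some block of the target."""
        for S in partition_X:
            image = {f(x) for x in S}
            if not any(image <= S_prime for S_prime in partition_Y):
                return False
        return True

    # Example: f sends blocks of P_X into blocks of P_Y, so it is a morphism.
    f = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}.get
    P_X = [{0}, {1}, {2, 3}]   # partition of X = {0, 1, 2, 3}
    P_Y = [{'a'}, {'b'}]       # partition of Y = {'a', 'b'}
    assert is_part_morphism(f, P_X, P_Y)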

Definition 3

Given a subcategory $\mathbf{D}$ of $\mathbf{Met}$, a flat clustering functor on $\mathbf{D}$ is a functor $F : \mathbf{D} \rightarrow \mathbf{Part}$ that is the identity on the underlying set $X$. In the case that $\mathbf{D}$ is unspecified we simply call $F$ a flat clustering functor.

Now recall that the set of connected components of the $\delta$-Vietoris-Rips complex of $(X, d_X)$ partitions $X$ into subsets in which every pair of points is joined by a chain of hops, each of length no greater than $\delta$. Given $a \in (0,1]$, an example of a flat clustering functor on $\mathbf{Met}$ is the $a$-single linkage functor $\mathcal{SL}(a)$, which maps a metric space to the connected components of its $-\log(a)$-Vietoris-Rips complex [12, 4, 1]. Given $a_1, a_2 \in (0,1]$, an example of a flat clustering functor on $\mathbf{Met}_{bij}$ is the robust single linkage functor $\mathcal{SL}_{\mathcal{R}}(a_1, a_2)$, which maps a metric space $(X, d_X)$ to the connected components of the $-\log(a_2)$-Vietoris-Rips complex of $(X, d^{a_1}_X)$ where:

\[ d^{a_1}_X(x_1, x_2) = \max\left(d_X(x_1, x_2),\ \mu_{X_{a_1}}(x_1),\ \mu_{X_{a_1}}(x_2)\right) \]

and $\mu_{X_{a_1}}(x_1)$ is the distance from $x_1$ to its $\lfloor a_1 |X| \rfloor$th nearest neighbor [2]. Intuitively, robust single linkage reduces the impact of dataset noise by increasing distances in sparse regions of the space. Note that robust single linkage is not a flat clustering functor on $\mathbf{Met}$ because it includes a $k$-nearest neighbor computation that is sensitive to $|X|$.
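For illustration, here is a minimal Python sketch (ours, not the paper's implementation) of both functors acting on a precomputed distance matrix D: single_linkage computes connected components at threshold $-\log(a)$, and robust_single_linkage runs the same computation on the adjusted metric $d^{a_1}_X$:

    import numpy as np
    from scipy.sparse.csgraph import connected_components

    def single_linkage(D, a):
        """Label points by the connected components of the
        (-log a)-Vietoris-Rips complex: x, x' are linked when
        D[x, x'] <= -log(a)."""
        adjacency = D <= -np.log(a)
        _, labels = connected_components(adjacency, directed=False)
        return labels

    def robust_single_linkage(D, a1, a2):
        """Single linkage on the adjusted metric d_X^{a1}, which inflates
        distances in sparse regions via each point's floor(a1 * |X|)-th
        nearest-neighbor radius."""
        n = D.shape[0]
        k = min(int(np.floor(a1 * n)), n - 1)
        mu = np.sort(D, axis=1)[:, k]  # column 0 is the self-distance
        D_adjusted = np.maximum(D, np.maximum(mu[:, None], mu[None, :]))
        return single_linkage(D_adjusted, a2)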

Like single linkage and robust single linkage, many flat clustering algorithms are configured by a hyperparameter vector that governs their behavior. In the case that this hyperparameter vector is an element of a partial order $O$, we can represent the output of such an algorithm with a functor $O \rightarrow \mathbf{Part}$.

Recall that $\mathbf{Part}^O$ is the category of functors from $O$ to $\mathbf{Part}$ and natural transformations between them. We will write $\mathbf{Part}^{\overline{O}}$ to represent the subcategory of such functors that commute with the forgetful functor $U : \mathbf{Part} \rightarrow \mathbf{Set}$. Given $F : O \rightarrow \mathbf{Part}$ in $\mathbf{Part}^{\overline{O}}$, we will call the image of $U \circ F$ the underlying set of $F$. Note that there also exists a forgetful functor $\mathbf{Part}^{\overline{O}} \rightarrow \mathbf{Set}$ that maps $F : O \rightarrow \mathbf{Part}$ to its underlying set, and that any natural transformation in $\mathbf{Part}^{\overline{O}}$ between the functors $F_X : O \rightarrow \mathbf{Part}$ and $F_Y : O \rightarrow \mathbf{Part}$ with underlying sets $X$ and $Y$ is fully specified by a function $f : X \rightarrow Y$.

Definition 4

Given a partial order $O$ and a subcategory $\mathbf{D}$ of $\mathbf{Met}$, an $O$-clustering functor on $\mathbf{D}$ is a functor $H : \mathbf{D} \rightarrow \mathbf{Part}^{\overline{O}}$ that commutes with the forgetful functors from $\mathbf{D}$ and $\mathbf{Part}^{\overline{O}}$ into $\mathbf{Set}$. In the case that $\mathbf{D}$ is unspecified we simply call $H$ an $O$-clustering functor.

For example, single linkage $\mathcal{SL} : \mathbf{Met} \rightarrow \mathbf{Part}^{\overline{(0,1]^{op}}}$ is a $(0,1]^{op}$-clustering functor on $\mathbf{Met}$, and robust single linkage $\mathcal{SL}_{\mathcal{R}} : \mathbf{Met}_{bij} \rightarrow \mathbf{Part}_{bij}^{\overline{(0,1]^{op} \times (0,1]^{op}}}$ is a $(0,1]^{op} \times (0,1]^{op}$-clustering functor on $\mathbf{Met}_{bij}$.

Definition 5

Given the functor $F_X \in \mathbf{Part}^{\overline{O}}$ with underlying set $X$, its partition collection is the set $S_X$ of all subsets $S \subseteq X$ such that there exists some $a \in O$ where $S \in F_X(a)$.

We will write the elements of $S_X$ (subsets of $X$) with the notation $S_X = \{s_{1_X}, s_{2_X}, \cdots, s_{n_X}\}$.

In practice it is often convenient to “flatten” a functor $F_X \in \mathbf{Part}^{\overline{O}}$ to a single partition of $X$ by selecting a non-overlapping collection of sets from its partition collection $S_X$. Since we will express this selection problem as a binary integer program, we will work in the following category:

Definition 6

The objects in $\mathbf{BIP}$ are tuples $(n, m, c, A, B, u)$ where $n, m \in \mathbb{N}$, $c$ is an $m$-element real-valued vector, $u$ is an $n$-element real-valued vector, $A$ is an $n \times m$ real-valued matrix, and $B$ is an $n \times m$ $\{0,1\}$-valued matrix. Intuitively, the tuple $(n, m, c, A, B, u)$ represents the following binary integer program: find an $m$-element $\{0,1\}$-valued vector $v$ that maximizes $c^T v$ subject to $Av + Bv \leq u$.

The morphisms between $(n, m, c, A, B, u)$ and $(n', m', c', A', B', u')$ are tuples $(P_c, P_u, P_A, P_{A^*}, P_B, P_{B^*})$ where $P_u, P_A$ are $n' \times n$ real-valued matrices, $P_{A^*}, P_c$ are $m \times m'$ real-valued matrices, $P_B$ is an $n' \times n$ $\{0,1\}$-valued matrix, and $P_{B^*}$ is an $m \times m'$ $\{0,1\}$-valued matrix such that:

\[ P_c c' = c \qquad P_u u = u' \qquad P_A A P_{A^*} = A' \qquad P_B B P_{B^*} = B' \]

where the operation $P_B B P_{B^*}$ is performed with logical matrix multiplication. Intuitively, a morphism $(P_c, P_u, P_A, P_{A^*}, P_B, P_{B^*})$ maps the binary integer program “find an $m$-element $\{0,1\}$-valued vector $v$ that maximizes $(P_c c')^T v$ subject to $Av + Bv \leq u$” to the binary integer program “find an $m'$-element $\{0,1\}$-valued vector $v$ that maximizes $c'^T v$ subject to $P_A A P_{A^*} v + P_B B P_{B^*} v \leq P_u u$”.
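For intuition, an object of $\mathbf{BIP}$ can be handed directly to an off-the-shelf solver. The following is a minimal sketch (ours) using scipy.optimize.milp from SciPy 1.9+; since milp minimizes, we negate $c$ to maximize:

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    def solve_bip(c, A, B, u):
        """Solve the program encoded by (n, m, c, A, B, u): maximize c^T v
        over {0,1}-valued vectors v subject to (A + B) v <= u."""
        m = len(c)
        result = milp(
            c=-np.asarray(c, dtype=float),  # milp minimizes, so negate c
            constraints=LinearConstraint(A + B, ub=u),
            integrality=np.ones(m),         # every variable is integral...
            bounds=Bounds(0, 1),            # ...and constrained to {0, 1}
        )
        return np.round(result.x).astype(int)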

When we construct a binary integer program from an object $F_X$ in $\mathbf{Part}^{\overline{O}}$ with underlying set $X$, we weight the elements of its partition collection $S_X$ according to a model of the importance of different regions of $O$. In this work we will only consider $O$ that are Borel measurable, so we can represent this model with a probability measure $\mu$ over $O$. This probabilistic interpretation will be useful in Section 2.2 when we update this model with labeled data. We can then view the flattening algorithm as choosing a non-overlapping subset $\mathcal{P}_X \subseteq S_X$ (that is, no element of $X$ belongs to more than a single set in $\mathcal{P}_X$) that maximizes the expectation of the function that maps $a$ to the number of $s_{i_X} \in \mathcal{P}_X$ that are also in $F_X(a)$. Formally, the algorithm maximizes $\int_{a \in O} \left|\{s_{i_X} \mid s_{i_X} \in \mathcal{P}_X,\ s_{i_X} \in F_X(a)\}\right| d\mu$. If $\mu$ is uniform this is similar to the Topologically Motivated HDBSCAN described in [7].

Proposition 1

Given a probability measure $\mu$ over $O$, there exists a functor $Flatten_\mu : \mathbf{Part}_{bij}^{\overline{O}} \rightarrow \mathbf{BIP}$ that maps $F_X : O \rightarrow \mathbf{Part}_{bij}$ with partition collection $S_X$ to a tuple $(|S_X|, |S_X|, c, A, B, u)$ such that any solution to the problem “find an $m$-element $\{0,1\}$-valued vector $v$ that maximizes $c^T v$ subject to $Av + Bv \leq u$” specifies a non-overlapping subset $\mathcal{P}_X \subseteq S_X$ that maximizes $\int_{a \in O} \left|\{s_{i_X} \mid s_{i_X} \in \mathcal{P}_X,\ s_{i_X} \in F_X(a)\}\right| d\mu$.

Proof

Given a probability measure $\mu$ over $O$, $Flatten_\mu : \mathbf{Part}_{bij}^{\overline{O}} \rightarrow \mathbf{BIP}$ maps the functor $F_X : O \rightarrow \mathbf{Part}_{bij}$ with partition collection $S_X$ to the binary integer program $(|S_X|, |S_X|, c, A, B, u)$, where $c, u$ are $|S_X|$-element real-valued vectors and $A, B$ are respectively real-valued and $\{0,1\}$-valued $|S_X| \times |S_X|$ matrices where:

\[ u_i = |S_X| \qquad c_i = \int_{\{a \mid s_{i_X} \in F_X(a)\}} d\mu \]
\[ A_{i,j} = \begin{cases} |S_X| - 1 & i = j \\ 0 & \text{else} \end{cases} \qquad B_{i,j} = \begin{cases} 1 & s_{i_X} \cap s_{j_X} \neq \varnothing \\ 0 & \text{else} \end{cases} \]

A natural transformation between $F_X, F_Y : O \rightarrow \mathbf{Part}_{bij}$ with underlying sets $X, Y$ and partition collections $S_X, S_Y$ that is specified by the surjective function $f : X \rightarrow Y$ is sent to the tuple $(P_c, P_u, P_A, P^T_A, P_B, P^T_B)$, where $P_c$ is an $|S_X| \times |S_Y|$ real-valued matrix, $P_A, P_u$ are $|S_Y| \times |S_X|$ real-valued matrices, and $P_B$ is an $|S_Y| \times |S_X|$ $\{0,1\}$-valued matrix such that:

\[ P_{c_{i,j}} = \begin{cases} \frac{\int_{\{a \mid s_{i_X} \in F_X(a),\ s_{j_Y} \in F_Y(a)\}} d\mu}{\int_{\{a \mid s_{j_Y} \in F_Y(a)\}} d\mu} & f(s_{i_X}) \subseteq s_{j_Y} \\ 0 & \text{else} \end{cases} \qquad P_{u_{j,i}} = \begin{cases} \frac{|S_Y|}{|S_X|} & i = j \\ 0 & \text{else} \end{cases} \]
\[ P_{A_{j,i}} = \begin{cases} \sqrt{\frac{|S_Y| - 1}{|S_X| - 1}} & i = j \\ 0 & \text{else} \end{cases} \qquad P_{B_{j,i}} = \begin{cases} 1 & f(s_{i_X}) \subseteq s_{j_Y} \\ 0 & \text{else} \end{cases} \]

First we will show that any feasible solution to the integer program $Flatten_\mu F_X$ corresponds to a selection of elements from $S_X$ with no overlaps. If $s_{i_X} \cap s_{j_X} \neq \varnothing$ for $i \neq j$, then the $i$th row of $A + B$ will have $|S_X|$ in position $i$ and $1$ in position $j$. This implies that if $v_i = 1$ then $(A + B)_i^T v \leq |S_X|$ if and only if $v_j = 0$. Note also that by the definition of binary integer programming this is the selection of elements that maximizes:

\[ \sum_{s_{i_X} \in \mathcal{P}_X} c_i = \sum_{s_{i_X} \in \mathcal{P}_X} \int_{\{a \mid s_{i_X} \in F_X(a)\}} d\mu = \int_{a \in O} \left|\{s_{i_X} \mid s_{i_X} \in \mathcal{P}_X,\ s_{i_X} \in F_X(a)\}\right| d\mu \]

Next, we will show that $Flatten_\mu$ is a functor. Consider $F_X, F_Y : O \rightarrow \mathbf{Part}_{bij}$ with underlying sets $X, Y$ and partition collections $S_X, S_Y$, and suppose:

\[ Flatten_\mu F_X = (|S_X|, |S_X|, c_X, A_X, B_X, u_X) \qquad Flatten_\mu F_Y = (|S_Y|, |S_Y|, c_Y, A_Y, B_Y, u_Y) \]

Consider also a natural transformation specified by the function $f : X \rightarrow Y$ and define the image of $Flatten_\mu$ on this natural transformation to be $(P_c, P_u, P_A, P^T_A, P_B, P^T_B)$. We first need to show that:

\[ P_c c_Y = c_X \qquad P_u u_X = u_Y \qquad P_A A_X P^T_A = A_Y \qquad P_B B_X P^T_B = B_Y \]

where $P_B B_X P^T_B = B_Y$ is performed with logical matrix multiplication. In order to see that $P_c c_Y = c_X$, note that:

\[ (P_c c_Y)_i = \sum_{\{j \mid f(s_{i_X}) \subseteq s_{j_Y}\}} \left(\int_{\{a \mid s_{j_Y} \in F_Y(a)\}} d\mu\right) \left(\frac{\int_{\{a \mid s_{i_X} \in F_X(a),\ s_{j_Y} \in F_Y(a)\}} d\mu}{\int_{\{a \mid s_{j_Y} \in F_Y(a)\}} d\mu}\right) = \sum_{\{j \mid f(s_{i_X}) \subseteq s_{j_Y}\}} \int_{\{a \mid s_{i_X} \in F_X(a),\ s_{j_Y} \in F_Y(a)\}} d\mu = \int_{\{a \mid s_{i_X} \in F_X(a)\}} d\mu = c_{X_i} \]

Next, to see that $P_u u_X = u_Y$, note that $(P_u u_X)_j = \frac{|S_Y|}{|S_X|} |S_X| = |S_Y| = u_{Y_j}$. Next, to see that $P_A A_X P^T_A = A_Y$, recall that $P_A$ is an $|S_Y| \times |S_X|$ matrix and note that since $f$ is surjective it must be that $|S_Y| \leq |S_X|$. Therefore both the left $|S_Y| \times |S_Y|$ submatrix of $P_A$ and the top $|S_Y| \times |S_Y|$ submatrix of $P_A^T$ are diagonal, so the product $P_A A_X P^T_A$ is a diagonal $|S_Y| \times |S_Y|$ matrix with:

\[ (P_A A_X P^T_A)_{jj} = P_{A_{jj}} A_{X_{jj}} P^T_{A_{jj}} = \frac{|S_Y| - 1}{|S_X| - 1}\,(|S_X| - 1) = |S_Y| - 1 = A_{Y_{jj}} \]

Next, to see that $P_B B_X P^T_B = B_Y$, note first that $P_B B_X$ is a $\{0,1\}$-valued $|S_Y| \times |S_X|$ matrix where:

\[ (P_B B_X)_{ji} = \exists_{k = 1 \ldots |S_X|}\ P_{B_{j,k}} \wedge B_{X_{k,i}} = \exists\, s_{k_X} \in S_X,\ f(s_{k_X}) \subseteq s_{j_Y} \wedge s_{k_X} \cap s_{i_X} \neq \varnothing \]

and therefore that:

\begin{align*}
(P_B B_X P^T_B)_{jj'} &= \exists\, s_{l_X} \in S_X,\ \left(\exists\, s_{k_X} \in S_X,\ f(s_{k_X}) \subseteq s_{j_Y} \wedge s_{k_X} \cap s_{l_X} \neq \varnothing\right) \wedge \left(f(s_{l_X}) \subseteq s_{j'_Y}\right) \\
&= \exists\, s_{l_X}, s_{k_X} \in S_X,\ f(s_{k_X}) \subseteq s_{j_Y} \wedge f(s_{l_X}) \subseteq s_{j'_Y} \wedge s_{k_X} \cap s_{l_X} \neq \varnothing \\
&= \left(s_{j_Y} \cap s_{j'_Y} \neq \varnothing\right) = B_{Y_{jj'}}
\end{align*}

Finally, note that $Flatten_\mu$ preserves the identity since $P_B = P_A = I$ when $S_X = S_Y$, and it preserves composition by the laws of matrix multiplication. ∎

For example, if $\mu$ is uniform then the connected components of the Vietoris-Rips filtration of $(X, d_X)$ that have the largest differences between their birth and death times will be a solution to $(Flatten_\mu \circ \mathcal{SL})(X, d_X)$. Note also that $Flatten_\mu$ is only functorial over $\mathbf{Part}_{bij}^{\overline{O}}$, and not all of $\mathbf{Part}^{\overline{O}}$. Intuitively, this is because $Flatten_\mu$ maps natural transformations between elements of $\mathbf{Part}_{bij}^{\overline{O}}$ to linear maps that only exist when the functions underlying these natural transformations are bijective.
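To make the construction in Proposition 1 concrete, the following is a minimal sketch (ours) that builds the tuple $(|S_X|, |S_X|, c, A, B, u)$ from a partition collection, with precomputed weights standing in for the integrals that define $c$:

    import numpy as np

    def build_flatten_bip(partition_collection, weights):
        """Build the binary integer program from Proposition 1.
        partition_collection is a list of frozensets s_i, and weights[i]
        approximates the mu-measure of {a | s_i in F_X(a)}, i.e. c_i."""
        n = len(partition_collection)
        c = np.asarray(weights, dtype=float)
        u = np.full(n, float(n))   # u_i = |S_X|
        A = (n - 1) * np.eye(n)    # A_ii = |S_X| - 1, zero elsewhere
        B = np.array([[float(bool(s_i & s_j)) for s_j in partition_collection]
                      for s_i in partition_collection])  # B_ij = 1 iff overlap
        return n, n, c, A, B, u

Combined with the solve_bip sketch above, a solution vector $v$ selects the non-overlapping collection $\{s_{i_X} \mid v_i = 1\}$.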

2.1 The Multiparameter Flattening Algorithm

Given an $O$-clustering functor $H$ and a distribution $\mu$ over $O$, we can use Monte Carlo integration and $Flatten_\mu$ to implement the following algorithm:

1: procedure MultiparameterFlattening($H, \mu, (X, d_X), n$)
2:     Initialize an empty list $L$
3:     Repeat $n$ times:
4:         Sample the hyperparameter vector $a$ according to $\mu$
5:         Add each set in $H(X, d_X)(a)$ to $L$
6:     Define $S_X$ to be the list of unique elements of $L$
7:     Define $c$ such that $c_i$ is the number of times that $s_{i_X}$ appears in $L$
8:     Set $A, B, u$ as in $Flatten_\mu$
9:     Return the solution to the binary integer program $(|S_X|, |S_X|, c, A, B, u)$

We include an example of this procedure on GitHub (https://github.com/dshieble/FunctorialHyperparameters) that builds on McInnes et al.'s HDBSCAN implementation [7]. In Table 1 we demonstrate that applying this procedure and solving the resulting binary integer program can perform better than choosing an optimal parameter value.
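The following is a minimal sketch (ours) of this procedure, reusing the build_flatten_bip and solve_bip sketches above. Here H is assumed to be any callable mapping a dataset and a sampled hyperparameter to a partition given as an iterable of sets, and sample_mu is assumed to be a callable that draws a hyperparameter vector from $\mu$:

    from collections import Counter

    def multiparameter_flattening(H, sample_mu, X, n):
        """Monte Carlo flattening: sample hyperparameters from mu, tally
        how often each cluster appears, then solve the Proposition 1 BIP."""
        tally = Counter()
        for _ in range(n):
            a = sample_mu()                # draw a ~ mu
            for s in H(X, a):              # H(X, a) is a partition of X
                tally[frozenset(s)] += 1
        S_X = list(tally)                  # unique clusters observed
        counts = [tally[s] for s in S_X]   # Monte Carlo estimate of c
        _, _, c, A, B, u = build_flatten_bip(S_X, counts)
        v = solve_bip(c, A, B, u)
        return [s for s, v_i in zip(S_X, v) if v_i == 1]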

Algorithm | Adjusted Rand Score on Fashion MNIST Dataset | Adjusted Rand Score on 20 Newsgroups Dataset
HDBSCAN with optimal distance scaling parameter $\alpha$ | 0.217 | 0.181
HDBSCAN with flattening over distance scaling parameter $\alpha$ | 0.262 | 0.231
Table 1: We compare the performance of applying $Flatten_\mu$ to HDBSCAN with simply running HDBSCAN with the value of the distance scaling parameter $\alpha$ that achieves the best Adjusted Rand Score. We evaluate over the Fashion MNIST [13] and 20 Newsgroups [6] datasets using the scikit-learn implementation of the Adjusted Rand Score [9]. We see that the $Flatten_\mu$ procedure performs consistently better, which suggests that it may be a good option for unsupervised learning applications such as data visualization or pre-processing.
Figure 1: If we define $H = \mathcal{SL}$ and apply Equation 2 to learn $\mu_n$, then samples from $\mu_n$ are distance thresholds for single linkage. We see that the samples tend to be drawn from the region where the Adjusted Rand Score is highest (code on GitHub).

2.2 Multiparameter Flattening with Supervision

One of the most important components of the Multiparameter Flattening Algorithm is the choice of distribution $\mu$ over $O$. For example, if $O = (0,1]^{op}$ and $\mu$ is the Dirac distribution at $a$, then $\mathcal{SL}(X, d_X)(a)$ will be a solution to $(Flatten_\mu \circ \mathcal{SL})(X, d_X)$.

We can leverage the importance of $\mu$ to enable our flattening procedure to learn from labeled data. Formally, given an $O$-clustering functor $H$, a metric space $(X, d_X)$, and an observed partition $\mathcal{P}_X$ of $X$ (the labels), we can use Bayes' rule to define the probability measure $\mu_{\mathcal{P}_X}$ over $O$ where $\mu_{\mathcal{P}_X}(\sigma) = \frac{\int_{a \in \sigma} \gamma_X(\mathcal{P}_X \mid a)\, d\mu}{\int_{a \in O} \gamma_X(\mathcal{P}_X \mid a)\, d\mu}$ whenever $\sigma$ is an element of the Borel algebra of $O$, and for each $a \in O$ the map $\gamma_X(\_ \mid a)$ is a probability mass function over the finite set of all partitions of $X$. There are several possible choices of $\gamma_X$. Intuitively, we want $\gamma_X(\mathcal{P}_X \mid a)$ to be large when $H(X, d_X)(a)$ and $\mathcal{P}_X$ are similar. The simplest choice would be $\gamma_X(\mathcal{P}_X \mid a) = 1$ when $H(X, d_X)(a) = \mathcal{P}_X$ and $0$ otherwise, but a more robust strategy would be to use the Rand index, which measures how well two partitions agree on each pair of distinct points [10]. That is:

\[ \gamma_X(\mathcal{P}_X \mid a) = \frac{|both(\mathcal{P}_X)| + |neither(\mathcal{P}_X)|}{\sum_{\mathcal{P}_X' \in \mathbf{P}_X} |both(\mathcal{P}_X')| + |neither(\mathcal{P}_X')|} \]  (1)

where $\mathbf{P}_X$ is the set of all partitions of $X$ and:

\[ both(\mathcal{P}_X) = \{x_i, x_j \mid \exists s_X \in \mathcal{P}_X,\ x_i, x_j \in s_X \wedge \exists s'_X \in H(X, d_X)(a),\ x_i, x_j \in s'_X\} \]
\[ neither(\mathcal{P}_X) = \{x_i, x_j \mid \not\exists s_X \in \mathcal{P}_X,\ x_i, x_j \in s_X \wedge \not\exists s'_X \in H(X, d_X)(a),\ x_i, x_j \in s'_X\} \]

Note that by definition $\sum_{\mathcal{P}_X \in \mathbf{P}_X} \gamma_X(\mathcal{P}_X \mid a) = 1$. Now suppose we have a set of tuples $\{(X_1, d_{X_1}, \mathcal{P}_{X_1}), (X_2, d_{X_2}, \mathcal{P}_{X_2}), \cdots, (X_n, d_{X_n}, \mathcal{P}_{X_n})\}$ where each $(X_i, d_{X_i})$ is a metric space and each $\mathcal{P}_{X_i}$ is a partition of $X_i$. Given an initial probability measure $\mu_0$ over $O$, we can use Bayesian updating to learn a posterior distribution over $O$ by iteratively setting:

\[ \mu_i(\sigma) = \frac{\int_{a \in \sigma} \gamma_{X_i}(\mathcal{P}_{X_i} \mid a)\, d\mu_{i-1}}{\int_{a \in O} \gamma_{X_i}(\mathcal{P}_{X_i} \mid a)\, d\mu_{i-1}} \]  (2)
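A particle-based sketch of this update (ours): we approximate $\mu_{i-1}$ with weighted hyperparameter samples and reweight them, using scikit-learn's Rand index as an unnormalized surrogate for $\gamma_{X_i}$. Here H is assumed to be a callable returning cluster labels for the points of $X_i$:

    import numpy as np
    from sklearn.metrics import rand_score

    def bayesian_update(samples, weights, H, X, d_X, labels):
        """One step of Equation 2 on a particle approximation of mu:
        multiply each sample's weight by a Rand index surrogate for
        gamma_X(P_X | a), then renormalize so the weights sum to one."""
        likelihoods = np.array([rand_score(labels, H(X, d_X, a))
                                for a in samples])
        weights = np.asarray(weights) * likelihoods
        return weights / weights.sum()

Iterating this over the labeled tuples $(X_i, d_{X_i}, \mathcal{P}_{X_i})$ yields weighted samples from the posterior, which can stand in for draws from $\mu$ in the Multiparameter Flattening Algorithm.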

In Figure 1 we show the results of this procedure. Under mild conditions the functor $Flatten_{\mu_n} \circ H$ maps $(X, d_X)$ to an optimization problem with an optimal solution that is consistent with these $n$ observations. Formally:

Proposition 2

Given $d_1, d_2 \in \mathbb{N}$, suppose we have a compact region $R \subseteq \mathbb{R}^{d_1}$ and an $O$-clustering functor $H$ where $O$ is a subset of $\mathbb{R}^{d_2}$. Define $d_R$ to be Euclidean distance and assume that:

  • $H$ is $R$-identifiable: for each $a \in O$ there exists some pair of $k$-element subsets $X_1, X_2 \subset R$ with $H(X_1, d_R)(a) \neq H(X_2, d_R)(a)$

  • $H$ is $R$-smooth: for any $k$-element $X \subset R$ and $a \in O$ there exists a neighborhood $B_a$ of $a$ where for $a' \in B_a$ we have that $H(X, d_R)(a) = H(X, d_R)(a')$

Now suppose we sample $a_* \in O$ according to the uniform distribution $\mu_0$ over $O$, and for each $i > 0$ uniformly select a $k$-element subset $X_i$ from $R$, set $(X_i, \mathcal{P}_{X_i}) = H(X_i, d_R)(a_*)$, and set $\mu_i$ as in Equation 2.

Then for any $k$-element $X \subset R$ there $\mu_0$-a.s. (almost surely) exists some $m$ such that for $n \geq m$, the partition $H(X, d_R)(a_*)$ is an optimal solution to $(Flatten_{\mu_n} \circ H)(X, d_R)$.

Proof

We will use Doob's theorem [5] to prove this. Suppose $\mathcal{P}_R$ is the set of pairs $(X, \mathcal{P}_X)$ of finite $k$-element subsets $X \subset R$ and partitions $\mathcal{P}_X$ of $X$. Define $\lambda_R$ to be the uniform distribution on the set of finite $k$-element subsets of $R$, and for each $a \in O$ define $\eta_a(\sigma) = \int_{(X, \mathcal{P}_X) \in \sigma} \gamma_X(\mathcal{P}_X \mid a)\, d\lambda_R$, where $\sigma$ is an element of the Borel algebra of $\mathcal{P}_R$. Now since $H$ is $R$-identifiable, $a \neq a' \implies \eta_a \neq \eta_{a'}$ and Theorem 2.4 in [8] holds. Therefore for any $k$-element $X \subset R$, ball $B_{a_*}$ centered at $a_*$, and $\epsilon > 0$, there $\mu_0$-a.s. (almost surely) exists some $m$ such that for $n \geq m$ we have that $\mu_n(B_{a_*}) \geq 1 - \epsilon$. Therefore, no solution to $(Flatten_{\mu_n} \circ H)(X, d_R)$ can be improved by including sets that only exist in partitions of $X$ formed from $H(X, d_R)(a')$ where $a' \not\in B_{a_*}$. Since $H$ is $R$-smooth, this implies that the partition $H(X, d_R)(a_*)$ is an optimal solution to $(Flatten_{\mu_n} \circ H)(X, d_R)$. ∎

3 Discussion and Next Steps

One issue with the $Flatten_\mu$ procedure is that it reduces to a binary integer program, which can be very slow to solve. In the case that the hyperparameters of a multiparameter hierarchical clustering algorithm form a total order, we can organize its outputs in a merge tree or dendrogram and then solve the optimization problem with a tree traversal [3, 7], as in the sketch below.
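For illustration, here is a minimal sketch (ours) of that tree traversal in the style of HDBSCAN's cluster extraction rather than any code from this paper: each merge tree node carries a stability score (the analogue of $c_i$), and a parent cluster is kept exactly when its score beats the best selection among its descendants. The MergeTreeNode class is our own scaffolding:

    from dataclasses import dataclass, field

    @dataclass
    class MergeTreeNode:
        stability: float                       # persistence-style score
        children: list = field(default_factory=list)

    def extract_by_stability(node):
        """Flatten a merge tree bottom-up: return the best achievable score
        in this subtree and the corresponding non-overlapping selection."""
        if not node.children:
            return node.stability, [node]
        child_score, child_selection = 0.0, []
        for child in node.children:
            score, selection = extract_by_stability(child)
            child_score += score
            child_selection.extend(selection)
        if node.stability >= child_score:
            return node.stability, [node]   # parent beats its descendants
        return child_score, child_selection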

However, it is not always possible to use this strategy on hierarchical clustering algorithms that accept multiple real-valued hyperparameters (e.g. when the hyperparameter space is a general subset of $\mathbb{R}^d$ with $d > 1$). In this case it is possible that there exist clusters $c \in H(a)$ and $c' \in H(a')$ that are neither disjoint nor in a containment relationship. There are some ways to get around this limitation, however. For example, Rolle and Scoccola [11] index hyperparameters by a curve $\gamma$ in $\mathbb{R}^d$, rather than all of $\mathbb{R}^d$. In this way the hyperparameter space remains a total order. In the future we plan to explore other strategies to solve the flattening optimization problem efficiently.

References

  • [1] Carlsson, G., Mémoli, F.: Persistent clustering and a theorem of J. Kleinberg. arXiv preprint arXiv:0808.2241 (2008)
  • [2] Chaudhuri, K., Dasgupta, S.: Rates of convergence for the cluster tree. In: Advances in Neural Information Processing Systems. pp. 343–351 (2010)
  • [3] Chazal, F., Guibas, L.J., Oudot, S.Y., Skraba, P.: Persistence-based clustering in Riemannian manifolds. Journal of the ACM (JACM) 60(6), 1–38 (2013)
  • [4] Culbertson, J., Guralnik, D.P., Hansen, J., Stiller, P.F.: Consistency constraints for overlapping data clustering. arXiv preprint arXiv:1608.04331 (2016)
  • [5] Doob, J.L.: Application of the theory of martingales. In: Actes du Colloque International Le Calcul des Probabilités et ses applications. pp. 23–27 (1949)
  • [6] Lang, K.: Newsweeder: Learning to filter netnews. Machine Learning Proceedings (1995)
  • [7] McInnes, L., Healy, J.: Accelerated hierarchical density clustering. arXiv preprint arXiv:1705.07321 (2017)
  • [8] Miller, J.W.: A detailed treatment of Doob’s theorem. arXiv preprint arXiv:1801.03122 (2018)
  • [9] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
  • [10] Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850 (1971)
  • [11] Rolle, A., Scoccola, L.: Stable and consistent density-based clustering. arXiv preprint arXiv:2005.09048 (2020)
  • [12] Shiebler, D.: Functorial clustering via simplicial complexes. NeurIPS Workshop on Topological Data Analysis in ML (2020)
  • [13] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)