GEODESIC SINKHORN FOR FAST AND ACCURATE OPTIMAL TRANSPORT ON MANIFOLDS
Abstract
Efficient computation of optimal transport distances between distributions is of growing importance in data science. Sinkhorn-based methods are currently the state of the art for such computations, but require $O(n^2)$ computations. In addition, Sinkhorn-based methods commonly use a Euclidean ground distance between datapoints. However, with the prevalence of manifold-structured scientific data, it is often desirable to consider geodesic ground distances. Here, we tackle both issues by proposing Geodesic Sinkhorn, based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only $O(n \log n)$ computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high-dimensional single-cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.
1 Introduction
Optimal Transport (OT) distances, or Wasserstein distances, are computed by lifting ground distances between points to distances between measures. This distance is computed relative to a ground distance on the support of the distributions, making it more informative than distances based only on a pointwise comparison of the densities. However, to compute the Wasserstein distance, one needs to find the optimal transport plan from the source distribution to the target distribution; this is a linear programming problem requiring $O(n^3 \log n)$ operations for discrete distributions of size $n$ [peyre_computational_2020].
An efficient modification of the optimal transport problem is to consider an entropy-regularized transportation. This formulation is solved with the Sinkhorn algorithm [sinkhorn1967concerning] by iteratively rescaling a Gaussian kernel based on the distance matrix. It is equivalent to the Schrödinger bridge problem, for which similar algorithms were developed [fortet1940resolution, kullback1968probability, knight2013fast]. In the discrete case, it requires $O(n^2)$ operations for distributions of size $n$, since it relies on matrix-vector products. Furthermore, this formulation allows for fast computation of the discrete barycenter with fixed support (the average distribution w.r.t. the Sinkhorn distance). An important drawback of the Sinkhorn algorithm is the necessity to store and multiply the $n \times n$ pairwise distance matrix with a vector.
Additionally, the ground distance is commonly chosen as the Euclidean distance. The Euclidean distance is often suboptimal for high-dimensional datasets over larger distances according to the manifold hypothesis, which states that observations lie near a low-dimensional (curved) manifold in high-dimensional space [moon_manifold_2018]. For high-dimensional datasets assumed to be sampled from a lower-dimensional manifold, using a ground distance closer to the manifold for OT has shown promising results [chen2020uncovering, huguet2022manifold, solomon2015convolutional, tong2022embedding, tong2021diffusion].
In this work we present Geodesic Sinkhorn (code: https://anonymous.4open.science/r/geo_sinkh_anon-61F7/README.rst), a Sinkhorn-based method for fast optimal transport with a heat-geodesic ground distance. Our method is based on the geometry of the dataset, constructed from a common graph, and uses the heat kernel on the graph to define a heat-geodesic distance. Key to this approach, we never need to construct or operate on an $n \times n$ distance matrix; we only use the sparse graph Laplacian and sparse matrix-vector products. For sparse graphs, this allows computing the Sinkhorn distance with a manifold ground distance in near-linear time, improving on implementations based on dense matrices.
Increasing the state-of-the-art efficiency of Sinkhorn computation makes it possible to perform complex operations on large groups of datasets. In particular, we consider interpolating between datasets: 1) the McCann interpolant between two datasets, and 2) the barycenter between datasets. We show that using our heat-geodesic distance improves the interpolation accuracy compared to OT with a Euclidean ground distance. The barycenter corresponds to the average distribution of a set of distributions. Our method allows for finer-grained barycenters on a data manifold, which motivates us to define a novel notion of dissimilarity between families of distributions called the barycentric distance.
We use Geodesic Sinkhorn for interpolation of dynamic distributions on ten single-cell RNA sequencing (scRNA-seq) time series datasets. Further, we apply the barycentric distance to single-cell data from patient-derived cancer organoids (PDOs) to assess the effect of treatments (such as drugs and chemotherapy). Here we have one set of PDOs from control conditions, and another set that is treated. The true treatment effect is thus the distance between the barycenters of these two sets. In addition, we use Geodesic Sinkhorn's barycenter to compare the effect from one family of distributions to another.
Our main contributions include:
- A new method for computing optimal transport distances on a manifold called Geodesic Sinkhorn, which is highly efficient in time and memory.
- Defining the barycentric distance, a novel distance between families of distributions, and showing its utility in deriving treatment effect from control and treated patient samples.
- Presenting new large single-cell RNA-sequencing datasets to the community as benchmarks for optimal transport.
2 Related Work
There are few methods for computing optimal transport efficiently between large datasets. The current state of the art is the Sinkhorn algorithm [cuturi2013sinkhorn], which computes a regularized OT map between discrete distributions of $n$ points in $O(n^2)$ time. The Low-Rank (LR) Sinkhorn algorithm [scetbon2021low] speeds up the computation by representing the cost matrix with a low-rank factorization. Similarly, \citet{scetbon2020linear} use a kernel representation of the cost matrix for faster computation. Some other approximations are based on the dual formulation of the Wasserstein distance, like DiffusionEMD [tong2021diffusion], which approximates a distance equivalent to the Wasserstein with geodesic cost, and tree-based methods [le2019tree, backurs_scalable_2020, sato_fast_2020], which solve the dual formulation using a tree metric. These methods admit a closed form and scale almost linearly with $n$. \citet{le2022sobolev} generalizes this tree formulation to graphs; however, edges in the graph are interpreted as costs rather than affinities. This leads to a different problem setup than that of manifold approximation methods.
Geodesic Sinkhorn is related to prior work linking the entropy-regularized optimal transport problem on triangular meshes with the heat operator [crane_geodesics_2013, solomon2015convolutional], but uses different graph filtering techniques. These approaches approximate the application of the heat kernel to a vector by discretizing the heat equation and solving systems of linear equations; this technique was used in different contexts, either with the cotangent Laplacian [solomon2015convolutional] or to learn a ground metric [heitz2021ground]. Solving these systems for each Sinkhorn iteration can be done efficiently with a sparse Cholesky decomposition. However, this method's efficiency depends mainly on the efficiency of the Cholesky decomposition, which can be slow depending on the sparsity pattern (up to $O(n^3)$ for an $n \times n$ matrix), and it necessitates solving several systems of linear equations per Sinkhorn iteration.
In Table 1, we summarize the advantages of the previous methods, where we denote by Sinkhorn Euler the Sinkhorn algorithm with the heat kernel approximated by implicit Euler discretization.
| Method | Sparse Matrix | Barycenter | McCann | Complexity per iteration |
|---|---|---|---|---|
| Exact OT | No | Yes | Yes | $O(n^3 \log n)$ |
| Sinkhorn [cuturi2013sinkhorn] | No | Yes | Yes | $O(n^2)$ |
| LR Sinkhorn [scetbon2021low] | No | Yes | Yes | linear in $n$ |
| Diffusion EMD [tong2021diffusion] | Yes | No | No | $\tilde{O}(n)$ |
| Sinkhorn Euler [solomon2015convolutional] | Yes | Yes | Yes | Cholesky + linear systems |
| Geodesic Sinkhorn (ours) | Yes | Yes | Yes | $O(Kn)$ on a sparse graph |
3 Preliminaries
In this section, we start by reviewing the basics of OT and the Wasserstein distance, as well as the Sinkhorn distance. Then we review two notions fundamental to our method: the heat equation on a graph, and the Chebyshev approximation of the heat kernel.
Wasserstein Distance
In the following, we assume that all distributions admit a density or a probability mass function, and we use the same notation for both. Let $\mu, \nu$ be two probability distributions on a measurable space $\Omega$ with metric $d$, and let $\Pi(\mu,\nu)$ be the set of joint probability distributions $\pi$ on the space $\Omega \times \Omega$, where for any measurable subset $\omega \subset \Omega$, $\pi(\omega \times \Omega) = \mu(\omega)$ and $\pi(\Omega \times \omega) = \nu(\omega)$. The $p$-Wasserstein distance is defined as:

$$W_p(\mu,\nu) := \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\Omega \times \Omega} d(x,y)^p \, d\pi(x,y) \Big)^{1/p}. \qquad (1)$$
In this work we consider $p = 2$, as $W_2$ is a valid metric between distributions and has other attractive theoretical properties. An exact algorithm based on linear programming can solve this problem in $O(n^3 \log n)$ time for discrete distributions of size $n$.
Sinkhorn Distances
The Kullback-Leibler (KL) divergence between $\pi$ and some strictly positive $\mathbf{K}$ on $\Omega \times \Omega$ is defined as

$$\mathrm{KL}(\pi \,\|\, \mathbf{K}) := \int_{\Omega \times \Omega} \log\Big(\frac{d\pi}{d\mathbf{K}}\Big)\, d\pi. \qquad (2)$$
The Sinkhorn distance (with a slight abuse of language we use the term distance, although the entropy-regularized formulation does not respect the identity of indiscernibles) is a relaxation of equation 1 where the infimum is taken over all couplings in $\Pi(\mu,\nu)$ whose KL divergence to $\mu \otimes \nu$ is bounded by some $\alpha \ge 0$. Introduced in \citet{cuturi2013sinkhorn}, the optimization of this distance can be solved by considering the entropy-regularized transport

$$W_\lambda(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \int_{\Omega \times \Omega} d(x,y)^2 \, d\pi(x,y) - \lambda H(\pi), \qquad (3)$$

where we define the entropy of a coupling as $H(\pi) := -\int_{\Omega \times \Omega} \pi(x,y) \log \pi(x,y)\, dx\, dy$, and $\lambda > 0$ is the regularization parameter.
This formulation converges to the Wasserstein distance as $\lambda \to 0$, and can be solved with the Sinkhorn algorithm with a complexity of the order of $O(n^2)$ for discrete distributions of size $n$ [benamou2015iterative, cuturi2013sinkhorn]. In the discrete case, the transport matrix admits the form $\pi = \mathrm{diag}(\mathbf{v})\, \mathbf{K}\, \mathrm{diag}(\mathbf{w})$, where $\mathbf{v}, \mathbf{w}$ are vectors of size $n$ and $\mathbf{K}_{ij} := e^{-d(x_i, x_j)^2/\lambda}$. The Sinkhorn algorithm iteratively updates these vectors as $\mathbf{v} \leftarrow \mu \oslash \mathbf{K}\mathbf{w}$ and $\mathbf{w} \leftarrow \nu \oslash \mathbf{K}^\top \mathbf{v}$, where $\oslash$ denotes elementwise division.
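To make the updates above concrete, the following is a minimal numpy sketch of the discrete Sinkhorn iterations (not the paper's implementation; the function and variable names are ours):

```python
import numpy as np

def sinkhorn_plan(mu, nu, D, lam=0.1, n_iter=200):
    """Entropy-regularized OT between discrete distributions mu (size n) and
    nu (size m), given the pairwise ground-distance matrix D (n x m)."""
    K = np.exp(-D**2 / lam)              # Gibbs kernel from squared distances
    v, w = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        v = mu / (K @ w)                 # enforce the row marginals
        w = nu / (K.T @ v)               # enforce the column marginals
    return np.diag(v) @ K @ np.diag(w)   # transport plan pi = diag(v) K diag(w)
```

This dense version stores the full kernel matrix, which is exactly the cost Geodesic Sinkhorn avoids.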
Following [solomon2015convolutional], using the kernel $\mathbf{K} := e^{-d(x,y)^2/\lambda}$ gives an alternative interpretation of the Sinkhorn distance as

$$W_\lambda(\mu,\nu) = \lambda \min_{\pi \in \Pi(\mu,\nu)} \mathrm{KL}(\pi \,\|\, \mathbf{K}). \qquad (4)$$
The problem in equation 3 is strictly convex and continuous, yielding a unique minimizer. In the discrete case, this leads to an algorithm for the entropy-regularized Wasserstein distance based on the Sinkhorn algorithm: it enforces the marginal constraints on the kernel $\mathbf{K}$ while minimizing the distance as quantified by the KL divergence to $\mathbf{K}$.
The underlying metric $d$ is generally unknown, thus the kernel $\mathbf{K}$ cannot be evaluated. The authors of [solomon2015convolutional] proposed to approximate $\mathbf{K}$ with the heat kernel $h_t$ on $\Omega$. According to Varadhan's formula [varadhan_behavior_1967], the geodesic distance on a manifold can be recovered from the heat transfer at small timescales as

$$d(x,y)^2 = \lim_{t \to 0} -4t \log h_t(x,y). \qquad (5)$$

Hence, this motivates the use of the heat-geodesic distance $d_{H_t}(x,y)^2 := -4t \log h_t(x,y)$, with associated kernel

$$\mathbf{K} := e^{-d_{H_t}(x,y)^2 / 4t} = h_t(x,y), \qquad (6)$$

i.e. the heat kernel itself, which plays the role of the Gibbs kernel with regularization $\lambda = 4t$.
Interestingly, the Sinkhorn-based methods admit an efficient algorithm to solve the barycenter problem which we present next.
Interpolation with discrete support
By constraining the support to a set $\Omega$ (or a graph), we can efficiently interpolate between more than two distributions. The barycenter problem [benamou2015iterative, solomon2015convolutional, peyre_computational_2020, cuturi2014fast] generalizes the notion of average between points to an average between distributions. For a set of distributions $\{\mu_1, \dots, \mu_m\}$ supported on $\Omega$, the objective is to find a distribution minimizing the average distance

$$\bar\mu := \operatorname*{arg\,min}_{\mu \in \mathcal{P}(\Omega)} \sum_{i=1}^{m} \alpha_i\, W(\mu_i, \mu),$$

where $\mathcal{P}(\Omega)$ denotes the space of probability distributions supported on $\Omega$, and $\alpha_1, \dots, \alpha_m$ are non-negative weights. Finding the barycenter is a challenging optimization problem; however, the barycenter for Sinkhorn-based methods admits an efficient computation (see Algorithm LABEL:alg:sinkhorn_bary in appendix, sketched in code below). It involves updating $m$ pairs of scaling vectors, which define a transport plan from each $\mu_i$ to the barycenter $\bar\mu$. The barycenter for two distributions supported on a (discrete) semi-circle is depicted in Fig. 1. Here the two input distributions, zero and one, have support over 5,000 points each. In the case where $\Omega$ is the union of these point sets, the barycenter is a discrete distribution over these 10,000 points, with density concentrated at the top of the semi-circle. The support of the barycenter is constrained to $\Omega$; for most Sinkhorn-based methods the size of $\Omega$ needs to be small for computational reasons. Our method does not suffer from such a limitation. Hence, we can consider barycenters with greater expressivity, and interpolate between large sets of distributions.
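As a reference for the fixed-support case, here is a minimal sketch of the entropic barycenter via iterative Bregman projections (in the spirit of [benamou2015iterative]); the function name and the dense kernel `K` are our own simplifications, and in Geodesic Sinkhorn every product with `K` would be replaced by a Chebyshev-approximated heat diffusion:

```python
import numpy as np

def sinkhorn_barycenter(mus, K, alphas, n_iter=100):
    """Entropic barycenter of m distributions on a shared support of size n.
    mus: (m, n) array of distributions; K: (n, n) Gibbs or heat kernel on the
    support; alphas: (m,) non-negative weights summing to one."""
    w = np.ones_like(mus)
    for _ in range(n_iter):
        v = mus / (w @ K.T)               # scalings matching each mu_i
        Kv = v @ K                        # K^T v_i for every i
        b = np.exp(alphas @ np.log(Kv))   # barycenter: weighted geometric mean
        w = b[None, :] / Kv               # scalings matching the barycenter
    return b
```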
Interpolation in $\mathbb{R}^d$
Calculating the barycenter with $\Omega = \mathbb{R}^d$ is computationally challenging; with more than two distributions, it is computationally intractable. However, for the special case of two distributions, the $W_2$-barycenter can be computed efficiently using what is known as the McCann interpolant [mccann1997convexity]. For an optimal transport matrix $\pi^*$ between $\mu_0$ and $\mu_1$ with weights $1-t$ and $t$, the McCann interpolant at time $t \in [0,1]$ is

$$\mu_t := \sum_{i,j} \pi^*_{ij}\, \delta_{(1-t) x_i + t y_j},$$

where $x_i$ and $y_j$ are the support points of $\mu_0$ and $\mu_1$ respectively, and $\delta_x$ is the Dirac distribution at $x$. The McCann interpolant is often used for interpolation between consecutive distributions [schiebinger_optimal-transport_2019].
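For illustration, a small sketch (with names of our choosing) that samples from the McCann interpolant given an already-computed transport plan:

```python
import numpy as np

def sample_mccann(pi, X0, X1, t, n_samples=1000, seed=0):
    """Draw samples from mu_t = sum_ij pi_ij * delta_{(1-t) x_i + t y_j}.
    pi: (n0, n1) transport plan between point clouds X0 (n0, d) and X1 (n1, d)."""
    rng = np.random.default_rng(seed)
    p = pi.ravel() / pi.sum()                          # probabilities over index pairs
    idx = rng.choice(pi.size, size=n_samples, p=p)     # draw pairs (i, j)
    i, j = np.unravel_index(idx, pi.shape)
    return (1.0 - t) * X0[i] + t * X1[j]               # displacement interpolation
```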
Next we consider the ground distance, and following Varadhan’s formula, we consider the approximation of heat diffusion for OT with a heat-geodesic distance.
Heat Diffusion on a Graph
Consider an undirected graph $G = (V, E)$ with a set of $n$ vertices $V$ and a set of edges $E$, its weighted adjacency matrix $\mathbf{W}$ with non-negative edge weights, and the diagonal degree matrix $\mathbf{D}$, where $\mathbf{D}_{ii} := \sum_j \mathbf{W}_{ij}$. We define the combinatorial Laplacian as $\mathcal{L} := \mathbf{D} - \mathbf{W}$; for any function $\mathbf{f}$ on the vertices we have $(\mathcal{L}\mathbf{f})(i) = \sum_j \mathbf{W}_{ij}(\mathbf{f}(i) - \mathbf{f}(j))$. The combinatorial Laplacian is a symmetric positive semi-definite matrix, and has an eigendecomposition with orthonormal eigenvectors $\Psi$ and diagonal eigenvalue matrix $\Lambda$, such that $\mathcal{L} = \Psi \Lambda \Psi^\top$. The combinatorial Laplacian is a natural extension of the negative of the Laplacian operator to graphs. For a signal $\mathbf{f}$ on $G$, the diffusion of $\mathbf{f}$ on the graph evolves according to the heat equation

$$\frac{\partial \mathbf{u}(t)}{\partial t} = -\mathcal{L}\, \mathbf{u}(t), \qquad \mathbf{u}(0) = \mathbf{f}.$$

This ordinary differential equation describes how the heat, represented by the signal $\mathbf{f}$, propagates on the graph over time. Since the eigenvectors of $\mathcal{L}$ form an orthonormal basis of $\mathbb{R}^n$, we have the solution

$$\mathbf{u}(t) = \Psi e^{-t\Lambda} \Psi^\top \mathbf{f} = e^{-t\mathcal{L}} \mathbf{f}.$$

The heat operator on $G$ is defined by the matrix exponential $\mathbf{H}_t := e^{-t\mathcal{L}}$. By orthogonality of the eigenvectors of $\mathcal{L}$, we can write $\mathbf{H}_t = \Psi e^{-t\Lambda} \Psi^\top$ and $\mathbf{H}_t \mathbf{f} = \Psi e^{-t\Lambda} \Psi^\top \mathbf{f}$. Computing $\mathbf{H}_t$ by eigendecomposition would require $O(n^3)$ operations. Recall that, for the Sinkhorn algorithm, we are only concerned with the application of the heat operator to a signal $\mathbf{f}$. In Geodesic Sinkhorn we use Chebyshev polynomials [shuman_chebyshev_2011, marcotte2022fast] to approximate the application of the heat operator to a signal. For a short timescale $t$, using the heat kernel amounts to using the geodesic distance as ground distance in the entropy-regularized optimal transport formulation of equation 3.
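As a concrete (non-scalable) reference, the heat diffusion of a Dirac signal on a small path graph can be computed directly with scipy's matrix-exponential action; this is the exact quantity that the Chebyshev approximation below replaces:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

# Path graph on 5 nodes: adjacency, degree, and combinatorial Laplacian L = D - W.
W = sp.diags([np.ones(4), np.ones(4)], offsets=[-1, 1], format="csr")
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

f = np.zeros(5); f[0] = 1.0            # Dirac signal at node 0
u_t = expm_multiply(-0.5 * L, f)       # u(t) = exp(-t L) f with t = 0.5
```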
Chebyshev Polynomials
Polynomial sequences are often used to approximate functions or operators. With Chebyshev polynomials, we can approximate the application of the matrix exponential to a signal on the graph. An attractive property of Chebyshev polynomials is that the approximation error decays exponentially with the maximum degree $K$. They are defined by the recursive relation $T_0(y) = 1$, $T_1(y) = y$, and $T_k(y) = 2y\,T_{k-1}(y) - T_{k-2}(y)$ for $k \ge 2$. On $[-1, 1]$ these polynomials are orthogonal w.r.t. the weight $(1 - y^2)^{-1/2}$, and can be used to express the operator $e^{-t\mathcal{L}}$. Assuming the eigenvalues of $\mathcal{L}$ lie in $[0, 2]$, so that $\tilde{\mathcal{L}} := \mathcal{L} - \mathbf{I}$ has spectrum in $[-1, 1]$, we can write

$$e^{-t\mathcal{L}}\mathbf{f} = \sum_{k \ge 0} b_k(t)\, T_k(\tilde{\mathcal{L}})\, \mathbf{f},$$

where the scalar coefficients $b_k(t)$ depend on the diffusion time $t$ and can be evaluated with (modified) Bessel functions. The approximation of $e^{-t\mathcal{L}}$ is based on the first $K+1$ terms of the series, and we denote it $p_K(\mathcal{L})$. The approximation of the application of the heat operator to a signal $\mathbf{f}$ is

$$\mathbf{H}_t \mathbf{f} \approx p_K(\mathcal{L})\, \mathbf{f} = \sum_{k=0}^{K} b_k(t)\, T_k(\tilde{\mathcal{L}})\, \mathbf{f}.$$

It results in $K$ matrix-vector products, which are efficient since $\mathcal{L}$ is in general a sparse matrix; on a $k$-nearest neighbor graph the cost is $O(Kkn)$. The assumption on the spectrum is not always verified for the combinatorial Laplacian, but it can be rescaled as $2\mathcal{L}/\lambda_{\max}$ to ensure this, or one can use the normalized Laplacian, which has eigenvalues in $[0, 2]$ and thus also avoids computing the largest eigenvalue. Chebyshev polynomials admit interesting theoretical properties, in particular the existence of approximation bounds depending on the largest degree $K$ [marcotte2022fast]; hence $K$ can be determined for a given approximation error tolerance.
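A minimal self-contained sketch of this approximation is given below; we write the coefficients explicitly with scipy's exponentially scaled modified Bessel function `ive`, whereas the paper relies on the PyGSP and [marcotte2022fast] implementations:

```python
import numpy as np
import scipy.sparse as sp
from scipy.special import ive   # ive(k, t) = I_k(t) * exp(-t)

def chebyshev_heat(L_norm, f, t, K=30):
    """Approximate exp(-t * L_norm) @ f with a degree-K Chebyshev expansion.
    Assumes L_norm is a sparse (normalized) Laplacian with spectrum in [0, 2],
    so that L_norm - I has spectrum in [-1, 1]."""
    n = L_norm.shape[0]
    L_til = L_norm - sp.identity(n, format="csr")
    # exp(-t(y+1)) = exp(-t) * (I_0(t) + 2 * sum_{k>=1} (-1)^k I_k(t) T_k(y)), y in [-1, 1]
    T_prev, T_curr = f, L_til @ f                       # T_0 f and T_1 f
    out = ive(0, t) * T_prev - 2 * ive(1, t) * T_curr
    for k in range(2, K + 1):
        T_next = 2 * (L_til @ T_curr) - T_prev          # Chebyshev recurrence
        out = out + 2 * ((-1) ** k) * ive(k, t) * T_next
        T_prev, T_curr = T_curr, T_next
    return out
```

Only sparse matrix-vector products with `L_til` are needed, which is what gives the near-linear scaling on sparse graphs.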
4 Geodesic Sinkhorn Distances
We define the Geodesic Sinkhorn distance between any signals or distributions on a graph via entropy-regularized OT with the heat kernel on the graph. This construction is also valid between any point cloud datasets. In that case, for datasets sampled from a set of distributions $\{\mu_1, \dots, \mu_m\}$, we construct a common graph using an affinity kernel on the union of the datasets and compare two samples by taking the distance between two indicator functions on the graph. We approximate the heat kernel with Chebyshev polynomials of maximum degree $K$. In Algorithm 4.1, we present the main steps to evaluate Geodesic Sinkhorn. It is based upon Sinkhorn iterations [sinkhorn1967concerning, cuturi2013sinkhorn], where $\oslash$ and $\odot$ denote respectively elementwise division and multiplication. Note that, as opposed to the usual Sinkhorn algorithm, we never have to store a dense distance matrix, but only the usually sparse graph Laplacian.
Definition 4.1.
The Geodesic Sinkhorn distance between two distributions $\mu$, $\nu$ on a graph $G$ with heat kernel $\mathbf{H}_t$ is

$$W_{G,t}(\mu, \nu) := 4t \min_{\pi \in \Pi(\mu,\nu)} \mathrm{KL}(\pi \,\|\, \mathbf{H}_t).$$
Algorithm 4.1: Geodesic Sinkhorn
Input: Laplacian $\mathcal{L}$, distributions (signals) $\mu$, $\nu$ on $G$, maximum Chebyshev degree $K$, regularization (diffusion time) $t$, vertex weights $\mathbf{a}$.
Output: $W_{G,t}(\mu, \nu)$
// Initialization
$\mathbf{v} \leftarrow \mathbb{1}$, $\mathbf{w} \leftarrow \mathbb{1}$
// Sinkhorn iterations
for $i = 1, \dots,$ max iterations do
  $\mathbf{v} \leftarrow \mu \oslash p_K(\mathcal{L})(\mathbf{a} \odot \mathbf{w})$
  $\mathbf{w} \leftarrow \nu \oslash p_K(\mathcal{L})(\mathbf{a} \odot \mathbf{v})$
end for
Return: $W_{G,t}(\mu, \nu)$ evaluated from the scalings $\mathbf{v}$, $\mathbf{w}$
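A compact sketch of these iterations, reusing the `chebyshev_heat` function above (vertex weights are omitted, i.e. taken uniform; this is our simplification, not the paper's implementation):

```python
import numpy as np

def geodesic_sinkhorn_scalings(L_norm, mu, nu, t=1.0, K=30, n_iter=200):
    """Sinkhorn iterations where every kernel-vector product is a
    Chebyshev-approximated heat diffusion on the graph Laplacian L_norm.
    Returns the scalings (v, w); the plan diag(v) H_t diag(w) stays implicit."""
    v, w = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        v = mu / chebyshev_heat(L_norm, w, t, K)
        w = nu / chebyshev_heat(L_norm, v, t, K)
    return v, w
```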

In the following proposition, we find the ground distance implicitly used in the optimal transport problem defined by Geodesic Sinkhorn. We recall that two distances $d_1, d_2$ are equivalent if for all $x, y$ there exist two positive constants $c, C$ such that $c\, d_1(x,y) \le d_2(x,y) \le C\, d_1(x,y)$, and we note $d_1 \sim d_2$.
Proposition 4.2.
There exists a maximum Chebyshev polynomial degree $K$ such that the ground distance in Geodesic Sinkhorn is equivalent to the heat-geodesic distance based on the heat kernel on $G$.
In particular, the Wasserstein distances with these ground distances are equivalent.
Proof.
Provided in Appendix LABEL:sec:appendix_theory. ∎
In \citet{solomon2015convolutional, heitz2021ground}, using the implicit Euler discretization amounts to replacing the heat kernel with $(\mathbf{I} + \frac{t}{s}\mathcal{L})^{-s}$, where $\mathbf{I}$ is the identity matrix and $s$ the number of discretization steps; this matrix can be seen as another approximation of the matrix exponential, and it induces the corresponding ground cost.
The efficiency of Geodesic Sinkhorn improves the notion of barycenter, as it is possible to consider a much larger graph $G$, and thus a finer-grained support for the barycenter. This leads us to define a novel distance between families of distributions.
Definition 4.3.
For two finite families of distributions $\mathcal{D}^0 := \{\mu^0_1, \dots, \mu^0_{m_0}\}$ and $\mathcal{D}^1 := \{\mu^1_1, \dots, \mu^1_{m_1}\}$ supported on $\Omega$, we define the barycentric distance between the families as

$$d_B(\mathcal{D}^0, \mathcal{D}^1) := W(\bar\mu^0, \bar\mu^1),$$

where $\bar\mu^0$ and $\bar\mu^1$ are respectively the barycenters of $\mathcal{D}^0$ and $\mathcal{D}^1$.
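In code, the definition is a direct composition of a barycenter routine and a distance between distributions; the helpers below are placeholders for, e.g., `sinkhorn_barycenter` above and a (Geodesic) Sinkhorn distance:

```python
def barycentric_distance(family0, family1, barycenter_fn, distance_fn):
    """Distance between two families of distributions (Def. 4.3):
    the chosen distance between their respective barycenters."""
    return distance_fn(barycenter_fn(family0), barycenter_fn(family1))
```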
This distance follows similar intuitions as the distances used in clustering algorithms like average-linkage, where a distance between clusters is defined via an aggregation of distances. The previous definition is valid for any distance between distributions or barycenters. However, OT barycenters are known to be more informative than others [cuturi2014fast]. We will further explore this comparison in our experiments. We use it to distinguish between two groups in a medical setting where a set of patients received a treatment (defining the family $\mathcal{D}^1$), and another set acts as the control family $\mathcal{D}^0$. Following this idea, we define a notion of effect between two families.
Definition 4.4.
For two families of distributions $\mathcal{D}^0$ (control) and $\mathcal{D}^1$ (treated) supported on $\Omega$, we define the Expected Barycenter Effect of the treatment as

$$\tau := \mathbb{E}_{X^1 \sim \bar\mu^1}[X^1] - \mathbb{E}_{X^0 \sim \bar\mu^0}[X^0],$$

where $\bar\mu^0$ and $\bar\mu^1$ are respectively the barycenters of $\mathcal{D}^0$ and $\mathcal{D}^1$, and $X^0$, $X^1$ the features.
Note that we compute the average on the family of distributions instead of the average on their support; hence we evaluate these expectations in closed form. This definition also extends to a conditional equivalent where families of distributions can be subdivided with discrete covariate variables. When the barycenters are computed with the total variation, this definition is equivalent to the naive Average Treatment Effect (ATE) [imbens2015causal], i.e. the difference of empirical means of the two groups.
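On a discrete support the closed-form expectation is just a weighted average of the support points, as in this small sketch (names are ours):

```python
import numpy as np

def expected_barycenter_effect(bary_treated, bary_control, support):
    """Def. 4.4 on a discrete support: difference of the barycenters' means.
    bary_*: probability vectors of length n; support: (n, d) feature matrix."""
    return bary_treated @ support - bary_control @ support
```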
In Fig. 2 we present the focus of this section and the three types of interpolation we consider: interpolation between two distributions without constraint on the support; barycenters between multiple distributions; and a distance between families of distributions.

5 Results
We demonstrate the accuracy and efficiency of the Geodesic Sinkhorn distance on three tasks:
1. Time-series interpolation (section 5.1), similar to the setup of [schiebinger_optimal-transport_2019], on ten scRNA-seq datasets.
2. Nearest-Wasserstein-neighbor distribution calculation on simulated data with manifold structure, similar to the setup of [tong2021diffusion].
3. The newly defined barycentric distance between families of distributions and its application to quantify the effect of a treatment on patient-derived organoids.
In all our experiments, we constructed the weighted adjacency matrix of the graph with the $\alpha$-decay kernel [moon_visualizing_2019]. This kernel is based on a k-nearest neighbor graph, attributes weights with exponential decay, and is extensively used in single-cell analysis. We used the default parameters, that is, five nearest neighbors and decay $\alpha = 40$.
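A from-scratch approximation of this graph construction is sketched below (the paper uses the existing $\alpha$-decay implementation; the symmetrization by averaging and the dense pairwise-distance matrix are simplifications of this sketch, the latter only for readability):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def alpha_decay_affinity(X, knn=5, alpha=40):
    """Alpha-decay affinity matrix with a local bandwidth set to the distance
    to the knn-th nearest neighbor of each point. Returns a dense symmetric W."""
    nn = NearestNeighbors(n_neighbors=knn + 1).fit(X)
    dist, _ = nn.kneighbors(X)                      # column 0 is the point itself
    sigma = dist[:, -1]                             # bandwidth per point
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = np.exp(-((D / sigma[:, None]) ** alpha))    # exponentially decaying affinity
    W = 0.5 * (K + K.T)                             # symmetrize
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return W
```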
| Dataset | Sinkhorn | Sinkhorn Euler | Geo Sinkhorn (ours) |
|---|---|---|---|
| Cite Donor0 | 48.545 ± 0.057 | 46.254 ± 3.192 | 44.440 ± 0.108 |
| Cite Donor1 | 48.220 ± 0.055 | 45.897 ± 3.254 | 44.165 ± 0.103 |
| Cite Donor2 | 50.281 ± 0.016 | 47.773 ± 3.958 | 45.673 ± 0.092 |
| Cite Donor3 | 49.339 ± 0.081 | 46.565 ± 3.553 | 45.022 ± 0.146 |
| Clark | 13.500 ± 0.003 | – | 13.288 ± 0.008 |
| EB | 12.415 ± 0.008 | 12.298 ± 0.140 | 12.133 ± 0.011 |
| Multiome Donor0 | 56.648 ± 0.048 | 55.373 ± 7.234 | 53.431 ± 0.077 |
| Multiome Donor1 | 54.028 ± 0.126 | 52.396 ± 4.394 | 50.238 ± 0.022 |
| Multiome Donor2 | 58.798 ± 0.155 | 57.182 ± 5.511 | 55.041 ± 0.058 |
| WOT | 8.096 ± 0.003 | – | 7.397 ± 0.106 |
5.1 Time series Interpolation
To evaluate Geodesic Sinkhorn's performance on inferring dynamics, we test it on a time series interpolation task. In this setting the most used datasets are Embryoid Body [moon_visualizing_2019] and WOT [schiebinger_optimal-transport_2019]. We curated ten scRNA-seq datasets to test our method: WOT [schiebinger_optimal-transport_2019], Clark [clark_single-cell_2019], Embryoid Body [moon_visualizing_2019], and seven more from the 2022 multimodal single-cell integration challenge (https://www.kaggle.com/competitions/open-problems-multimodal/). The observations are the gene expression of single cells from a distribution evolving through time. The goal is to interpolate the distribution between two timepoints. The number of timepoints in each dataset ranges from 4 to 40. For a dataset with single-cell distributions observed over time, we hold out the distribution at an intermediate timepoint, interpolate it from its neighboring timepoints, and compute the exact Euclidean 2-Wasserstein distance between the interpolated distribution and the held-out ground truth distribution (see Appendix LABEL:sec:_appendix_additional_res for details). Since we interpolate between two distributions, we use the McCann interpolant, as its support lies in $\mathbb{R}^d$. In Tab. 2, we compare our Geodesic Sinkhorn interpolation with the exact McCann interpolant with Euclidean distance (Exact McCann), the Sinkhorn McCann interpolant with Euclidean ground distance (Sinkhorn McCann), and Sinkhorn Euler McCann with the Euler heat approximation. Sinkhorn with the Euler approximation ran out of memory on the Clark and WOT datasets. For space reasons, we leave the comparison to additional interpolation methods to the appendix (Tab. LABEL:tab:full_time_interp, Tab. LABEL:tab:time_serie_kernel). We see that across all ten datasets the Geodesic Sinkhorn interpolation with the McCann interpolant outperforms all other methods, showcasing the importance of the heat-geodesic distance and our kernel approximation.
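The evaluation metric itself can be computed with the POT library; a sketch of this scoring step (the helper name is ours):

```python
import numpy as np
import ot  # Python Optimal Transport

def w2_score(X_pred, X_true):
    """Exact Euclidean 2-Wasserstein distance between two point clouds,
    used to score an interpolated distribution against the held-out one."""
    a = np.full(len(X_pred), 1.0 / len(X_pred))     # uniform weights
    b = np.full(len(X_true), 1.0 / len(X_true))
    M = ot.dist(X_pred, X_true)                     # squared Euclidean costs (POT default)
    return np.sqrt(ot.emd2(a, b, M))                # exact OT cost, then sqrt for W_2
```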
5.2 Nearest-Wasserstein-neighbor distributions
Computation time of heat approximation
We compare the time required to approximate the heat kernel using either Chebyshev polynomials or the implicit Euler discretization (30 steps) with Cholesky decomposition [solomon2015convolutional, heitz2021ground]. In Fig. 3, we report the time in seconds required to diffuse a uniform signal on a graph, where we sample 10k observations from a number of distributions ranging from 5 to 45. We also report the time without the steps that can be prefactored: the computation of the normalized Laplacian for the Chebyshev method (Cheb w/o pre), and the Cholesky decomposition for the Euler method (Euler w/o pre). Using Chebyshev polynomials can be more than two times faster than the Euler method. Recall that this step needs to be applied twice per Sinkhorn iteration, so our method can be drastically faster overall. In the appendix, Fig. LABEL:fig:time_swiss_roll_W1 and Fig. LABEL:fig:time_swiss_roll show more comparative results on the time efficiency of our method against Sinkhorn, Diffusion EMD, and LR Sinkhorn; our method can be 20 times faster than Sinkhorn and LR Sinkhorn.
Computation time and accuracy
In this experiment, we compare our method against Sinkhorn [cuturi2013sinkhorn] and LR Sinkhorn [scetbon2021low], both with Euclidean and squared Euclidean ground distances, as well as DiffusionEMD [tong2021diffusion] and Sinkhorn with the Euler approximation of the heat filter. We created 15 Gaussian distributions sampled randomly on a swiss roll dataset, and sampled 10k observations from each distribution. We consider a k-nearest-neighbor task on these distributions. We evaluate the methods against the ground truth, since we know the exact geodesic distance on the manifold. We test two implementations of our method: either using the Chebyshev approximation from the python library PyGSP [michael_defferrard_2017_1003158] (Geodesic Sinkhorn), with the graph from PHATE (P) and UMAP (S), or using the Chebyshev implementation from [marcotte2022fast] (Geodesic Sinkhorn*). The latter implementation can easily be split into reusable steps, thus avoiding superfluous computation; for instance, we compute the Chebyshev coefficients only once on the graph, then use them for all 15 distributions. In Tab. 3, we report the average and standard deviation over 10 seeds of the Spearman and Pearson correlations to the ground truth, and the runtime in seconds with and without the computation of the graph. Our method is always more accurate than Sinkhorn and LR Sinkhorn, while being about 6 to 20 times faster. Our method's accuracy is similar to the approximation with Euler, but it can be almost four times faster. DiffusionEMD and Geodesic Sinkhorn performances are similar, but Diffusion EMD does not provide a transportation plan nor meaningful barycenters.

| Name | SpearmanR | PearsonR | P@5 | time(s) no graph | time(s) |
|---|---|---|---|---|---|
| Diffusion EMD | 0.62 ± 0.097 | 0.736 ± 0.023 | 0.66 ± 0.072 | 2.845 ± 0.135 | 7.877 ± 0.531 |
| Sinkhorn | 0.387 ± 0.044 | 0.523 ± 0.036 | 0.471 ± 0.028 | 112.406 ± 0.206 | 112.406 ± 0.206 |
| Sinkhorn | 0.411 ± 0.036 | 0.485 ± 0.027 | 0.492 ± 0.053 | 133.686 ± 5.234 | 133.686 ± 5.234 |
| LR Sinkhorn | -0.31 ± 0.07 | -0.131 ± 0.086 | 0.237 ± 0.037 | 578.631 ± 107.82 | 578.631 ± 107.82 |
| LR Sinkhorn | 0.366 ± 0.048 | 0.379 ± 0.051 | 0.447 ± 0.023 | 204.191 ± 3.656 | 204.191 ± 3.656 |
| Geodesic Sinkhorn | 0.847 ± 0.023 | 0.754 ± 0.016 | 0.833 ± 0.034 | 10.176 ± 1.249 | 16.682 ± 1.705 |
5.3 Barycentric distance
Synthetic data
We test whether we can identify a linear treatment effect with the Expected Barycenter Effect (EBE). In this experiment, we create a control family of ten standard Gaussian distributions. The treatment family consists of nine Gaussian distributions with mean five and identity covariance, and one outlier Gaussian centered at a different mean. For each distribution, we sample 500 observations, and we reproduce the experiment over ten seeds. In Tab. 4, we report the EBE and its standard deviation with Geodesic Sinkhorn, the total variation (TV) distance, and Sinkhorn. Since the TV only compares the means, it is sensitive to the outlier, whereas our method can identify the true treatment effect.
| Outlier | EBE Geo Sinkhorn | EBE Sinkhorn | EBE TV |
|---|---|---|---|
| -60 | 5.016 ± 0.226 | -0.103 ± 0.005 | -1.429 ± 0.144 |
| -30 | 5.053 ± 0.196 | 0.355 ± 0.049 | 1.571 ± 0.144 |
| 0 | 4.917 ± 0.315 | 4.954 ± 0.157 | 4.571 ± 0.144 |
| No outlier | 5.059 ± 0.159 | 5.054 ± 0.16 | 5.071 ± 0.144 |
We also compare distances used to compute the barycenter in terms of their ability to retrieve an underlying distribution. We suppose two ground truth distributions, a standard Gaussian and a Gaussian with mean five and unit variance, but we only observe two groups of noisy distributions of unequal sizes, one group around each ground truth distribution. We compute the barycenter of each group, $\bar\mu^0$ and $\bar\mu^1$, then the (exact) 2-Wasserstein distance between the barycenters, which we desire to be close to the known 2-Wasserstein distance between the ground truth distributions. In Tab. 5, we report the score $|\hat{W}_2 - W_2^*|$, where $\hat{W}_2$ is the exact 2-Wasserstein distance between the estimated barycenters and $W_2^*$ the exact 2-Wasserstein distance between the ground truth distributions. A score close to zero is better; it implies that the barycenters are close to the ground truth distributions. We report results for distributions in 2 and 10 dimensions, as well as different diffusion times and regularization parameters. In the Appendix (Tab. LABEL:tab:ATE_perturb_appendix, Tab. LABEL:tab:_gaussian_fam_appendix) we report additional results for Geodesic Sinkhorn with the graph used in UMAP [wolf2018scanpy, mcinnes2018umap].
| Dim | Reg. | Geo Sinkhorn | Sinkhorn | TV |
|---|---|---|---|---|
| 2 | – | 0.144 ± 0.104 | 0.142 ± 0.094 | 0.636 ± 0.065 |
| 2 | – | 0.141 ± 0.107 | 0.222 ± 0.192 | 0.636 ± 0.065 |
| 2 | – | 0.133 ± 0.135 | 0.945 ± 0.019 | 0.636 ± 0.065 |
| 10 | – | 0.191 ± 0.086 | 0.186 ± 0.081 | 0.541 ± 0.094 |
| 10 | – | 0.192 ± 0.089 | 0.426 ± 0.309 | 0.541 ± 0.094 |
| 10 | – | 0.187 ± 0.111 | 0.894 ± 0.035 | 0.541 ± 0.094 |
Single-cell signalling data
In this experiment, we show an application of the barycentric distance and the expected barycenter effect. We use single-cell signalling data produced by mass cytometry (MC) in a screening study comparing the treatment effect of different chemotherapies on 10 colorectal cancer (CRC) patient-derived organoids (PDOs) [zapatero2022pdos]. These PDOs can be grouped into chemoresistant PDOs, which show little-to-no effect when treated with chemotherapies, and chemosensitive PDOs, which present strong shifts in their phenotypes upon treatment. The observations include single-cell information on the cell cycle and signalling changes upon treatment of PDOs with different CRC treatments at a range of concentrations. To compute the barycentric distance, we define signals on the graph as indicator functions for the treatments and PDOs respecting the condition (chemoresistant or chemosensitive), and compute the barycenter of each treatment. For example, the barycenter corresponding to treatment S (SN-38, the active metabolite of Irinotecan) is the barycenter of all distributions of PDOs treated with S in the given condition. In Fig. 4, we present the barycentric distance matrices between treatments (a) and between four concentrations of treatment SN-38 (S) (b). In both cases, the control groups correspond to AH and DMSO, the two rightmost columns. We compare the distance matrices between Sinkhorn (left) and our method (right). Our method provides a finer distinction between treatments and concentrations, especially for the chemosensitive group. As observed in [zapatero2022pdos], chemosensitive PDOs show little-to-no response to lower concentrations of SN-38 (S1), but their phenotype shifts very strongly upon treatment with higher concentrations (S2, S3, and S4) (Fig. 4b). When comparing combinations of different treatments with SN-38, Geodesic Sinkhorn better resolves the difference between SN-38 alone and in combination with Cetuximab (C), showing that S is the main agent creating the treatment effect and that the combination with C does not result in a synergistic effect (Fig. 4a). Note that we only consider the relative magnitudes of the distances since the two algorithms use different ground distances. In Fig. 5, we show the Expected Barycenter Effect (Def. 4.4) of SN-38 for the two conditions; our method correctly detects the chemosensitive marker cPARP.


6 Conclusion
In this work, we considered the use of optimal transport for graphs and for large high-dimensional datasets potentially sampled from a lower dimensional manifold. We proposed Geodesic Sinkhorn, a fast implementation of the Sinkhorn algorithm using the graph Laplacian and Chebyshev polynomials. Our method is well adapted to large and high-dimensional datasets, as it is defined with a geodesic ground distance, which takes into account the underlying geometry of the data, and it requires less computation time and less memory. We showed on ten scRNA-seq time series datasets that our method has greater interpolation accuracy than Sinkhorn and the exact McCann interpolant. On a synthetic dataset, we showed that Geodesic Sinkhorn can be about 23 times faster than Sinkhorn, even when taking into account the construction of the graph. On the classification task, our method was faster than other Sinkhorn-based algorithms, while having greater accuracy. With the Wasserstein barycenter, we defined the barycentric distance to compare entire families of distributions, as well as the expected barycenter effect, and applied both to a large PDO drug screen dataset. In future work, we aim to study the theoretical implications of our new definitions.