

Optimal Transport with Cyclic Symmetry

Shoichiro Takeda, Yasunori Akagi, Naoki Marumo, Kenta Niwa
Abstract

We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the well-known entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT on synthetic/real-world data that have a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.

1 Introduction

Given two probability vectors and a cost matrix, the discrete optimal transport (OT) problem seeks a transportation plan that minimizes the cost of transporting one probability vector to the other. The resulting total transportation cost is an effective tool for comparing two probability vectors. Therefore, OT has been studied in various research areas, e.g., text embedding (Kusner et al. 2015), image matching (Liu et al. 2020), domain adaptation (Courty et al. 2017), graph comparison (Nikolentzos, Meladianos, and Vazirgiannis 2017), and interpolation (Solomon et al. 2015).

There are many formulations of OT. Kantorovich (1942) was the first to formulate OT as a linear programming problem, and this linear OT (LOT) enabled great progress toward solving OT. Recently, the strongly convex-regularized OT (SROT) has attracted much attention, especially the entropy-regularized OT (EROT) (Cuturi 2013; Blondel, Seguy, and Rolet 2018; Peyré and Cuturi 2019; Guo, Ho, and Jordan 2020). SROT is superior to LOT in that it guarantees a unique solution and computational stability.

Many algorithms have been studied to solve OT. The network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is a well-known classical algorithm for LOT and has been widely used. The Sinkhorn algorithm (Cuturi 2013) and primal-dual descent algorithms (Dvurechensky, Gasnikov, and Kroshnin 2018; Guo, Ho, and Jordan 2020) have been proposed to solve EROT faster. Recently, algorithms utilizing special structures of input data have been in the spotlight for solving OT faster, e.g., algorithms that utilize the low-rankness of the input data (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). In addition, several algorithms utilize the Gibbs kernel structure of the input cost matrix in the Sinkhorn algorithm, such as separability (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016) and translation invariance (Getreuer 2013; Peyré and Cuturi 2019).

Figure 1: Overview of our algorithm for LOT with cyclic symmetry (C-LOT). This algorithm reduces C-LOT to a small LOT that has significantly fewer variables and solves the small LOT instead, resulting in fast computation. Note that the small cost matrix is not just a part of the original one; it aggregates the original cost matrix on the basis of cyclic symmetry, see (14).

In this paper, we propose novel fast algorithms for OT utilizing a new special structure, cyclic symmetry, of input data. Specifically, we assume $n$-order cyclic symmetry for the input data: the input $d$-dimensional probability vector is a concatenation of $n$ copies of an $m\,(\coloneqq d/n)$-dimensional vector, and the input $d\times d$ cost matrix is a block-circulant matrix consisting of $n$ matrices of size $m\times m$ (see Assumption 1). Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing (see examples in Section 4). Intuitively, one might hope to obtain an optimal solution to such a problem faster by solving OT for only one of the symmetric components of the input data and concatenating $n$ copies of the obtained solution. However, this approach does not work because it ignores interactions between the symmetric components (see Appendix A). Unlike such an intuitive approach, we propose novel fast algorithms utilizing cyclic symmetry for two crucial OT formulations: LOT and SROT.

First, we propose a fast algorithm for LOT with cyclic symmetry (C-LOT). Figure 1 shows an overview of this algorithm. Our main idea is to reduce C-LOT, which has $d^{2}$ variables, to a small LOT with only $m^{2}$ variables by utilizing cyclic symmetry. To achieve this reduction, we introduce auxiliary variables that respect cyclic symmetry and rewrite C-LOT as a $\min$-$\min$ optimization problem. Surprisingly, the inner $\min$ problem can be solved analytically, and the $\min$-$\min$ problem becomes a small LOT. Therefore, this algorithm solves C-LOT faster by solving the small LOT instead. Using the network simplex algorithm to solve the small LOT, the time complexity bound becomes $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty})+d^{2})$, where $\mathbf{C}$ is the cost matrix and $\left\|\mathbf{C}\right\|_{\infty}\coloneqq\max_{i,j}|C_{ij}|$. This is a great improvement over the $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$ bound for solving C-LOT directly.

Next, we propose a fast algorithm for SROT with cyclic symmetry (C-SROT). Unlike C-LOT, we cannot reduce C-SROT to a small SROT due to the regularizer. To overcome this issue, we consider the Fenchel dual of C-SROT. By utilizing cyclic symmetry, we show that the Fenchel dual problem has only $2m$ variables, which is significantly fewer than the $2d$ variables in the naive dual of C-SROT. Our algorithm solves the small Fenchel dual problem by the alternating minimization algorithm (Beck 2017, Chapter 14). Since the number of variables is small, the time complexity of each iteration is reduced, resulting in fast computation as a whole. In particular, the resulting algorithm for EROT with cyclic symmetry (C-EROT), a subclass of C-SROT, becomes a Sinkhorn-like algorithm, which we call the cyclic Sinkhorn algorithm. Interestingly, the Gibbs kernel in the cyclic Sinkhorn algorithm differs from that in the original Sinkhorn algorithm and is designed by considering cyclic symmetry. The time complexity of each iteration is $O(m^{2})$, which is significantly improved from the $O(d^{2})$ of solving C-EROT by the original Sinkhorn algorithm.

Finally, we propose a two-stage Sinkhorn algorithm for C-EROT with approximate cyclic symmetry. In the real world, there are many cases where the input data exhibit only approximate cyclic symmetry due to slight noise and displacement. The cyclic Sinkhorn algorithm cannot be applied to such cases because strict cyclic symmetry of the input data is assumed. To overcome this issue, the two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm to quickly obtain a strict symmetric solution. It then runs the original Sinkhorn algorithm to modify the solution. As a result, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry at the first stage. In Section 7.2, we experimentally confirmed the fast computation of this algorithm when input data have approximate cyclic symmetry.

In summary, this paper introduces the concept of symmetry into the OT research field for the first time and proposes fast cyclic symmetry-aware algorithms that solve small optimization problems instead of the original OT. We validated the effectiveness of our algorithms on synthetic/real-world data with a strict/approximate cyclic symmetry structure.

2 Related Work

OT was initially formulated by Monge (1781). Later, Kantorovich (1942) relaxed it to a linear programming problem, which permits splitting mass from a single source to multiple targets. The linear OT (LOT) is easier to solve than Monge's form and has enabled great progress toward solving OT. Many algorithms have been proposed to solve OT. For example, the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is one of the classical algorithms for LOT and has been widely used. Recently, algorithms have been proposed to solve OT faster by adding the entropy regularizer (Cuturi 2013; Altschuler, Niles-Weed, and Rigollet 2017; Lin, Ho, and Jordan 2019b; Alaya et al. 2019). The dual form of the entropy-regularized OT can be solved faster by the Sinkhorn algorithm, which updates dual variables via matrix-vector products (Sinkhorn 1967). For further acceleration, many improvements to the Sinkhorn algorithm have been proposed. For example, Altschuler, Niles-Weed, and Rigollet (2017), Lin, Ho, and Jordan (2019b), and Alaya et al. (2019) proposed using greedy, randomized, and safe-screening strategies, respectively, to efficiently update the dual variables. Primal-dual algorithms have received much attention (Dvurechensky, Gasnikov, and Kroshnin 2018; Lin, Ho, and Jordan 2019a; Guo, Ho, and Jordan 2020) because they report faster computation than the Sinkhorn algorithm and its variants, but they are rarely used in practice due to the difficulty of implementation (Pham et al. 2020). This paper focuses on the network simplex algorithm and the Sinkhorn algorithm because they are widely used.

As another line of work to solve OT faster, utilizing special structures of input data has been well studied (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016; Peyré and Cuturi 2019; Getreuer 2013; Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Inspired by the fact that geodesic distance matrices can be low-rank approximated (Shamai et al. 2015), a low-rank approximation for the cost matrix in OT was introduced to reduce the time complexity of the Sinkhorn algorithm (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Several approaches have utilized the Gibbs kernel structure of the cost matrix appearing in the Sinkhorn algorithm. The key to these approaches is to approximate the calculation involving the Gibbs kernel; by utilizing separability (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016) or translation invariance (Peyré and Cuturi 2019; Getreuer 2013) of the Gibbs kernel on a fixed uniform grid, the matrix-vector product in the Sinkhorn algorithm can be replaced with convolutions. Thus, it can be computed faster by, e.g., a fast Fourier transform. This paper introduces the utilization of a new special but ubiquitous structure, cyclic symmetry, in OT.

3 Preliminary

3.1 Notations

$\mathbb{R}_{\geq 0}$ denotes the set of non-negative real numbers. $\langle{\cdot},{\cdot}\rangle$ denotes the inner product; that is, for vectors $\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}$, $\langle{\mathbf{x}},{\mathbf{y}}\rangle=\sum_{i=0}^{d-1}x_{i}y_{i}$, and for matrices $\mathbf{X},\mathbf{Y}\in\mathbb{R}^{d\times d}$, $\langle{\mathbf{X}},{\mathbf{Y}}\rangle=\sum_{i,j=0}^{d-1}X_{ij}Y_{ij}$. The probability simplex is denoted as $\Delta^{d}\coloneqq\{\mathbf{x}\in\mathbb{R}^{d}\mid\sum_{i=0}^{d-1}x_{i}=1,\,x_{i}\geq 0\}$. $\mathbf{1}_{d}$ denotes the all-ones vector in $\mathbb{R}^{d}$.

3.2 Regularized Optimal Transport (ROT)

We define the regularized OT (ROT), which adds a convex regularizer to the linear OT (LOT) introduced by Kantorovich (1942). Given two probability vectors $\mathbf{a},\mathbf{b}\in\Delta^{d}$ and a cost matrix $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$, ROT is defined as

\begin{gathered}\min_{\mathbf{T}\in\mathbb{R}^{d\times d}}\langle{\mathbf{C}},{\mathbf{T}}\rangle+\sum_{i,j=0}^{d-1}\phi(T_{ij}),\\ \mathrm{s.t.}\quad\mathbf{T}\mathbf{1}_{d}=\mathbf{a},\quad\mathbf{T}^{\top}\mathbf{1}_{d}=\mathbf{b},\end{gathered} (1)

where $\mathbf{T}$ is called a transportation matrix and $\phi:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is a convex function called a regularizer. We assume $\phi(x)=+\infty$ if $x<0$; this assumption imposes the non-negativity constraint on $\mathbf{T}$.

ROT (1) is a generalization of various OT formulations. For example, (1) leads to LOT when $\phi$ is given by

\displaystyle\phi(x)=\begin{cases}0&\mathrm{if}\ x\geq 0,\\ +\infty&\mathrm{otherwise}.\end{cases} (2)

Also, (1) leads to the strongly convex-regularized OT (SROT) when $\phi$ is a strongly convex function; a function $\phi$ is called strongly convex if $\phi-\frac{\mu}{2}\|\cdot\|^{2}$ is convex for some $\mu>0$. As an important subclass of SROT, (1) leads to the entropy-regularized OT (EROT) introduced by Cuturi (2013) when $\phi$ is given by

\displaystyle\phi(x)=\begin{cases}\lambda x(\log x-1)&\mathrm{if}\ x\geq 0,\\ +\infty&\mathrm{otherwise},\end{cases} (3)

where $\lambda>0$.

4 C-ROT: ROT with Cyclic Symmetry

This section explains our cyclic symmetry assumption for ROT (1) and real-world examples of this problem.

We assume that $\mathbf{a},\mathbf{b},\mathbf{C}$ in (1) have the following $n$-order cyclic symmetry.

Assumption 1.

There exists a divisor $n$ of $d$ such that the probability vectors $\mathbf{a},\mathbf{b}$ in (1) have a periodic structure:

\begin{gathered}\mathbf{a}=\begin{pmatrix}\boldsymbol{\alpha}\\ \boldsymbol{\alpha}\\ \vdots\\ \boldsymbol{\alpha}\\ \end{pmatrix},\qquad\mathbf{b}=\begin{pmatrix}\boldsymbol{\beta}\\ \boldsymbol{\beta}\\ \vdots\\ \boldsymbol{\beta}\\ \end{pmatrix},\end{gathered} (4)

where $\boldsymbol{\alpha},\boldsymbol{\beta}\in\mathbb{R}^{m}_{\geq 0}$ and $m\coloneqq\frac{d}{n}$ is an integer. Also, the cost matrix $\mathbf{C}$ in (1) has a block-circulant structure:

\displaystyle\mathbf{C}=\begin{pmatrix}\mathbf{C}_{0}&\mathbf{C}_{1}&\cdots&\mathbf{C}_{n-1}\\ \mathbf{C}_{n-1}&\mathbf{C}_{0}&\ddots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{C}_{1}\\ \mathbf{C}_{1}&\cdots&\mathbf{C}_{n-1}&\mathbf{C}_{0}\\ \end{pmatrix}, (5)

where $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}$.
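As a concrete illustration of Assumption 1, the following NumPy sketch (our own, not the authors' code; the sizes $n=2$, $m=3$ and the random data are illustrative) builds periodic marginals as in (4) and a block-circulant cost matrix as in (5):

```python
import numpy as np

# A minimal sketch (not the authors' code) of inputs satisfying Assumption 1
# for illustrative sizes n = 2, m = 3 (so d = 6).
n, m = 2, 3
d = n * m
rng = np.random.default_rng(0)

# Periodic marginals: a and b stack n copies of alpha and beta, as in (4).
alpha = rng.random(m)
alpha /= alpha.sum() * n      # n copies of alpha must sum to 1
beta = rng.random(m)
beta /= beta.sum() * n
a, b = np.tile(alpha, n), np.tile(beta, n)

# Block-circulant cost, as in (5): block (r, c) equals C_{(c - r) mod n}.
blocks = [rng.random((m, m)) for _ in range(n)]
C = np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])

assert np.isclose(a.sum(), 1.0) and np.isclose(b.sum(), 1.0)
for r in range(n):
    for c in range(n):
        np.testing.assert_allclose(C[r*m:(r+1)*m, c*m:(c+1)*m],
                                   blocks[(c - r) % n])
```

The block index `(c - r) % n` encodes the circulant shift of (5): each block row is the previous one shifted cyclically to the right.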

In this paper, we call ROT (1) under Assumption 1 Cyclic ROT (C-ROT). This problem appears universally in various real-world examples, given below.

Example 1 (Image with Cyclic Symmetry).

Cyclic symmetry in images has been a central topic in image research. In particular, because image data are represented in rectangular form, mirror or $90^{\circ}$ rotational symmetry has been utilized for various tasks; mirror symmetry has been utilized for face recognition (Zhao et al. 2003) and rendering (Wu, Rupprecht, and Vedaldi 2023), and $90^{\circ}$ rotational symmetry in medical and galaxy images has been utilized for image segmentation (Pang et al. 2022) and morphology prediction (Dieleman, Willett, and Dambre 2015). We thus consider ROT between images with cyclic symmetry, $\mathbf{A},\mathbf{B}\in\mathbb{R}^{h\times w}_{\geq 0}$. For images with mirror symmetry, we assume mirror symmetry along the vertical axis:

\displaystyle A_{ij}=A_{i,w-j-1},\quad B_{ij}=B_{i,w-j-1}, (6)

for $0\leq i<h$ and $0\leq j<w$. We vectorize these images by appropriately ordering pixels as follows:

\mathbf{a}=\left(A_{\pi(0)},A_{\pi(1)},\ldots,A_{\pi(hw-1)}\right)^{\top}, (7)
\mathbf{b}=\left(B_{\pi(0)},B_{\pi(1)},\ldots,B_{\pi(hw-1)}\right)^{\top}, (8)
\pi(k)=\begin{cases}\left(k\bmod h,\lfloor k/h\rfloor\right)&0\leq k<\frac{hw}{2},\\ \left(k\bmod h,\frac{3w}{2}-\lfloor k/h\rfloor-1\right)&\frac{hw}{2}\leq k<hw.\end{cases} (9)

By defining $\mathbf{C}$ as the Manhattan, Euclidean, or Chebyshev distance matrix between pixel positions, $\mathbf{a},\mathbf{b},\mathbf{C}$ satisfy Assumption 1; thus, C-ROT with $n=2$ appears. Similarly, by appropriately ordering pixels for $\mathbf{a},\mathbf{b}$ in the case of $90^{\circ}$ rotational symmetry, C-ROT with $n=4$ appears.
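The pixel ordering $\pi$ of (7)-(9) can be checked numerically. The sketch below (our illustration; the small even sizes are assumed) scans the left half of a mirror-symmetric image column by column and then the right half in mirrored column order, which makes the vectorized image periodic as required by Assumption 1:

```python
import numpy as np

# Sketch of the ordering pi in (9) for mirror symmetry (n = 2).
h, w = 4, 6

def pi(k):
    col = k // h
    if k < h * w // 2:
        return (k % h, col)
    return (k % h, 3 * w // 2 - col - 1)

rng = np.random.default_rng(1)
half = rng.random((h, w // 2))
A = np.concatenate([half, half[:, ::-1]], axis=1)   # mirror-symmetric image
A /= A.sum()                                        # normalize to a distribution

a = np.array([A[pi(k)] for k in range(h * w)])

m = h * w // 2
np.testing.assert_allclose(a[:m], a[m:])            # a = [alpha; alpha]
```

The second branch of `pi` visits columns $w-1$ down to $w/2$, so by the mirror relation (6) the second half of `a` repeats the first half exactly.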

Example 2 (Urban Planning with Cyclic Symmetry).

ROT has straightforward applications in logistics and economics (Kantorovich 1942; Guillaume 2012). Imagine a situation where planners design the structure of a city; this structure is simply given by two probability distributions, the distribution of residents $\mathbf{a}$ and that of services $\mathbf{b}$. In this context, the objective function value of ROT enables us to measure how close residents and services are and thus to evaluate the city's efficiency. Several city structures, such as Howard's garden city (Howard 1965), assume that residents and services are located along cyclic symmetry to improve quality of life. In such structures, $\mathbf{a},\mathbf{b}$, and $\mathbf{C}$, where $C_{ij}$ is given by the Euclidean distance between resident $a_{i}$ and service $b_{j}$, satisfy Assumption 1; thus, C-ROT appears.

Example 3 (Graph with Cyclic Symmetry).

Graphs are commonly used to model real-world data. For example, chemical molecules and crystal structures can be modeled using graphs (Bonchev 1991; Xie and Grossman 2018), and such graphs often exhibit cyclic symmetry (Jaffé and Orchin 2002; Ladd 2014). To compare two graphs, computing their distance has been well studied, and OT-based approaches have been proposed (Nikolentzos, Meladianos, and Vazirgiannis 2017; Petric Maretic et al. 2019). We here consider ROT for computing a distance between two graphs with cyclic symmetry. Following Nikolentzos, Meladianos, and Vazirgiannis (2017), we represent features for the vertices of a graph as the eigenvectors of its adjacency matrix. As with chemical molecules and crystal structures, we assume the vertex features are equally distributed along cyclic symmetry. By defining $a_{i}=b_{j}\coloneqq\frac{1}{d}$ to ensure the same amount of outgoing/incoming flow from/to each vertex and $C_{ij}$ as the Manhattan, Euclidean, or Chebyshev distance in the eigenvectors' feature space, $\mathbf{a},\mathbf{b}$, and $\mathbf{C}$ satisfy Assumption 1. Thus, C-ROT for two graphs appears.

5 Fast Algorithms for C-ROT

In this section, we propose fast algorithms for C-ROT. Note that several proofs are in the supplementary material.

5.1 Block-Cyclic Structure of Optimal Solution

We first show the following lemma.

Lemma 1.

Under Assumption 1, there exists an optimal solution to (1) that has the following structure:

\displaystyle\mathbf{T}=\begin{pmatrix}\mathbf{T}_{0}&\mathbf{T}_{1}&\cdots&\mathbf{T}_{n-1}\\ \mathbf{T}_{n-1}&\mathbf{T}_{0}&\ddots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{T}_{1}\\ \mathbf{T}_{1}&\cdots&\mathbf{T}_{n-1}&\mathbf{T}_{0}\\ \end{pmatrix}, (10)

where $\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}$.

The proof is shown in Appendix B.

By Assumption 1 and Lemma 1, $\mathbf{C}$ and $\mathbf{T}$ have the same block-circulant structure. Plugging (5) and (10) into C-ROT (1) yields the following optimization problem:

\begin{gathered}\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})\\ \text{s.t.}\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta},\end{gathered} (11)

where $T_{ijk}$ is the $(i,j)$-th entry of $\mathbf{T}_{k}$. Note that the objective function value of (11) is exactly $\frac{1}{n}$ of that of (1).
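The $\frac{1}{n}$ relationship between the linear parts of (11) and (1) follows because each block pair $(\mathbf{C}_{k},\mathbf{T}_{k})$ appears $n$ times in the block-circulant matrices. A quick numerical check (illustrative sizes and random blocks of our choosing):

```python
import numpy as np

# Check that for block-circulant C and T as in (5) and (10),
# <C, T> = n * sum_k <C_k, T_k>, i.e. the linear part of (11) is 1/n of (1).
n, m = 3, 2
rng = np.random.default_rng(2)
C_blocks = [rng.random((m, m)) for _ in range(n)]
T_blocks = [rng.random((m, m)) for _ in range(n)]

def circulant(blocks):
    # Block (r, c) of the assembled matrix is blocks[(c - r) % n].
    return np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])

C, T = circulant(C_blocks), circulant(T_blocks)
lhs = np.sum(C * T)                                             # <C, T>
rhs = n * sum(np.sum(Ck * Tk) for Ck, Tk in zip(C_blocks, T_blocks))
np.testing.assert_allclose(lhs, rhs)
```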

5.2 Algorithm for C-LOT

We here propose a fast algorithm for cyclic LOT (C-LOT), the special case of C-ROT (1) in which $\phi$ is given by (2). From (11), C-LOT can be rewritten as

\begin{gathered}\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle\\ \text{s.t.}\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}.\end{gathered} (12)

By introducing the auxiliary variable $\mathbf{S}\coloneqq\sum_{k=0}^{n-1}\mathbf{T}_{k}$ and rewriting (12) in terms of $\mathbf{S}$, we can show the following theorem.

Theorem 1.

We consider a small LOT

\displaystyle\min_{\mathbf{S}\in\mathbb{R}^{m\times m}_{\geq 0}}\left\langle\mathbf{G},\mathbf{S}\right\rangle\quad\text{s.t.}\quad\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}, (13)

where

G_{ij}\coloneqq\min_{0\leq k\leq n-1}C_{ijk}. (14)

Let $\mathbf{S}^{*}$ be an optimal solution of (13). Then, $\left(\mathbf{T}^{*}_{k}\right)_{k=0,\ldots,n-1}$ defined by

\displaystyle T^{*}_{ijk}=\begin{cases}S^{*}_{ij}&\mathrm{if}\ k=\min\left(\mathop{\rm argmin}\limits_{0\leq k\leq n-1}C_{ijk}\right),\\ 0&\mathrm{otherwise}\end{cases} (15)

is an optimal solution to (12). Also, the optimal objective function value of (12) is the same as that of (13).

Note that $\mathrm{argmin}_{0\leq k\leq n-1}C_{ijk}$ may return a set of indices when the minimum is attained at several indices; the outer $\min$ in (15) then selects the smallest such index.

Proof.

We fix $\mathbf{S}\coloneqq\sum_{k=0}^{n-1}\mathbf{T}_{k}$ in (12). The matrix $\mathbf{S}$ satisfies $\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha}$ and $\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}$, and we can rewrite (12) as

\displaystyle\min_{\begin{subarray}{c}\mathbf{S}\in\mathbb{R}^{m\times m}_{\geq 0},\\ \mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha},\ \mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}\end{subarray}}\left(\min_{\begin{subarray}{c}\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0},\\ \sum_{k=0}^{n-1}\mathbf{T}_{k}=\mathbf{S}\end{subarray}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle\right). (16)

The inner problem can be solved analytically and independently for each $(i,j)$-th entry of $\mathbf{T}_{0},\ldots,\mathbf{T}_{n-1}$; the optimal solution is given by (15), and the optimal objective function value is $\left\langle\mathbf{G},\mathbf{S}\right\rangle$. Next, we solve the outer optimization problem for $\mathbf{S}$. Because $\mathbf{S}\in\mathbb{R}_{\geq 0}^{m\times m}$, $\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha}$, $\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}$, and the objective function is $\left\langle\mathbf{G},\mathbf{S}\right\rangle$, this optimization problem is identical to (13). ∎

Theorem 1 indicates that C-LOT (1) can be reduced to the small LOT (13), which has only $m^{2}$ variables, significantly fewer than the $d^{2}=m^{2}n^{2}$ variables of the original C-LOT. Therefore, we can obtain the optimal solution to C-LOT by solving the small LOT (13) instead. The proposed algorithm is summarized in Algorithm 1, and Figure 1 shows its overview.

We evaluate the time complexity of Algorithm 1, which depends on the algorithm used to solve the small LOT (13). We here use the network simplex algorithm, the most popular algorithm for LOT. Tarjan (1997) showed that the time complexity of the network simplex algorithm for LOT (1) with the regularizer (2) is $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$, where $\left\|\mathbf{C}\right\|_{\infty}\coloneqq\max_{i,j}|C_{ij}|$. Hence, the time complexity of line 2 in Algorithm 1 is bounded by $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty}))$. Because line 1 and lines 3–7 can be executed in $O(d^{2})$ time, the total time complexity of Algorithm 1 is $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty})+d^{2})$. This is a significant improvement over the $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$ of solving C-LOT (1) directly.

Algorithm 1 Fast Algorithm for C-LOT
Input: $\mathbf{a},\mathbf{b}\in\Delta^{d}$ and $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ under Assumption 1.
1: Compute $\mathbf{G}$ whose entries are given by (14).
2: Compute the optimal solution $\mathbf{S}^{*}$ to (13).
3: for $i,j,k$ do
4:     Compute $T_{ijk}$ by the relationship (15)
5: end for
6: Compute $\mathbf{T}$ by Lemma 1 with $(T_{ijk})$
7: return $\mathbf{T}$
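To make the reduction concrete, here is a solver-free NumPy sketch of Theorem 1 (our illustration: the network simplex step of line 2 is replaced by an arbitrary feasible $\mathbf{S}$, the independence coupling, since the identities in the theorem hold for every feasible $\mathbf{S}$; sizes and data are assumed):

```python
import numpy as np

# Sketch of the reduction in Theorem 1 with a feasible (not optimal) S.
n, m = 3, 4
rng = np.random.default_rng(3)
C = rng.random((n, m, m))                    # C[k] plays the role of C_k
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n

G = C.min(axis=0)                            # Eq. (14)
S = n * np.outer(alpha, beta)                # feasible: S @ 1 = alpha, S.T @ 1 = beta

# Eq. (15): place all of S_ij on the cheapest block index k. NumPy's argmin
# returns the smallest index on ties, matching the theorem's min(argmin ...).
k_star = C.argmin(axis=0)
T = np.zeros((n, m, m))
i_idx, j_idx = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
T[k_star, i_idx, j_idx] = S

np.testing.assert_allclose(T.sum(axis=0), S)                   # sum_k T_k = S
np.testing.assert_allclose(np.sum(C * T), np.sum(G * S))       # equal objectives
np.testing.assert_allclose(T.sum(axis=(0, 2)), alpha)          # marginals kept
```

Replacing `S` with the optimal $\mathbf{S}^{*}$ of (13), e.g. from a network simplex solver, would yield the optimal solution of (12) by the same construction.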

5.3 Algorithm for C-SROT

We propose a fast algorithm for cyclic SROT (C-SROT), the special case of C-ROT (1) in which $\phi$ is a strongly convex regularizer. Note that because the $\phi$ defined by (2) is not strongly convex, this algorithm cannot be applied to C-LOT.

The following theorem follows from Fenchel’s duality theorem and optimality conditions in convex analysis (see, e.g., (Rockafellar 1970, Section 31)).

Theorem 2.

The Fenchel dual of problem (11) is

\begin{aligned}\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ &\langle{\mathbf{w}},{\boldsymbol{\alpha}}\rangle+\langle{\mathbf{z}},{\boldsymbol{\beta}}\rangle\\ &-\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi^{\star}(w_{i}+z_{j}-C_{ijk}),\end{aligned} (17)

where $\phi^{\star}:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is the Fenchel conjugate of $\phi$, defined by $\phi^{\star}(y)\coloneqq\sup\{yx-\phi(x)\mid x\in\mathbb{R}\}$. Also, the optimal solution $\mathbf{T}^{*}_{k}$ to problem (11) and the optimal solutions $\mathbf{w}^{*}$ and $\mathbf{z}^{*}$ to problem (17) have the following relationship:

\displaystyle T^{*}_{ijk}=\left(\phi^{\star}\right)^{\prime}(w^{*}_{i}+z^{*}_{j}-C_{ijk}). (18)

The proof is shown in Appendix C.

Note that $\phi^{\star}$ is a smooth, differentiable convex function because $\phi$ is strongly convex. Theorem 2 indicates that we can obtain the optimal solution to C-SROT (1) by solving the dual problem (17) instead, because we can reconstruct the primal solution via the relationship (18) and Lemma 1.

We here propose to apply the alternating minimization algorithm (Beck 2017, Chapter 14) to (17); we iteratively optimize the objective function of (17) with respect to $\mathbf{w}$ while fixing $\mathbf{z}$, and vice versa. When we fix $\mathbf{z}$, the partial derivative of the objective function with respect to $w_{i}$ is

\displaystyle\alpha_{i}-\sum_{k=0}^{n-1}\sum_{j=0}^{m-1}\left(\phi^{\star}\right)^{\prime}(w_{i}+z_{j}-C_{ijk}), (19)

and $w_{i}$ is optimal if (19) equals 0. Because (19) is monotonically decreasing in $w_{i}$, we can find such $w_{i}$ easily by, e.g., the well-known Newton's method. The same applies to the optimization with respect to $\mathbf{z}$ while fixing $\mathbf{w}$. The alternating minimization algorithm for a smooth convex function is guaranteed to attain fast convergence (see (Beck and Tetruashvili 2013) for more details).

The distinguishing feature of this algorithm is that it treats few dual variables. If the alternating minimization algorithm were applied to the dual of (1) without considering cyclic symmetry, the number of dual variables would be $2d=2mn$. In contrast, our algorithm treats only $2m$ dual variables, a reduction by a factor of $n$. Therefore, the computational time required for one iteration of the alternating minimization is considerably reduced.
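The scheme above can be sketched for one concrete strongly convex choice. The following is our illustration, not the paper's experimental setup: we take the quadratic regularizer $\phi(x)=x^{2}/(2\gamma)$ on $x\geq 0$, whose conjugate derivative is $(\phi^{\star})^{\prime}(y)=\max(y,0)/\gamma$, and solve each coordinate of (19) by bisection instead of Newton's method (both work because (19) is monotone); sizes, $\gamma$, and data are assumed.

```python
import numpy as np

# Alternating dual coordinate updates of Section 5.3 for the quadratic
# regularizer phi(x) = x^2 / (2*gamma), x >= 0 (illustrative choice).
n, m, gamma = 2, 3, 0.5
rng = np.random.default_rng(4)
C = rng.random((n, m, m))                    # C[k] plays the role of C_k
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n

def solve_coord(target, offsets):
    # Solve sum_t max(x - offsets_t, 0) / gamma = target for x by bisection;
    # the left side is monotonically increasing in x.
    lo, hi = offsets.min(), offsets.max() + gamma * target
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if np.maximum(mid - offsets, 0.0).sum() / gamma < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w, z = np.zeros(m), np.zeros(m)
for _ in range(1000):                        # alternating sweeps over w and z
    for i in range(m):
        w[i] = solve_coord(alpha[i], (C[:, i, :] - z[None, :]).ravel())
    for j in range(m):
        z[j] = solve_coord(beta[j], (C[:, :, j] - w[None, :]).ravel())

# Primal reconstruction via (18): T_ijk = (phi*)'(w_i + z_j - C_ijk).
T = np.maximum(w[None, :, None] + z[None, None, :] - C, 0.0) / gamma
np.testing.assert_allclose(T.sum(axis=(0, 2)), alpha, atol=1e-3)
np.testing.assert_allclose(T.sum(axis=(0, 1)), beta, atol=1e-3)
```

Each sweep touches only the $2m$ dual variables, matching the reduction described above.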

5.4 Algorithm for C-EROT

We here propose a fast algorithm for cyclic EROT (C-EROT), the crucial special case of C-ROT (1) in which $\phi$ is given by (3). Because (3) is strongly convex, we can apply the cyclic-aware alternating minimization algorithm introduced in Section 5.3 to C-EROT.

Because $\phi^{\star}(y)=\lambda\exp(\frac{y}{\lambda})$, (19) can be written as

\displaystyle\alpha_{i}-\exp\left(\frac{w_{i}}{\lambda}\right)\sum_{j=0}^{m-1}K_{ij}\exp\left(\frac{z_{j}}{\lambda}\right), (20)

where

\displaystyle K_{ij}\coloneqq\sum_{k=0}^{n-1}\exp\left(-\frac{C_{ijk}}{\lambda}\right). (21)

From (20), we obtain the optimal $w_{i}$ in closed form:

w_{i}=\lambda\left(\log\alpha_{i}-\log\left(\sum_{j=0}^{m-1}K_{ij}\exp\left(\frac{z_{j}}{\lambda}\right)\right)\right). (22)

We can rewrite (22) and similarly describe the optimal $q_{j}$ as follows:

p_{i}=\frac{\alpha_{i}}{\sum_{j=0}^{m-1}K_{ij}q_{j}},\quad q_{j}=\frac{\beta_{j}}{\sum_{i=0}^{m-1}K_{ij}p_{i}}, (23)

where $p_{i}\coloneqq\exp\left(\frac{w_{i}}{\lambda}\right)$ and $q_{j}\coloneqq\exp\left(\frac{z_{j}}{\lambda}\right)$. This algorithm resembles the Sinkhorn algorithm (Cuturi 2013); we call it the cyclic Sinkhorn algorithm. Note that the optimal solution $\mathbf{T}$ to C-EROT (1) can easily be reconstructed from the optimal $\mathbf{w}$ and $\mathbf{z}$ by (18) and Lemma 1. The proposed algorithm is summarized in Algorithm 2.

Algorithm 2 Fast Cyclic Sinkhorn Algorithm for C-EROT
Input: $\mathbf{a},\mathbf{b}\in\Delta^{d}$, $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ under Assumption 1, and $\lambda>0$.
1: Compute $\mathbf{K}$ whose entries are given by (21).
2: Initialize $\mathbf{q}\leftarrow\mathbf{1}_{m}$.
3: repeat
4:     $\mathbf{p}\leftarrow\boldsymbol{\alpha}\oslash(\mathbf{K}\mathbf{q})$  ▷ $\oslash$ denotes elementwise division
5:     $\mathbf{q}\leftarrow\boldsymbol{\beta}\oslash(\mathbf{K}^{\top}\mathbf{p})$
6: until convergence
7: for $i,j,k$ do
8:     $T_{ijk}\leftarrow p_{i}q_{j}\exp\left(-\frac{C_{ijk}}{\lambda}\right)$
9: end for
10: Compute $\mathbf{T}$ by Lemma 1 with $(T_{ijk})$
11: return $\mathbf{T}$

We evaluate the time complexity of Algorithm 2, which is dominated by the matrix-vector product iterations in lines 4 and 5 for solving the Fenchel dual problem (17). In the original Sinkhorn algorithm, the time complexity of each iteration is $O(d^{2})=O(m^{2}n^{2})$ (Cuturi 2013). In contrast, in our cyclic Sinkhorn algorithm, the time complexity of each iteration is $O(m^{2})$; thus, our algorithm solves C-EROT significantly faster than the original Sinkhorn algorithm.
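The cyclic Sinkhorn iteration can be sketched in a few lines of NumPy and checked against the original Sinkhorn algorithm on the full $d\times d$ problem; since the EROT optimum is unique, both must converge to the same transportation matrix. Sizes, $\lambda$, iteration counts, and random data below are our illustrative assumptions:

```python
import numpy as np

# Cyclic Sinkhorn (Algorithm 2) vs. original Sinkhorn on the full problem.
n, m, lam = 2, 3, 0.2
d = n * m
rng = np.random.default_rng(5)
C_blocks = [rng.random((m, m)) for _ in range(n)]
C = np.block([[C_blocks[(c - r) % n] for c in range(n)] for r in range(n)])
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n
a, b = np.tile(alpha, n), np.tile(beta, n)

# Cyclic Sinkhorn: aggregated m x m Gibbs kernel of (21).
K_small = sum(np.exp(-Ck / lam) for Ck in C_blocks)
q = np.ones(m)
for _ in range(3000):
    p = alpha / (K_small @ q)
    q = beta / (K_small.T @ p)

# Reconstruct the full transportation matrix via (18) and Lemma 1.
T_blocks = [np.outer(p, q) * np.exp(-Ck / lam) for Ck in C_blocks]
T_cyc = np.block([[T_blocks[(c - r) % n] for c in range(n)] for r in range(n)])

# Original Sinkhorn on the full d x d problem for comparison.
K = np.exp(-C / lam)
v = np.ones(d)
for _ in range(3000):
    u = a / (K @ v)
    v = b / (K.T @ u)
T_org = u[:, None] * K * v[None, :]

np.testing.assert_allclose(T_cyc, T_org, atol=1e-8)
np.testing.assert_allclose(T_cyc.sum(axis=1), a, atol=1e-8)
```

The inner loop only touches $m$-dimensional vectors and the $m\times m$ kernel, reflecting the $O(m^{2})$ per-iteration cost.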

6 Two-Stage Algorithm for C-EROT with Approximate Cyclic Symmetry

There are many real-world cases in which input data show only approximate cyclic symmetry. In Example 1, $\mathbf{C}$ satisfies Assumption 1 strictly when using the pixel-wise Euclidean distance, but the input distributions $\mathbf{a},\mathbf{b}$ (namely, images) often satisfy Assumption 1 only approximately due to slight noise and displacement. The algorithms proposed above cannot be applied to such cases because they require Assumption 1 to hold strictly. To overcome this issue, we here propose a fast two-stage Sinkhorn algorithm for C-EROT with approximate cyclic symmetry. We focus on C-EROT because EROT is commonly used thanks to its differentiability and computational efficiency (Peyré and Cuturi 2019; Guo, Ho, and Jordan 2020). The two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm (Algorithm 2) to quickly obtain a strictly symmetric solution. It then runs the original Sinkhorn algorithm (Cuturi 2013) to modify the solution. As a result, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry in the first stage. The proposed algorithm is described in Algorithm 3.

Algorithm 3 Fast Two-Stage Algorithm for C-EROT with Approximate Cyclic Symmetry
1:$\mathbf{a},\mathbf{b}\in\Delta^{d}$, $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ and $\lambda>0$.
2:// Stage 1: Cyclic Sinkhorn algorithm
3:for $i=0,\dots,m-1$ do
4:     $\alpha_{i}=\frac{1}{n}\sum_{k=0}^{n-1}a_{i+mk}$ \triangleright the average of the $n$-divided $\mathbf{a}$
5:     $\beta_{i}=\frac{1}{n}\sum_{k=0}^{n-1}b_{i+mk}$ \triangleright the average of the $n$-divided $\mathbf{b}$
6:end for
7:Compute $\mathbf{K}$ whose entries are given by 21.
8:Initialize $\widehat{\mathbf{q}}\leftarrow\mathbf{1}_{m}$.
9:repeat
10:     $\widehat{\mathbf{p}}\leftarrow\boldsymbol{\alpha}\oslash(\mathbf{K}\widehat{\mathbf{q}})$ \triangleright $\oslash$ denotes elementwise division
11:     $\widehat{\mathbf{q}}\leftarrow\boldsymbol{\beta}\oslash(\mathbf{K}^{\top}\widehat{\mathbf{p}})$
12:until convergence
13:// Stage 2: Sinkhorn algorithm (Cuturi 2013)
14:Initialize $\mathbf{p},\mathbf{q}$ as the $n$ concatenated copies of $\widehat{\mathbf{p}},\widehat{\mathbf{q}}$, respectively.
15:Compute $K_{ij}=\exp\left(-\frac{C_{ij}}{\lambda}\right)$.
16:repeat
17:     $\mathbf{p}\leftarrow\mathbf{a}\oslash(\mathbf{K}\mathbf{q})$
18:     $\mathbf{q}\leftarrow\mathbf{b}\oslash(\mathbf{K}^{\top}\mathbf{p})$
19:until convergence
20:return $\mathbf{T}\leftarrow\mathrm{diag}(\mathbf{p})\mathbf{K}\mathrm{diag}(\mathbf{q})$

If Assumption 1 is satisfied strictly, the time complexity of this algorithm is the same as that of the cyclic Sinkhorn algorithm. If not, the analysis becomes complicated because the two Sinkhorn algorithms of Stages 1 and 2 are mixed. We leave this analysis for future research, but we experimentally confirmed in Section 7.2 that this algorithm computes fast when input data have approximate cyclic symmetry.
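The two stages can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' code: we take the blocks of the first block-row of $\mathbf{C}$ as $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ and aggregate them into the reduced kernel, and the warm start concatenates $n$ copies of the Stage-1 scalings as in line 14 of Algorithm 3.

```python
import numpy as np

def two_stage_sinkhorn(a, b, C, lam, n_sym, stage1_iters=2000, stage2_iters=2000):
    """Sketch of Algorithm 3.

    a, b  : (d,) input marginals, d = m * n_sym
    C     : (d, d) cost matrix
    n_sym : the cyclic order n
    """
    d = a.shape[0]
    m = d // n_sym
    # Stage 1: averaged m-dimensional marginals (lines 3-6 of Algorithm 3) ...
    alpha = a.reshape(n_sym, m).mean(axis=0)
    beta = b.reshape(n_sym, m).mean(axis=0)
    # ... and an assumed reduced kernel K_hat_ij = sum_k exp(-C[i, j+mk] / lam),
    # reading the first block-row of C as the blocks C_0, ..., C_{n-1}.
    K_hat = np.exp(-C[:m].reshape(m, n_sym, m) / lam).sum(axis=1)
    q_hat = np.ones(m)
    for _ in range(stage1_iters):
        p_hat = alpha / (K_hat @ q_hat)
        q_hat = beta / (K_hat.T @ p_hat)
    # Stage 2: warm-start the full Sinkhorn with n concatenated copies.
    p, q = np.tile(p_hat, n_sym), np.tile(q_hat, n_sym)
    K = np.exp(-C / lam)
    for _ in range(stage2_iters):
        p = a / (K @ q)
        q = b / (K.T @ p)
    return p[:, None] * K * q[None, :]   # diag(p) K diag(q)
```

When Assumption 1 holds strictly, the warm start already satisfies the full marginal constraints, so Stage 2 terminates almost immediately; the closer the data are to strict symmetry, the more work Stage 1 saves.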

7 Experiments

To validate the effectiveness of our algorithms, we conducted experiments on synthetic/real-world data that satisfy Assumption 1 strictly/approximately. In all experiments, we evaluated whether our algorithms, which solve small optimization problems instead of the original OT, produce the same results as the original OT with faster computation. The experiments were performed on a Windows laptop with an Intel Core i7-10750H CPU and 32 GB of memory. All code was implemented in Python.

Table 1: Experimental results in synthetic data. “Obj. value” indicates objective function value.
Algorithm | $n$ | $d=5000$: Obj. value | Marginal error | Time (sec.) | $d=10000$: Obj. value | Marginal error | Time (sec.)
Network Simplex | -- | 6.034 ± 0.824 | 0.000 ± 0.000 | 6.523 ± 1.013 | 6.526 ± 0.917 | 0.000 ± 0.000 | 33.660 ± 3.238
Cyclic Network Simplex | 2 | 6.034 ± 0.824 | 0.000 ± 0.000 | 1.477 ± 0.235 | 6.526 ± 0.917 | 0.000 ± 0.000 | 7.084 ± 0.728
  | 5 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.300 ± 0.030 | 6.526 ± 0.917 | 0.000 ± 0.000 | 1.391 ± 0.155
  | 10 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.136 ± 0.026 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.618 ± 0.073
  | 25 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.080 ± 0.019 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.381 ± 0.044
  | 50 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.056 ± 0.015 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.329 ± 0.034
Sinkhorn | -- | 6.233 ± 0.821 | 0.000 ± 0.000 | 3.271 ± 1.445 | 6.745 ± 0.916 | 0.000 ± 0.000 | 14.589 ± 4.213
Cyclic Sinkhorn | 2 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.918 ± 0.463 | 6.745 ± 0.916 | 0.000 ± 0.000 | 3.973 ± 0.922
  | 5 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.207 ± 0.170 | 6.745 ± 0.916 | 0.000 ± 0.000 | 1.262 ± 0.324
  | 10 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.116 ± 0.036 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.636 ± 0.259
  | 25 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.093 ± 0.036 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.381 ± 0.127
  | 50 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.067 ± 0.034 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.320 ± 0.053
Table 2: Experimental results in real-world data.
Algorithm | $(h,w)=(64,64)$, $d=4096$: Obj. value | Marginal error | Time (sec.) | $(h,w)=(96,96)$, $d=9216$: Obj. value | Marginal error | Time (sec.)
Sinkhorn | 4.320 ± 2.056 | 0.000 ± 0.000 | 16.610 ± 6.502 | 6.296 ± 3.100 | 0.000 ± 0.000 | 117.152 ± 53.442
Cyclic Sinkhorn | 4.289 ± 2.048 | 0.001 ± 0.001 | 3.837 ± 1.286 | 6.250 ± 3.089 | 0.001 ± 0.001 | 26.087 ± 11.985
Two-Stage Sinkhorn | 4.320 ± 2.056 | 0.000 ± 0.000 | 13.877 ± 6.244 | 6.296 ± 3.100 | 0.000 ± 0.000 | 91.790 ± 43.000

7.1 Synthetic Data w/ Strict Cyclic Symmetry

We created $20$ synthetic random data for each of the two dimensions, $d\in\{5000,10000\}$, that satisfy Assumption 1 strictly with $n=50$ (for details, see Appendix D). For validation, we evaluated the average and standard deviation over the $20$ data of the objective function values, the marginal constraint errors defined by $||\mathbf{T}^{\top}\mathbf{1}_{d}-\mathbf{b}||_{2}$, and the computation time for four algorithms: the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993), Algorithm 1 using the network simplex algorithm in line 2 (we call it the cyclic network simplex algorithm), the Sinkhorn algorithm (Cuturi 2013), and the cyclic Sinkhorn algorithm (Algorithm 2). We set $\lambda=0.5$ for the regularizer 3. The computation time was recorded from inputting the data to outputting the optimal solution. Because these synthetic data also satisfy Assumption 1 for every divisor $n$ of $50$, namely $n\in\{2,5,10,25,50\}$, we conducted experiments for each such $n$; larger $n$ leads to smaller problems that output the same result. The network simplex algorithm was implemented using LEMON (Dezső, Jüttner, and Kovács 2011).
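For reference, the reported metrics can be computed directly from a returned transport plan $\mathbf{T}$. This is a minimal sketch; we take the objective value as the transport cost $\langle\mathbf{C},\mathbf{T}\rangle$, without the regularization term.

```python
import numpy as np

def evaluate_plan(T, C, b):
    """Metrics of Section 7.1: the transport cost <C, T> and the
    marginal constraint error || T^T 1_d - b ||_2."""
    obj = float(np.sum(C * T))
    marginal_err = float(np.linalg.norm(T.T @ np.ones(T.shape[0]) - b))
    return obj, marginal_err
```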

Table 1 lists the results. The network simplex algorithm and the cyclic one attained the same optimal objective function value, but the latter became faster as $n$ grew. The same held for the Sinkhorn algorithm and the cyclic one. These results support the effectiveness of our proposed algorithms: higher cyclic symmetry (i.e., larger $n$) results in faster computation.

7.2 Real Data w/ Approximate Cyclic Symmetry

For real-world data, we tested the mirror-symmetry case ($n=2$) of Example 1 with the NYU Symmetry Database (Cicconet et al. 2017). From this dataset, we selected $20$ images with mirror symmetry along the vertical axis (the images are shown in Appendix E). These images were converted to gray-scale, resized to $64\times 64$ or $96\times 96$ pixels, and normalized so that the sum of the intensities is $1$. We then obtained $\mathbf{a},\mathbf{b}$ by 7 and $\mathbf{C}$ by the pixel-wise Euclidean distance. For validation, we evaluated the same metrics as in Section 7.1 over the $190\,(={}_{20}\mathrm{C}_{2})$ image pairs. Because EROT is commonly used in real applications (Peyré and Cuturi 2019), we focused on C-EROT and compared the Sinkhorn algorithm, the cyclic one (Algorithm 2), and the two-stage one (Algorithm 3). Note that, in the two-stage Sinkhorn algorithm, we stopped Stage 1 before convergence to prevent the solution from drifting far from the optimal one for real images (for details, see Appendix F). We set $\lambda$ as in Section 7.1 for the regularizer 3.
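The preprocessing described above can be sketched as follows. This is a minimal sketch under our own assumptions: the paper does not specify the resizing or gray-scale conversion methods, so we use block averaging and the ITU-R 601 luma weights as stand-ins.

```python
import numpy as np

def to_distribution(rgb, size=64):
    """Sketch of the Section 7.2 preprocessing: gray-scale, resize to
    size x size (block averaging; assumes the original side lengths are
    multiples of `size`), and normalize so the intensities sum to 1."""
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    h, w = gray.shape
    gray = gray.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (gray / gray.sum()).ravel()
```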

Table 2 lists the results. The cyclic Sinkhorn algorithm showed the fastest computation time. However, because this algorithm assumes that Assumption 1 holds strictly, its objective function value differed from that of the original Sinkhorn algorithm, and a marginal error occurred. In contrast, the two-stage Sinkhorn algorithm showed the same objective function value as the original one and no marginal error, while being faster than the original. These results indicate that the cyclic Sinkhorn algorithm can be a good choice for real-world data thanks to its fastest computation time, provided that users tolerate the difference in the objective function value and the marginal error. Otherwise, the two-stage Sinkhorn algorithm is promising for real-world data: it solves C-EROT with approximate cyclic symmetry faster than the original Sinkhorn algorithm.

8 Discussions and Limitations

Through this paper, we confirmed that our algorithms can solve C-ROT faster. For further progress, we discuss the following future issues. (I) In Assumption 1, we assume that the cyclic order $n$ is known in advance. Because cyclic symmetry arises naturally from the physical structure of input data, this assumption is reasonable in some real-world cases. However, our algorithms must be extended to handle cyclic symmetry of unknown order. (II) It is unknown whether our algorithms can be generalized to other symmetries, e.g., dihedral symmetry (Gatermann and Parrilo 2004). Further development of our algorithms for general symmetries remains as future work. (III) The main contribution of this paper is showing how to utilize cyclic symmetry in OT with theoretical proofs, but we must test our algorithms on a wider variety of real-world data for further development.

9 Conclusion

We proposed novel fast algorithms for OT with cyclic symmetry. We showed that such OT can be reduced to a smaller optimization problem whose number of variables shrinks as higher cyclic symmetry exists in the input data. Our algorithms solve this small problem instead of the original OT and thus achieve fast computation. Through experiments, we confirmed the effectiveness of our algorithms on synthetic/real-world data with strict/approximate cyclic symmetry. This paper cultivates a new research direction, OT with symmetry, and paves the way for future research.

References

  • Ahuja, Magnanti, and Orlin (1993) Ahuja, R. K.; Magnanti, T. L.; and Orlin, J. B. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc.
  • Alaya et al. (2019) Alaya, M. Z.; Berar, M.; Gasso, G.; and Rakotomamonjy, A. 2019. Screening Sinkhorn Algorithm for Regularized Optimal Transport. In Advances in Neural Information Processing Systems.
  • Altschuler et al. (2019) Altschuler, J.; Bach, F.; Rudi, A.; and Niles-Weed, J. 2019. Massively scalable Sinkhorn distances via the Nyström method. In Advances in Neural Information Processing Systems.
  • Altschuler, Niles-Weed, and Rigollet (2017) Altschuler, J.; Niles-Weed, J.; and Rigollet, P. 2017. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In Advances in Neural Information Processing Systems.
  • Beck (2017) Beck, A. 2017. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics.
  • Beck and Tetruashvili (2013) Beck, A.; and Tetruashvili, L. 2013. On the convergence of block coordinate descent type methods. SIAM journal on Optimization, 23(4): 2037–2060.
  • Blondel, Seguy, and Rolet (2018) Blondel, M.; Seguy, V.; and Rolet, A. 2018. Smooth and Sparse Optimal Transport. In International Conference on Artificial Intelligence and Statistics.
  • Bonchev (1991) Bonchev, D. 1991. Chemical graph theory: introduction and fundamentals, volume 1. CRC Press.
  • Bonneel, Peyré, and Cuturi (2016) Bonneel, N.; Peyré, G.; and Cuturi, M. 2016. Wasserstein Barycentric Coordinates: Histogram Regression Using Optimal Transport. ACM Transactions on Graphics, 35(4).
  • Boyd and Vandenberghe (2004) Boyd, S.; and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
  • Cicconet et al. (2017) Cicconet, M.; Birodkar, V.; Lund, M.; Werman, M.; and Geiger, D. 2017. A convolutional approach to reflection symmetry. Pattern Recognition Letters, 95: 44–50.
  • Courty et al. (2017) Courty, N.; Flamary, R.; Tuia, D.; and Rakotomamonjy, A. 2017. Optimal Transport for Domain Adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9): 1853–1865.
  • Cuturi (2013) Cuturi, M. 2013. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Advances in Neural Information Processing Systems.
  • Dezső, Jüttner, and Kovács (2011) Dezső, B.; Jüttner, A.; and Kovács, P. 2011. LEMON – an Open Source C++ Graph Template Library. Electronic Notes in Theoretical Computer Science, 264(5): 23–45. Second Workshop on Generative Technologies.
  • Dieleman, Willett, and Dambre (2015) Dieleman, S.; Willett, K. W.; and Dambre, J. 2015. Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2): 1441–1459.
  • Dvurechensky, Gasnikov, and Kroshnin (2018) Dvurechensky, P.; Gasnikov, A.; and Kroshnin, A. 2018. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn’s Algorithm. In International Conference on Machine Learning.
  • Gatermann and Parrilo (2004) Gatermann, K.; and Parrilo, P. A. 2004. Symmetry groups, semidefinite programs, and sums of squares. Journal of Pure and Applied Algebra, 192(1-3): 95–128.
  • Getreuer (2013) Getreuer, P. 2013. A Survey of Gaussian Convolution Algorithms. Image Processing On Line, 3: 286–310.
  • Guillaume (2012) Guillaume, C. 2012. Optimal transportation and economic applications. Lecture Notes.
  • Guo, Ho, and Jordan (2020) Guo, W.; Ho, N.; and Jordan, M. 2020. Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter. In International Conference on Artificial Intelligence and Statistics.
  • Howard (1965) Howard, E. 1965. Garden Cities of To-Morrow, volume 23. MIT Press.
  • Jaffé and Orchin (2002) Jaffé, H. H.; and Orchin, M. 2002. Symmetry in chemistry. Courier Corporation.
  • Kantorovich (1942) Kantorovich, L. 1942. On the transfer of masses (in Russian). In Doklady Akademii Nauk, volume 37, 227.
  • Kusner et al. (2015) Kusner, M.; Sun, Y.; Kolkin, N.; and Weinberger, K. 2015. From Word Embeddings To Document Distances. In International Conference on Machine Learning.
  • Ladd (2014) Ladd, M. F. C. 2014. Symmetry of crystals and molecules. Oxford University Press, USA.
  • Lin, Ho, and Jordan (2019a) Lin, T.; Ho, N.; and Jordan, M. 2019a. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms. In International Conference on Machine Learning.
  • Lin, Ho, and Jordan (2019b) Lin, T.; Ho, N.; and Jordan, M. I. 2019b. On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport. arXiv preprint arXiv:1906.01437v1.
  • Liu et al. (2020) Liu, Y.; Zhu, L.; Yamada, M.; and Yang, Y. 2020. Semantic Correspondence as an Optimal Transport Problem. In Computer Vision and Pattern Recognition.
  • Monge (1781) Monge, G. 1781. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale.
  • Nikolentzos, Meladianos, and Vazirgiannis (2017) Nikolentzos, G.; Meladianos, P.; and Vazirgiannis, M. 2017. Matching Node Embeddings for Graph Similarity. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
  • Pang et al. (2022) Pang, S.; Du, A.; Orgun, M. A.; Wang, Y.; Sheng, Q. Z.; Wang, S.; Huang, X.; and Yu, Z. 2022. Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation. IEEE Transactions on Cybernetics, 1–12.
  • Petric Maretic et al. (2019) Petric Maretic, H.; El Gheche, M.; Chierchia, G.; and Frossard, P. 2019. GOT: An Optimal Transport framework for Graph comparison. In Advances in Neural Information Processing Systems.
  • Peyré and Cuturi (2019) Peyré, G.; and Cuturi, M. 2019. Computational Optimal Transport: With Applications to Data Science. Now Publishers.
  • Pham et al. (2020) Pham, K.; Le, K.; Ho, N.; Pham, T.; and Bui, H. 2020. On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm. In International Conference on Machine Learning.
  • Rockafellar (1970) Rockafellar, R. T. 1970. Convex Analysis, volume 18 of Princeton Mathematical Series. New Jersey: Princeton University Press.
  • Shamai et al. (2015) Shamai, G.; Aflalo, Y.; Zibulevsky, M.; and Kimmel, R. 2015. Classical Scaling Revisited. In International Conference on Computer Vision.
  • Sinkhorn (1967) Sinkhorn, R. 1967. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4): 402–405.
  • Solomon et al. (2015) Solomon, J.; de Goes, F.; Peyré, G.; Cuturi, M.; Butscher, A.; Nguyen, A.; Du, T.; and Guibas, L. 2015. Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains. ACM Transactions on Graphics, 34(4).
  • Tarjan (1997) Tarjan, R. E. 1997. Dynamic trees as search trees via euler tours, applied to the network simplex algorithm. Mathematical Programming, 78(2): 169–177.
  • Tenetov, Wolansky, and Kimmel (2018) Tenetov, E.; Wolansky, G.; and Kimmel, R. 2018. Fast Entropic Regularized Optimal Transport Using Semidiscrete Cost Approximation. SIAM Journal on Scientific Computing, 40(5): A3400–A3422.
  • Wu, Rupprecht, and Vedaldi (2023) Wu, S.; Rupprecht, C.; and Vedaldi, A. 2023. Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (Invited Paper). IEEE Trans. Pattern Anal. Mach. Intell., 45(4): 5268–5281.
  • Xie and Grossman (2018) Xie, T.; and Grossman, J. C. 2018. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett., 120: 145301.
  • Zhao et al. (2003) Zhao, W.; Chellappa, R.; Phillips, P. J.; and Rosenfeld, A. 2003. Face Recognition: A Literature Survey. ACM Comput. Surv., 35(4): 399–458.

Appendices: Optimal Transport with Cyclic Symmetry

Appendix A Simple Counter-Example to the Intuitive Utilization of Cyclic Symmetry in OT

As explained in Section 1, the intuitive approach, which solves OT for only one of the symmetric components of the input data and concatenates $n$ copies of the obtained solution, does not work. To explain the reason clearly, we here present a simple counter-example to this intuitive utilization of cyclic symmetry in OT.

We consider the $90^{\circ}$ rotational symmetry case ($n=4$) of Example 1 in Section 4 with the following gray-scale images.

Figure A1: Images with $90^{\circ}$ rotational symmetry ($n=4$).

These images clearly have $90^{\circ}$ rotational symmetry.

Intuitively, to utilize the $90^{\circ}$ rotational symmetry in the above images, it seems sufficient to consider OT for only one of the symmetric parts of the images, i.e., only the red rectangular areas in Figure A1. However, when the cost matrix $\mathbf{C}$ is given by the pixel-wise Euclidean distance matrix, the optimal transportation plan for the $(0,1)$-th entry (whose value is $0.25$) in the left image is to transport it toward the $(0,2)$-th entry (whose value is $0.25$) in the right image, beyond the red rectangular area. Thus, OT restricted to only one of the symmetric parts (the red rectangular areas) of the above images gives an incorrect transportation plan.

Therefore, the intuitive utilization of cyclic symmetry, i.e., solving OT for only one of the symmetric components of the input data and concatenating $n$ copies of the obtained solution, does not work. Consequently, we must consider the interaction between the symmetric components and develop novel fast algorithms that appropriately utilize cyclic symmetry with theoretical proofs, which leads to our algorithms in the main paper.

Appendix B Proof of Lemma 1

Proof.

Let $\mathbf{T}^{\prime}$ be an optimal solution to 1. We define $\mathbf{T}^{*}$ as follows:

\[
\mathbf{T}^{*}\coloneqq\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top},
\tag{A1}
\]

where

\[
\mathbf{P}\coloneqq\begin{pmatrix}\mathbf{O}_{m}&\mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{I}_{m}\\ \mathbf{I}_{m}&\mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{O}_{m}\\ \mathbf{O}_{m}&\ddots&\ddots&\vdots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{O}_{m}&\mathbf{O}_{m}\\ \mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{I}_{m}&\mathbf{O}_{m}\end{pmatrix}
\tag{A2}
\]

is the block-circulant permutation matrix, $\mathbf{I}_{m}$ denotes the $m\times m$ identity matrix, and $\mathbf{O}_{m}$ denotes the $m\times m$ zero matrix.

First, we show that $\mathbf{T}^{*}$ is a feasible solution to 1. $\mathbf{T}^{*}$ satisfies the row-sum constraints because

\[
\mathbf{T}^{*}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{a}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{a}=\mathbf{a}.
\tag{A3}
\]

Similarly, we can show that $\mathbf{T}^{*}$ satisfies the column-sum constraints (i.e., $(\mathbf{T}^{*})^{\top}\mathbf{1}_{d}=\mathbf{b}$). Thus, $\mathbf{T}^{*}$ is a feasible solution to 1.

Next, we show that $\mathbf{T}^{*}$ is an optimal solution to 1. To this end, we check the objective function value of 1 at $\mathbf{T}=\mathbf{T}^{*}$. Letting $f(\mathbf{T})$ be the objective function of 1, we get

\[
f(\mathbf{T}^{*})=f\left(\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\right)\leq\frac{1}{n}\sum_{k=0}^{n-1}f\left(\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\right)=\frac{1}{n}\sum_{k=0}^{n-1}f(\mathbf{T}^{\prime})=f(\mathbf{T}^{\prime}).
\tag{A4}
\]

Here, the inequality follows from the convexity of $f$ and Jensen's inequality. Thus, $\mathbf{T}^{*}$ is an optimal solution to 1.

Finally, for $l=0,\ldots,n-1$, we get

\[
\mathbf{P}^{l}\mathbf{T}^{*}(\mathbf{P}^{l})^{\top}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k+l}\mathbf{T}^{\prime}(\mathbf{P}^{k+l})^{\top}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}(\mathbf{P}^{k})^{\top}=\mathbf{T}^{*},
\tag{A5}
\]

where the second equality holds because $\mathbf{P}^{n}=\mathbf{I}_{d}$ and the sum runs over a full cycle. Therefore, $\mathbf{T}^{*}$ has the block-circulant structure of 10. ∎
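The construction in this proof can also be checked numerically. The following is a small sketch: `P` is built as in (A2), and the symmetrization (A1) is verified to preserve the marginals and to produce a $\mathbf{P}$-invariant plan.

```python
import numpy as np

def block_circulant_perm(m, n):
    """The d x d block-circulant permutation matrix P of (A2), d = m n."""
    d = m * n
    P = np.zeros((d, d))
    for k in range(n):
        r = (k + 1) % n   # block column k carries I_m in block row (k+1) mod n
        P[r * m:(r + 1) * m, k * m:(k + 1) * m] = np.eye(m)
    return P

def symmetrize(T, m, n):
    """T* = (1/n) sum_k P^k T' (P^k)^T, as in (A1)."""
    P = block_circulant_perm(m, n)
    Pk = np.eye(m * n)
    acc = np.zeros_like(T)
    for _ in range(n):
        acc += Pk @ T @ Pk.T
        Pk = P @ Pk
    return acc / n
```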

Appendix C Proof of Theorem 2

Proof.

We rewrite 11 with Lagrange multipliers $\mathbf{w}$ and $\mathbf{z}$ for the two equality constraints as follows:

\[
\begin{aligned}
&\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})+\left\langle\mathbf{w},\boldsymbol{\alpha}-\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}\right\rangle+\left\langle\mathbf{z},\boldsymbol{\beta}-\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}\right\rangle\\
=\ &\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle+\sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k}-\mathbf{w}\mathbf{1}^{\top}-\mathbf{1}\mathbf{z}^{\top},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk}).
\end{aligned}
\tag{A6}
\]

Note that Problem 11 is convex, its constraints are linear, and Slater's constraint qualification holds. Hence, strong duality holds (see, e.g., (Boyd and Vandenberghe 2004, Section 5.2.3)), and we can swap the $\min$- and $\max$-operations in (A6):

\[
\begin{aligned}
=\ &\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle+\sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k}-\mathbf{w}\mathbf{1}^{\top}-\mathbf{1}\mathbf{z}^{\top},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})\tag{A7}\\
=\ &\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle-\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi^{\star}(w_{i}+z_{j}-C_{ijk}).\tag{A8}
\end{aligned}
\]

One of the optimality conditions is

\[
T_{ijk}^{*}=(\phi^{\star})^{\prime}(w_{i}+z_{j}-C_{ijk}).
\tag{A9}
\]
∎

Appendix D Further Details of Synthetic Data in Section 7.1

We created the synthetic data using the “Random Generator” class in Python, with the random seed set to $0$. We considered input data that have $n(=50)$-order cyclic symmetry.

To create the synthetic $d$-dimensional input probability vectors $\mathbf{a}$ and $\mathbf{b}$, we first sampled $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ from the $m(=\frac{d}{n})$-dimensional uniform distribution on the half-open interval $[0.0,1.0)$. We then created $\mathbf{a}$ and $\mathbf{b}$ by concatenating $n$ copies of $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, respectively, as in 4, and normalized them so that each sums to $1$.

To create the synthetic input cost matrix $\mathbf{C}$, we first sampled $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ from the $m\times m$-dimensional Gaussian distribution with mean $3.0$ and standard deviation $5.0$; note that we selected these parameters to keep the same order of magnitude of the metrics, namely the objective function value and the marginal error, across all experiments in Section 7. We then added the absolute minimum value, namely $\left|\min_{k=0,\dots,n-1}\left(\min_{i,j=0,\dots,m-1}C_{ijk}\right)\right|$, to all entries of $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ to ensure their non-negativity. After that, we created $\mathbf{C}$ by concatenating $n$ copies of $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ as in 5.
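The construction above can be sketched as follows. This is a sketch, not the authors' generator; in particular, the block-circulant assembly of the full $\mathbf{C}$ reflects our assumed reading of 5 (block $(r,c)$ is $\mathbf{C}_{(c-r)\bmod n}$).

```python
import numpy as np

def make_synthetic(d, n, seed=0):
    """Sketch of the synthetic data of Appendix D."""
    rng = np.random.default_rng(seed)
    m = d // n
    # Marginals: tile m-dimensional uniform samples n times and normalize.
    a = np.tile(rng.uniform(0.0, 1.0, m), n)
    b = np.tile(rng.uniform(0.0, 1.0, m), n)
    a, b = a / a.sum(), b / b.sum()
    # Cost blocks: Gaussian (mean 3.0, std 5.0), shifted by the absolute
    # minimum so that every entry is non-negative.
    blocks = rng.normal(3.0, 5.0, (n, m, m))
    blocks = blocks + np.abs(blocks.min())
    # Full cost: block-circulant tiling of C_0, ..., C_{n-1} (our assumed
    # reading of 5).
    C = np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])
    return a, b, C, blocks
```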

Appendix E Selected Images for Real-World Data in Section 7.2

In Section 7.2, we selected $20$ images with mirror symmetry ($n=2$) from the NYU Symmetry Database (Cicconet et al. 2017). We show them in Figure A2; as can be seen, the images are of various kinds.

Figure A2: Images with mirror symmetry, which were used in Section 7.2. Note that we show the images after being resized to a square as in Section 7.2.

Appendix F Further Details of the Two-Stage Sinkhorn Algorithm used in Section 7.2

In Section 7.2, we stopped Stage 1 of the two-stage Sinkhorn algorithm before convergence to prevent the solution from being far from optimal for real images. In detail, we first ran the cyclic Sinkhorn algorithm until the marginal error $||\left(\mathrm{diag}(\widehat{\mathbf{p}})\mathbf{K}\mathrm{diag}(\widehat{\mathbf{q}})\right)^{\top}\mathbf{1}_{m}-\boldsymbol{\beta}||_{2}$ fell below $1.0\times 10^{-3}$. We then ran the original Sinkhorn algorithm until the difference between its objective function value and the value obtained by directly solving C-SROT with the original Sinkhorn algorithm fell below $1.0\times 10^{-4}$.
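The Stage-1 stopping test can be computed cheaply from the current scalings without forming the full plan's marginals twice. A minimal sketch:

```python
import numpy as np

def stage1_marginal_error(p_hat, K, q_hat, beta):
    """The Stage-1 stopping criterion of Appendix F:
    || (diag(p) K diag(q))^T 1_m - beta ||_2."""
    plan = p_hat[:, None] * K * q_hat[None, :]   # diag(p) K diag(q)
    return float(np.linalg.norm(plan.sum(axis=0) - beta))
```

Right after a $\widehat{\mathbf{q}}$-update the column marginal is exact by construction, so the criterion is naturally checked after the $\widehat{\mathbf{p}}$-update of each iteration.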
