

Optimal Transport with Cyclic Symmetry

Shoichiro Takeda, Yasunori Akagi, Naoki Marumo, Kenta Niwa
Abstract

We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the well-known entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT on synthetic/real-world data that have a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.

1 Introduction

Given two probability vectors and a cost matrix, the discrete optimal transport (OT) problem seeks a transportation plan that minimizes the cost of transporting one probability vector to the other. The resulting total transportation cost is an effective tool for comparing two probability vectors. Therefore, OT has been studied in various research areas, e.g., text embedding (Kusner et al. 2015), image matching (Liu et al. 2020), domain adaptation (Courty et al. 2017), graph comparison (Nikolentzos, Meladianos, and Vazirgiannis 2017), and interpolation (Solomon et al. 2015).

There are many formulations of OT. Kantorovich (1942) was the first to formulate OT as a linear programming problem, and this linear OT (LOT) enabled great progress toward solving OT. Recently, the strongly convex-regularized OT (SROT) has attracted much attention, especially the entropy-regularized OT (EROT) (Cuturi 2013; Blondel, Seguy, and Rolet 2018; Peyré and Cuturi 2019; Guo, Ho, and Jordan 2020). SROT is superior to LOT in that it guarantees a unique solution and computational stability.

Many algorithms have been studied to solve OT. The network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is a well-known classical algorithm for LOT and has been widely used. The Sinkhorn algorithm (Cuturi 2013) and primal-dual descent algorithms (Dvurechensky, Gasnikov, and Kroshnin 2018; Guo, Ho, and Jordan 2020) have been proposed to solve EROT faster. Recently, algorithms utilizing special structures of input data have been in the spotlight for solving OT faster, e.g., algorithms that utilize the low-rankness of the input data (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). In addition, several algorithms utilize the Gibbs kernel structure of the input cost matrix in the Sinkhorn algorithm, such as separability (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016) and translation invariance (Getreuer 2013; Peyré and Cuturi 2019).

Figure 1: Overview of our algorithm for LOT with cyclic symmetry (C-LOT). This algorithm reduces C-LOT to a small LOT that has significantly fewer variables and solves the small LOT instead, resulting in fast computation. Note that the small cost matrix is not just a part of the original one; it aggregates the original cost matrix on the basis of cyclic symmetry, see (14).

In this paper, we propose novel fast algorithms for OT utilizing a new special structure, cyclic symmetry, of input data. Specifically, we assume $n$-order cyclic symmetry for the input data: the input $d$-dimensional probability vector is a concatenation of $n$ copies of an $m\,(\coloneqq d/n)$-dimensional vector, and the input $d\times d$ cost matrix is a block-circulant matrix consisting of $n$ matrices of size $m\times m$ (see Assumption 1). Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing (see examples in Section 4). Intuitively, one might hope to obtain an optimal solution to such a problem faster by solving OT for only one of the symmetric components of the input data and concatenating $n$ copies of the obtained solution. However, this approach does not work because it ignores interactions between the symmetric components (see Appendix A). Unlike such an intuitive approach, we propose novel fast algorithms utilizing cyclic symmetry for two crucial OT formulations: LOT and SROT.

First, we propose a fast algorithm for LOT with cyclic symmetry (C-LOT). Figure 1 shows an overview of this algorithm. Our main idea is to reduce C-LOT, which has $d^{2}$ variables, to a small LOT with only $m^{2}$ variables by utilizing cyclic symmetry. To achieve this reduction, we introduce auxiliary variables that respect cyclic symmetry and rewrite C-LOT as a $\min$-$\min$ optimization problem. Surprisingly, the inner $\min$ problem can be solved analytically, and the $\min$-$\min$ problem becomes a small LOT. Therefore, this algorithm solves C-LOT faster by solving the small LOT instead. Using the network simplex algorithm to solve the small LOT, the time complexity bound becomes $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty})+d^{2})$, where $\mathbf{C}$ is the cost matrix and $\left\|\mathbf{C}\right\|_{\infty}\coloneqq\max_{i,j}|C_{ij}|$. This is a great improvement over the $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$ bound for solving C-LOT directly.

Next, we propose a fast algorithm for SROT with cyclic symmetry (C-SROT). Unlike C-LOT, we cannot reduce C-SROT to a small SROT due to the regularizer. To overcome this issue, we consider the Fenchel dual of C-SROT. By utilizing cyclic symmetry, we show that the Fenchel dual problem has only $2m$ variables, which is significantly fewer than the $2d$ variables in the naive dual of C-SROT. Our algorithm solves the small Fenchel dual problem by the alternating minimization algorithm (Beck 2017, Chapter 14). Since the number of variables is small, the time complexity of each iteration is reduced, resulting in fast computation as a whole. In particular, the resulting algorithm for EROT with cyclic symmetry (C-EROT), a subclass of C-SROT, becomes a Sinkhorn-like algorithm, which we call the cyclic Sinkhorn algorithm. Interestingly, the Gibbs kernel in the cyclic Sinkhorn algorithm differs from that in the original Sinkhorn algorithm and is designed by considering cyclic symmetry. The time complexity of each iteration is $O(m^{2})$, which is significantly improved from the $O(d^{2})$ of solving C-EROT by the original Sinkhorn algorithm.

Finally, we propose a two-stage Sinkhorn algorithm for C-EROT with approximate cyclic symmetry. In the real world, there are many cases where the input data exhibit only approximate cyclic symmetry due to slight noise and displacement. The cyclic Sinkhorn algorithm cannot be applied to such cases because strict cyclic symmetry of the input data is assumed. To overcome this issue, the two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm to quickly obtain a strict symmetric solution. It then runs the original Sinkhorn algorithm to modify the solution. As a result, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry at the first stage. In Section 7.2, we experimentally confirmed the fast computation of this algorithm when input data have approximate cyclic symmetry.

In summary, this paper introduces the concept of symmetry into the OT research field for the first time and proposes fast cyclic symmetry-aware algorithms that solve small optimization problems instead of the original OT. We validated the effectiveness of our algorithms on synthetic/real-world data with a strict/approximate cyclic symmetry structure.

2 Related Work

OT was initially formulated by Monge (1781). Later, Kantorovich (1942) relaxed it to a linear programming problem, which permits splitting mass from a single source to multiple targets. The linear OT (LOT) is easier to solve than Monge's form and has enabled great progress toward solving OT. Many algorithms have been proposed to solve OT. For example, the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is one of the classical algorithms for LOT and has been widely used. Recently, algorithms have been proposed to solve OT faster by adding the entropy regularizer (Cuturi 2013; Altschuler, Niles-Weed, and Rigollet 2017; Lin, Ho, and Jordan 2019b; Alaya et al. 2019). The dual form of the entropy-regularized OT can be solved faster by the Sinkhorn algorithm, which updates dual variables via matrix-vector products (Sinkhorn 1967). For further acceleration, many improvements to the Sinkhorn algorithm have been proposed. For example, Altschuler, Niles-Weed, and Rigollet (2017), Lin, Ho, and Jordan (2019b), and Alaya et al. (2019) proposed using greedy, randomized, and safe-screening strategies, respectively, to efficiently update the dual variables. Primal-dual algorithms have received much attention (Dvurechensky, Gasnikov, and Kroshnin 2018; Lin, Ho, and Jordan 2019a; Guo, Ho, and Jordan 2020) because they report faster computation than the Sinkhorn algorithm and its variants, but they are rarely used in practice due to the difficulty of implementation (Pham et al. 2020). This paper focuses on the network simplex algorithm and the Sinkhorn algorithm because they are widely used.

As another line of work to solve OT faster, utilizing special structures of input data has been well studied (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016; Peyré and Cuturi 2019; Getreuer 2013; Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Inspired by the fact that geodesic distance matrices can be low-rank approximated (Shamai et al. 2015), a low-rank approximation for the cost matrix in OT was introduced to reduce the time complexity of the Sinkhorn algorithm (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Several approaches have utilized the Gibbs kernel structure of the cost matrix appearing in the Sinkhorn algorithm. The key to these approaches is to approximate the calculation involving the Gibbs kernel; by utilizing separability (Solomon et al. 2015; Bonneel, Peyré, and Cuturi 2016) or translation invariance (Peyré and Cuturi 2019; Getreuer 2013) of the Gibbs kernel on a fixed uniform grid, the matrix-vector product in the Sinkhorn algorithm can be replaced with convolutions. Thus, it can be computed faster by, e.g., a fast Fourier transform. This paper introduces the utilization of a new special but ubiquitous structure, cyclic symmetry, in OT.

3 Preliminary

3.1 Notations

$\mathbb{R}_{\geq 0}$ denotes the set of non-negative real numbers. $\langle{\cdot},{\cdot}\rangle$ denotes the inner product; that is, for vectors $\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}$, $\langle{\mathbf{x}},{\mathbf{y}}\rangle=\sum_{i=0}^{d-1}x_{i}y_{i}$, and for matrices $\mathbf{X},\mathbf{Y}\in\mathbb{R}^{d\times d}$, $\langle{\mathbf{X}},{\mathbf{Y}}\rangle=\sum_{i,j=0}^{d-1}X_{ij}Y_{ij}$. The probability simplex is denoted as $\Delta^{d}\coloneqq\{\mathbf{x}\in\mathbb{R}^{d}\mid\sum_{i=0}^{d-1}x_{i}=1,\,x_{i}\geq 0\}$. $\mathbf{1}_{d}$ denotes the all-ones vector in $\mathbb{R}^{d}$.

3.2 Regularized Optimal Transport (ROT)

We define the regularized OT (ROT), which adds a convex regularizer to the linear OT (LOT) introduced by Kantorovich (1942). Given two probability vectors $\mathbf{a},\mathbf{b}\in\Delta^{d}$ and a cost matrix $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$, ROT is defined as

\begin{gathered}\min_{\mathbf{T}\in\mathbb{R}^{d\times d}}\langle{\mathbf{C}},{\mathbf{T}}\rangle+\sum_{i,j=0}^{d-1}\phi(T_{ij}),\\ \mathrm{s.t.}\quad\mathbf{T}\mathbf{1}_{d}=\mathbf{a},\quad\mathbf{T}^{\top}\mathbf{1}_{d}=\mathbf{b},\end{gathered} (1)

where $\mathbf{T}$ is called a transportation matrix and $\phi:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is a convex function called a regularizer. We assume $\phi(x)=+\infty$ if $x<0$; this assumption imposes the non-negativity constraint on $\mathbf{T}$.

ROT (1) is a generalization of various OT formulations. For example, (1) leads to LOT when $\phi$ is given by

\displaystyle\phi(x)=\begin{cases}0&\mathrm{if}\ x\geq 0,\\ +\infty&\mathrm{otherwise}.\end{cases} (2)

Also, (1) leads to the strongly convex-regularized OT (SROT) when $\phi$ is a strongly convex function; a function $\phi$ is called strongly convex if $\phi-\frac{\mu}{2}\|\cdot\|^{2}$ is convex for some $\mu>0$. As an important subclass of SROT, (1) leads to the entropy-regularized OT (EROT) introduced by Cuturi (2013) when $\phi$ is given by

\displaystyle\phi(x)=\begin{cases}\lambda x(\log x-1)&\mathrm{if}\ x\geq 0,\\ +\infty&\mathrm{otherwise},\end{cases} (3)

where $\lambda>0$.

4 C-ROT: ROT with Cyclic Symmetry

This section explains our cyclic symmetry assumption for ROT (1) and real-world examples of this problem.

We assume that $\mathbf{a},\mathbf{b},\mathbf{C}$ in (1) have the following $n$-order cyclic symmetry.

Assumption 1.

There exists a divisor $n$ of $d$ such that the probability vectors $\mathbf{a},\mathbf{b}$ in (1) have a periodic structure:

\begin{gathered}\mathbf{a}=\begin{pmatrix}\boldsymbol{\alpha}\\ \boldsymbol{\alpha}\\ \vdots\\ \boldsymbol{\alpha}\\ \end{pmatrix},\qquad\mathbf{b}=\begin{pmatrix}\boldsymbol{\beta}\\ \boldsymbol{\beta}\\ \vdots\\ \boldsymbol{\beta}\\ \end{pmatrix},\end{gathered} (4)

where $\boldsymbol{\alpha},\boldsymbol{\beta}\in\mathbb{R}^{m}_{\geq 0}$ and $m\coloneqq\frac{d}{n}$ is an integer. Also, the cost matrix $\mathbf{C}$ in (1) has a block-circulant structure:

\displaystyle\mathbf{C}=\begin{pmatrix}\mathbf{C}_{0}&\mathbf{C}_{1}&\cdots&\mathbf{C}_{n-1}\\ \mathbf{C}_{n-1}&\mathbf{C}_{0}&\ddots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{C}_{1}\\ \mathbf{C}_{1}&\cdots&\mathbf{C}_{n-1}&\mathbf{C}_{0}\\ \end{pmatrix}, (5)

where $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}$.
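As a concrete illustration of Assumption 1, the following NumPy sketch (our own, not the authors' code; the sizes $n=2$, $m=3$ and the random data are illustrative) builds periodic marginals as in (4) and a block-circulant cost matrix as in (5):

```python
import numpy as np

# A minimal sketch (not the authors' code) of inputs satisfying Assumption 1
# for illustrative sizes n = 2, m = 3 (so d = 6).
n, m = 2, 3
d = n * m
rng = np.random.default_rng(0)

# Periodic marginals: a and b stack n copies of alpha and beta, as in (4).
alpha = rng.random(m)
alpha /= alpha.sum() * n      # n copies of alpha must sum to 1
beta = rng.random(m)
beta /= beta.sum() * n
a, b = np.tile(alpha, n), np.tile(beta, n)

# Block-circulant cost, as in (5): block (r, c) equals C_{(c - r) mod n}.
blocks = [rng.random((m, m)) for _ in range(n)]
C = np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])

assert np.isclose(a.sum(), 1.0) and np.isclose(b.sum(), 1.0)
for r in range(n):
    for c in range(n):
        np.testing.assert_allclose(C[r*m:(r+1)*m, c*m:(c+1)*m],
                                   blocks[(c - r) % n])
```

The block index `(c - r) % n` encodes the circulant shift of (5): each block row is the previous one shifted cyclically to the right.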

In this paper, we call ROT (1) under Assumption 1 Cyclic ROT (C-ROT). This problem appears universally in various real-world examples, given below.

Example 1 (Image with Cyclic Symmetry).

Cyclic symmetry in images has been a central topic in image research. In particular, because image data are represented in rectangular form, mirror or $90^{\circ}$ rotational symmetry has been utilized for various tasks; mirror symmetry has been utilized for face recognition (Zhao et al. 2003) and rendering (Wu, Rupprecht, and Vedaldi 2023), and $90^{\circ}$ rotational symmetry in medical and galaxy images has been utilized for image segmentation (Pang et al. 2022) and morphology prediction (Dieleman, Willett, and Dambre 2015). We thus consider ROT between images with cyclic symmetry, $\mathbf{A},\mathbf{B}\in\mathbb{R}^{h\times w}_{\geq 0}$. For images with mirror symmetry, we assume mirror symmetry along the vertical axis:

\displaystyle A_{ij}=A_{i,w-j-1},\quad B_{ij}=B_{i,w-j-1}, (6)

for $0\leq i<h$ and $0\leq j<w$. We vectorize these images by appropriately ordering pixels as follows:

\mathbf{a}=\left(A_{\pi(0)},A_{\pi(1)},\ldots,A_{\pi(hw-1)}\right)^{\top}, (7)
\mathbf{b}=\left(B_{\pi(0)},B_{\pi(1)},\ldots,B_{\pi(hw-1)}\right)^{\top}, (8)
\pi(k)=\begin{cases}\left(k\bmod h,\lfloor k/h\rfloor\right)&0\leq k<\frac{hw}{2},\\ \left(k\bmod h,\frac{3w}{2}-\lfloor k/h\rfloor-1\right)&\frac{hw}{2}\leq k<hw.\end{cases} (9)

By defining $\mathbf{C}$ as the Manhattan, Euclidean, or Chebyshev distance matrix between pixel positions, $\mathbf{a},\mathbf{b},\mathbf{C}$ satisfy Assumption 1; thus, C-ROT with $n=2$ appears. Similarly, by appropriately ordering pixels for $\mathbf{a},\mathbf{b}$ in the case of $90^{\circ}$ rotational symmetry, C-ROT with $n=4$ appears.
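The pixel ordering $\pi$ of (7)-(9) can be checked numerically. The sketch below (our illustration; the small even sizes are assumed) scans the left half of a mirror-symmetric image column by column and then the right half in mirrored column order, which makes the vectorized image periodic as required by Assumption 1:

```python
import numpy as np

# Sketch of the ordering pi in (9) for mirror symmetry (n = 2).
h, w = 4, 6

def pi(k):
    col = k // h
    if k < h * w // 2:
        return (k % h, col)
    return (k % h, 3 * w // 2 - col - 1)

rng = np.random.default_rng(1)
half = rng.random((h, w // 2))
A = np.concatenate([half, half[:, ::-1]], axis=1)   # mirror-symmetric image
A /= A.sum()                                        # normalize to a distribution

a = np.array([A[pi(k)] for k in range(h * w)])

m = h * w // 2
np.testing.assert_allclose(a[:m], a[m:])            # a = [alpha; alpha]
```

The second branch of `pi` visits columns $w-1$ down to $w/2$, so by the mirror relation (6) the second half of `a` repeats the first half exactly.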

Example 2 (Urban Planning with Cyclic Symmetry).

ROT has straightforward applications in logistics and economics (Kantorovich 1942; Guillaume 2012). Imagine a situation where planners design the structure of a city; this structure is simply given by two probability distributions, the distribution of residents $\mathbf{a}$ and that of services $\mathbf{b}$. In this context, the objective function value of ROT enables us to measure how close residents and services are and thus to evaluate the city's efficiency. Several city structures, such as Howard's garden city (Howard 1965), assume that residents and services are located along cyclic symmetry to improve quality of life. In such structures, $\mathbf{a},\mathbf{b}$, and $\mathbf{C}$, where $C_{ij}$ is given by the Euclidean distance between resident $a_{i}$ and service $b_{j}$, satisfy Assumption 1; thus, C-ROT appears.

Example 3 (Graph with Cyclic Symmetry).

Graphs are commonly used to model real-world data. For example, chemical molecules and crystal structures can be modeled using graphs (Bonchev 1991; Xie and Grossman 2018), and such graphs often exhibit cyclic symmetry (Jaffé and Orchin 2002; Ladd 2014). To compare two graphs, computing their distance has been well studied, and OT-based approaches have been proposed (Nikolentzos, Meladianos, and Vazirgiannis 2017; Petric Maretic et al. 2019). We here consider ROT for computing a distance between two graphs with cyclic symmetry. Following Nikolentzos, Meladianos, and Vazirgiannis (2017), we represent features for the vertices of a graph as the eigenvectors of its adjacency matrix. As with chemical molecules and crystal structures, we assume the vertex features are equally distributed along cyclic symmetry. By defining $a_{i}=b_{j}\coloneqq\frac{1}{d}$ to ensure the same amount of outgoing/incoming flow from/to each vertex and $C_{ij}$ as the Manhattan, Euclidean, or Chebyshev distance in the eigenvectors' feature space, $\mathbf{a},\mathbf{b}$, and $\mathbf{C}$ satisfy Assumption 1. Thus, C-ROT for two graphs appears.

5 Fast Algorithms for C-ROT

In this section, we propose fast algorithms for C-ROT. Note that several proofs are in the supplementary material.

5.1 Block-Cyclic Structure of Optimal Solution

We first show the following lemma.

Lemma 1.

Under Assumption 1, there exists an optimal solution to (1) that has the following structure:

\displaystyle\mathbf{T}=\begin{pmatrix}\mathbf{T}_{0}&\mathbf{T}_{1}&\cdots&\mathbf{T}_{n-1}\\ \mathbf{T}_{n-1}&\mathbf{T}_{0}&\ddots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{T}_{1}\\ \mathbf{T}_{1}&\cdots&\mathbf{T}_{n-1}&\mathbf{T}_{0}\\ \end{pmatrix}, (10)

where $\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}$.

The proof is shown in Appendix B.

By Assumption 1 and Lemma 1, $\mathbf{C}$ and $\mathbf{T}$ have the same block-circulant structure. Plugging (5) and (10) into C-ROT (1) yields the following optimization problem:

\begin{gathered}\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})\\ \text{s.t.}\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta},\end{gathered} (11)

where $T_{ijk}$ is the $(i,j)$-th entry of $\mathbf{T}_{k}$. Note that the objective function value of (11) is exactly $\frac{1}{n}$ of that of (1).
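The $\frac{1}{n}$ relationship between the linear parts of (11) and (1) follows because each block pair $(\mathbf{C}_{k},\mathbf{T}_{k})$ appears $n$ times in the block-circulant matrices. A quick numerical check (illustrative sizes and random blocks of our choosing):

```python
import numpy as np

# Check that for block-circulant C and T as in (5) and (10),
# <C, T> = n * sum_k <C_k, T_k>, i.e. the linear part of (11) is 1/n of (1).
n, m = 3, 2
rng = np.random.default_rng(2)
C_blocks = [rng.random((m, m)) for _ in range(n)]
T_blocks = [rng.random((m, m)) for _ in range(n)]

def circulant(blocks):
    # Block (r, c) of the assembled matrix is blocks[(c - r) % n].
    return np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])

C, T = circulant(C_blocks), circulant(T_blocks)
lhs = np.sum(C * T)                                             # <C, T>
rhs = n * sum(np.sum(Ck * Tk) for Ck, Tk in zip(C_blocks, T_blocks))
np.testing.assert_allclose(lhs, rhs)
```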

5.2 Algorithm for C-LOT

We here propose a fast algorithm for cyclic LOT (C-LOT), the special case of C-ROT (1) in which $\phi$ is given by (2). From (11), C-LOT can be rewritten as

\begin{gathered}\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle\\ \text{s.t.}\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}.\end{gathered} (12)

By introducing the auxiliary variable $\mathbf{S}\coloneqq\sum_{k=0}^{n-1}\mathbf{T}_{k}$ and rewriting (12) in terms of $\mathbf{S}$, we can show the following theorem.

Theorem 1.

We consider a small LOT

\displaystyle\min_{\mathbf{S}\in\mathbb{R}^{m\times m}_{\geq 0}}\left\langle\mathbf{G},\mathbf{S}\right\rangle\quad\text{s.t.}\quad\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha},\quad\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}, (13)

where

G_{ij}\coloneqq\min_{0\leq k\leq n-1}C_{ijk}. (14)

Let $\mathbf{S}^{*}$ be an optimal solution of (13). Then, $\left(\mathbf{T}^{*}_{k}\right)_{k=0,\ldots,n-1}$ defined by

\displaystyle T^{*}_{ijk}=\begin{cases}S^{*}_{ij}&\mathrm{if}\ k=\min\left(\mathop{\rm argmin}\limits_{0\leq k\leq n-1}C_{ijk}\right),\\ 0&\mathrm{otherwise}\end{cases} (15)

is an optimal solution to (12). Also, the optimal objective function value of (12) is the same as that of (13).

Note that $\mathrm{argmin}_{0\leq k\leq n-1}C_{ijk}$ may return a set of indices when the minimum is attained at several indices; the outer $\min$ in (15) then selects the smallest such index.

Proof.

We fix $\mathbf{S}\coloneqq\sum_{k=0}^{n-1}\mathbf{T}_{k}$ in (12). The matrix $\mathbf{S}$ satisfies $\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha}$ and $\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}$, and we can rewrite (12) as

\displaystyle\min_{\begin{subarray}{c}\mathbf{S}\in\mathbb{R}^{m\times m}_{\geq 0},\\ \mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha},\ \mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}\end{subarray}}\left(\min_{\begin{subarray}{c}\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}_{\geq 0},\\ \sum_{k=0}^{n-1}\mathbf{T}_{k}=\mathbf{S}\end{subarray}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle\right). (16)

The inner problem can be solved analytically and independently for each $(i,j)$-th entry of $\mathbf{T}_{0},\ldots,\mathbf{T}_{n-1}$; the optimal solution is given by (15), and the optimal objective function value is $\left\langle\mathbf{G},\mathbf{S}\right\rangle$. Next, we solve the outer optimization problem for $\mathbf{S}$. Because $\mathbf{S}\in\mathbb{R}_{\geq 0}^{m\times m}$, $\mathbf{S}\mathbf{1}_{m}=\boldsymbol{\alpha}$, $\mathbf{S}^{\top}\mathbf{1}_{m}=\boldsymbol{\beta}$, and the objective function is $\left\langle\mathbf{G},\mathbf{S}\right\rangle$, this optimization problem is identical to (13). ∎

Theorem 1 indicates that C-LOT (1) can be reduced to the small LOT (13), which has only $m^{2}$ variables, significantly fewer than the $d^{2}=m^{2}n^{2}$ variables of the original C-LOT. Therefore, we can obtain the optimal solution to C-LOT by solving the small LOT (13) instead. The proposed algorithm is summarized in Algorithm 1, and Figure 1 shows its overview.

We evaluate the time complexity of Algorithm 1, which depends on the algorithm used to solve the small LOT (13). We here use the network simplex algorithm, the most popular algorithm for LOT. Tarjan (1997) showed that the time complexity of the network simplex algorithm for LOT (1) with the regularizer (2) is $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$, where $\left\|\mathbf{C}\right\|_{\infty}\coloneqq\max_{i,j}|C_{ij}|$. Hence, the time complexity of line 2 in Algorithm 1 is bounded by $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty}))$. Because line 1 and lines 3–7 can be executed in $O(d^{2})$ time, the total time complexity of Algorithm 1 is $O(m^{3}\log m\log(m\left\|\mathbf{C}\right\|_{\infty})+d^{2})$. This is a significant improvement over the $O(d^{3}\log d\log(d\left\|\mathbf{C}\right\|_{\infty}))$ of solving C-LOT (1) directly.

Algorithm 1 Fast Algorithm for C-LOT
Input: $\mathbf{a},\mathbf{b}\in\Delta^{d}$ and $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ under Assumption 1.
1: Compute $\mathbf{G}$ whose entries are given by (14).
2: Compute the optimal solution $\mathbf{S}^{*}$ to (13).
3: for $i,j,k$ do
4:     Compute $T_{ijk}$ by the relationship (15)
5: end for
6: Compute $\mathbf{T}$ by Lemma 1 with $(T_{ijk})$
7: return $\mathbf{T}$
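To make the reduction concrete, here is a solver-free NumPy sketch of Theorem 1 (our illustration: the network simplex step of line 2 is replaced by an arbitrary feasible $\mathbf{S}$, the independence coupling, since the identities in the theorem hold for every feasible $\mathbf{S}$; sizes and data are assumed):

```python
import numpy as np

# Sketch of the reduction in Theorem 1 with a feasible (not optimal) S.
n, m = 3, 4
rng = np.random.default_rng(3)
C = rng.random((n, m, m))                    # C[k] plays the role of C_k
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n

G = C.min(axis=0)                            # Eq. (14)
S = n * np.outer(alpha, beta)                # feasible: S @ 1 = alpha, S.T @ 1 = beta

# Eq. (15): place all of S_ij on the cheapest block index k. NumPy's argmin
# returns the smallest index on ties, matching the theorem's min(argmin ...).
k_star = C.argmin(axis=0)
T = np.zeros((n, m, m))
i_idx, j_idx = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
T[k_star, i_idx, j_idx] = S

np.testing.assert_allclose(T.sum(axis=0), S)                   # sum_k T_k = S
np.testing.assert_allclose(np.sum(C * T), np.sum(G * S))       # equal objectives
np.testing.assert_allclose(T.sum(axis=(0, 2)), alpha)          # marginals kept
```

Replacing `S` with the optimal $\mathbf{S}^{*}$ of (13), e.g. from a network simplex solver, would yield the optimal solution of (12) by the same construction.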

5.3 Algorithm for C-SROT

We propose a fast algorithm for cyclic SROT (C-SROT), the special case of C-ROT (1) in which $\phi$ is a strongly convex regularizer. Note that because the $\phi$ defined by (2) is not strongly convex, this algorithm cannot be applied to C-LOT.

The following theorem follows from Fenchel’s duality theorem and optimality conditions in convex analysis (see, e.g., (Rockafellar 1970, Section 31)).

Theorem 2.

The Fenchel dual of problem (11) is

\begin{aligned}\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ &\langle{\mathbf{w}},{\boldsymbol{\alpha}}\rangle+\langle{\mathbf{z}},{\boldsymbol{\beta}}\rangle\\ &-\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi^{\star}(w_{i}+z_{j}-C_{ijk}),\end{aligned} (17)

where $\phi^{\star}:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is the Fenchel conjugate of $\phi$, defined by $\phi^{\star}(y)\coloneqq\sup\{yx-\phi(x)\mid x\in\mathbb{R}\}$. Also, the optimal solution $\mathbf{T}^{*}_{k}$ to problem (11) and the optimal solutions $\mathbf{w}^{*}$ and $\mathbf{z}^{*}$ to problem (17) have the following relationship:

\displaystyle T^{*}_{ijk}=\left(\phi^{\star}\right)^{\prime}(w^{*}_{i}+z^{*}_{j}-C_{ijk}). (18)

The proof is shown in Appendix C.

Note that $\phi^{\star}$ is a smooth, differentiable convex function because $\phi$ is strongly convex. Theorem 2 indicates that we can obtain the optimal solution to C-SROT (1) by solving the dual problem (17) instead, because we can reconstruct the primal solution via the relationship (18) and Lemma 1.

We here propose to apply the alternating minimization algorithm (Beck 2017, Chapter 14) to (17); we iteratively optimize the objective function of (17) with respect to $\mathbf{w}$ while fixing $\mathbf{z}$, and vice versa. When we fix $\mathbf{z}$, the partial derivative of the objective function with respect to $w_{i}$ is

\displaystyle\alpha_{i}-\sum_{k=0}^{n-1}\sum_{j=0}^{m-1}\left(\phi^{\star}\right)^{\prime}(w_{i}+z_{j}-C_{ijk}), (19)

and $w_{i}$ is optimal if (19) equals 0. Because (19) is monotonically decreasing in $w_{i}$, we can find such $w_{i}$ easily by, e.g., the well-known Newton's method. The same applies to the optimization with respect to $\mathbf{z}$ while fixing $\mathbf{w}$. The alternating minimization algorithm for a smooth convex function is guaranteed to attain fast convergence (see (Beck and Tetruashvili 2013) for more details).

The distinguishing feature of this algorithm is that it treats few dual variables. If the alternating minimization algorithm were applied to the dual of (1) without considering cyclic symmetry, the number of dual variables would be $2d=2mn$. In contrast, our algorithm treats only $2m$ dual variables, a reduction by a factor of $n$. Therefore, the computational time required for one iteration of the alternating minimization is considerably reduced.
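The scheme above can be sketched for one concrete strongly convex choice. The following is our illustration, not the paper's experimental setup: we take the quadratic regularizer $\phi(x)=x^{2}/(2\gamma)$ on $x\geq 0$, whose conjugate derivative is $(\phi^{\star})^{\prime}(y)=\max(y,0)/\gamma$, and solve each coordinate of (19) by bisection instead of Newton's method (both work because (19) is monotone); sizes, $\gamma$, and data are assumed.

```python
import numpy as np

# Alternating dual coordinate updates of Section 5.3 for the quadratic
# regularizer phi(x) = x^2 / (2*gamma), x >= 0 (illustrative choice).
n, m, gamma = 2, 3, 0.5
rng = np.random.default_rng(4)
C = rng.random((n, m, m))                    # C[k] plays the role of C_k
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n

def solve_coord(target, offsets):
    # Solve sum_t max(x - offsets_t, 0) / gamma = target for x by bisection;
    # the left side is monotonically increasing in x.
    lo, hi = offsets.min(), offsets.max() + gamma * target
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if np.maximum(mid - offsets, 0.0).sum() / gamma < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w, z = np.zeros(m), np.zeros(m)
for _ in range(1000):                        # alternating sweeps over w and z
    for i in range(m):
        w[i] = solve_coord(alpha[i], (C[:, i, :] - z[None, :]).ravel())
    for j in range(m):
        z[j] = solve_coord(beta[j], (C[:, :, j] - w[None, :]).ravel())

# Primal reconstruction via (18): T_ijk = (phi*)'(w_i + z_j - C_ijk).
T = np.maximum(w[None, :, None] + z[None, None, :] - C, 0.0) / gamma
np.testing.assert_allclose(T.sum(axis=(0, 2)), alpha, atol=1e-3)
np.testing.assert_allclose(T.sum(axis=(0, 1)), beta, atol=1e-3)
```

Each sweep touches only the $2m$ dual variables, matching the reduction described above.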

5.4 Algorithm for C-EROT

We here propose a fast algorithm for cyclic EROT (C-EROT), the crucial special case of C-ROT (1) in which $\phi$ is given by (3). Because (3) is strongly convex, we can apply the cyclic-aware alternating minimization algorithm introduced in Section 5.3 to C-EROT.

Because $\phi^{\star}(y)=\lambda\exp(\frac{y}{\lambda})$, (19) can be written as

\displaystyle\alpha_{i}-\exp\left(\frac{w_{i}}{\lambda}\right)\sum_{j=0}^{m-1}K_{ij}\exp\left(\frac{z_{j}}{\lambda}\right), (20)

where

\displaystyle K_{ij}\coloneqq\sum_{k=0}^{n-1}\exp\left(-\frac{C_{ijk}}{\lambda}\right). (21)

From (20), we obtain the optimal $w_{i}$ in closed form:

w_{i}=\lambda\left(\log\alpha_{i}-\log\left(\sum_{j=0}^{m-1}K_{ij}\exp\left(\frac{z_{j}}{\lambda}\right)\right)\right). (22)

We can rewrite (22) and similarly describe the optimal $q_{j}$ as follows:

p_{i}=\frac{\alpha_{i}}{\sum_{j=0}^{m-1}K_{ij}q_{j}},\quad q_{j}=\frac{\beta_{j}}{\sum_{i=0}^{m-1}K_{ij}p_{i}}, (23)

where $p_{i}\coloneqq\exp\left(\frac{w_{i}}{\lambda}\right)$ and $q_{j}\coloneqq\exp\left(\frac{z_{j}}{\lambda}\right)$. This algorithm resembles the Sinkhorn algorithm (Cuturi 2013); we call it the cyclic Sinkhorn algorithm. Note that the optimal solution $\mathbf{T}$ to C-EROT (1) can easily be reconstructed from the optimal $\mathbf{w}$ and $\mathbf{z}$ by (18) and Lemma 1. The proposed algorithm is summarized in Algorithm 2.

Algorithm 2 Fast Cyclic Sinkhorn Algorithm for C-EROT
Input: $\mathbf{a},\mathbf{b}\in\Delta^{d}$, $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ under Assumption 1, and $\lambda>0$.
1: Compute $\mathbf{K}$ whose entries are given by (21).
2: Initialize $\mathbf{q}\leftarrow\mathbf{1}_{m}$.
3: repeat
4:     $\mathbf{p}\leftarrow\boldsymbol{\alpha}\oslash(\mathbf{K}\mathbf{q})$  ▷ $\oslash$ denotes elementwise division
5:     $\mathbf{q}\leftarrow\boldsymbol{\beta}\oslash(\mathbf{K}^{\top}\mathbf{p})$
6: until convergence
7: for $i,j,k$ do
8:     $T_{ijk}\leftarrow p_{i}q_{j}\exp\left(-\frac{C_{ijk}}{\lambda}\right)$
9: end for
10: Compute $\mathbf{T}$ by Lemma 1 with $(T_{ijk})$
11: return $\mathbf{T}$

We evaluate the time complexity of Algorithm 2, which is dominated by the matrix-vector product iterations in lines 4 and 5 for solving the Fenchel dual problem (17). In the original Sinkhorn algorithm, the time complexity of each iteration is $O(d^{2})=O(m^{2}n^{2})$ (Cuturi 2013). In contrast, in our cyclic Sinkhorn algorithm, the time complexity of each iteration is $O(m^{2})$; thus, our algorithm solves C-EROT significantly faster than the original Sinkhorn algorithm.
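The cyclic Sinkhorn iteration can be sketched in a few lines of NumPy and checked against the original Sinkhorn algorithm on the full $d\times d$ problem; since the EROT optimum is unique, both must converge to the same transportation matrix. Sizes, $\lambda$, iteration counts, and random data below are our illustrative assumptions:

```python
import numpy as np

# Cyclic Sinkhorn (Algorithm 2) vs. original Sinkhorn on the full problem.
n, m, lam = 2, 3, 0.2
d = n * m
rng = np.random.default_rng(5)
C_blocks = [rng.random((m, m)) for _ in range(n)]
C = np.block([[C_blocks[(c - r) % n] for c in range(n)] for r in range(n)])
alpha = rng.random(m); alpha /= alpha.sum() * n
beta = rng.random(m);  beta /= beta.sum() * n
a, b = np.tile(alpha, n), np.tile(beta, n)

# Cyclic Sinkhorn: aggregated m x m Gibbs kernel of (21).
K_small = sum(np.exp(-Ck / lam) for Ck in C_blocks)
q = np.ones(m)
for _ in range(3000):
    p = alpha / (K_small @ q)
    q = beta / (K_small.T @ p)

# Reconstruct the full transportation matrix via (18) and Lemma 1.
T_blocks = [np.outer(p, q) * np.exp(-Ck / lam) for Ck in C_blocks]
T_cyc = np.block([[T_blocks[(c - r) % n] for c in range(n)] for r in range(n)])

# Original Sinkhorn on the full d x d problem for comparison.
K = np.exp(-C / lam)
v = np.ones(d)
for _ in range(3000):
    u = a / (K @ v)
    v = b / (K.T @ u)
T_org = u[:, None] * K * v[None, :]

np.testing.assert_allclose(T_cyc, T_org, atol=1e-8)
np.testing.assert_allclose(T_cyc.sum(axis=1), a, atol=1e-8)
```

The inner loop only touches $m$-dimensional vectors and the $m\times m$ kernel, reflecting the $O(m^{2})$ per-iteration cost.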

6 Two-Stage Algorithm for C-EROT with Approximate Cyclic Symmetry

There are many real-world cases in which input data show only approximate cyclic symmetry. In Example 1, $\mathbf{C}$ satisfies Assumption 1 strictly when using the pixel-wise Euclidean distance, but the input distributions $\mathbf{a},\mathbf{b}$ (namely, images) often satisfy Assumption 1 only approximately due to slight noise and displacement. The algorithms proposed above cannot be applied to such cases because they require Assumption 1 to hold strictly. To overcome this issue, we here propose a fast two-stage Sinkhorn algorithm for C-EROT with approximate cyclic symmetry. We focus on C-EROT because EROT is commonly used thanks to its differentiability and computational efficiency (Peyré and Cuturi 2019; Guo, Ho, and Jordan 2020). The two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm (Algorithm 2) to quickly obtain a strictly symmetric solution. It then runs the original Sinkhorn algorithm (Cuturi 2013) to modify the solution. As a result, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry in the first stage. The proposed algorithm is described in Algorithm 3.

Algorithm 3 Fast Two-Stage Algorithm for C-EROT with Approximate Cyclic Symmetry
1:$\mathbf{a},\mathbf{b}\in\Delta^{d}$, $\mathbf{C}\in\mathbb{R}^{d\times d}_{\geq 0}$ and $\lambda>0$.
2:// Stage 1: Cyclic Sinkhorn algorithm
3:for $i=0,\dots,m-1$ do
4:     $\alpha_{i}=\frac{1}{n}\sum_{k=0}^{n-1}a_{i+mk}$ \triangleright the average of the $n$-divided $\mathbf{a}$
5:     $\beta_{i}=\frac{1}{n}\sum_{k=0}^{n-1}b_{i+mk}$ \triangleright the average of the $n$-divided $\mathbf{b}$
6:end for
7:Compute $\mathbf{K}$ whose entries are given by 21.
8:Initialize $\widehat{\mathbf{q}}\leftarrow\mathbf{1}_{m}$.
9:repeat
10:     $\widehat{\mathbf{p}}\leftarrow\boldsymbol{\alpha}\oslash(\mathbf{K}\widehat{\mathbf{q}})$ \triangleright $\oslash$ denotes elementwise division
11:     $\widehat{\mathbf{q}}\leftarrow\boldsymbol{\beta}\oslash(\mathbf{K}^{\top}\widehat{\mathbf{p}})$
12:until convergence
13:// Stage 2: Sinkhorn algorithm (Cuturi 2013)
14:Initialize $\mathbf{p},\mathbf{q}$ as the $n$ concatenated copies of $\widehat{\mathbf{p}},\widehat{\mathbf{q}}$, respectively.
15:Compute $K_{ij}=\exp\left(-\frac{C_{ij}}{\lambda}\right)$.
16:repeat
17:     $\mathbf{p}\leftarrow\mathbf{a}\oslash(\mathbf{K}\mathbf{q})$
18:     $\mathbf{q}\leftarrow\mathbf{b}\oslash(\mathbf{K}^{\top}\mathbf{p})$
19:until convergence
20:return $\mathbf{T}\leftarrow\mathrm{diag}(\mathbf{p})\mathbf{K}\mathrm{diag}(\mathbf{q})$

If Assumption 1 is satisfied strictly, the time complexity of this algorithm is the same as that of the cyclic Sinkhorn algorithm. If not, the analysis becomes complicated because the two Sinkhorn algorithms of Stages 1 and 2 are mixed. We leave this analysis for future research, but we experimentally confirmed in Section 7.2 that this algorithm computes fast when input data have approximate cyclic symmetry.
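The two stages can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' code: we take the blocks of the first block-row of $\mathbf{C}$ as $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ and aggregate them into the reduced kernel, and the warm start concatenates $n$ copies of the Stage-1 scalings as in line 14 of Algorithm 3.

```python
import numpy as np

def two_stage_sinkhorn(a, b, C, lam, n_sym, stage1_iters=2000, stage2_iters=2000):
    """Sketch of Algorithm 3.

    a, b  : (d,) input marginals, d = m * n_sym
    C     : (d, d) cost matrix
    n_sym : the cyclic order n
    """
    d = a.shape[0]
    m = d // n_sym
    # Stage 1: averaged m-dimensional marginals (lines 3-6 of Algorithm 3) ...
    alpha = a.reshape(n_sym, m).mean(axis=0)
    beta = b.reshape(n_sym, m).mean(axis=0)
    # ... and an assumed reduced kernel K_hat_ij = sum_k exp(-C[i, j+mk] / lam),
    # reading the first block-row of C as the blocks C_0, ..., C_{n-1}.
    K_hat = np.exp(-C[:m].reshape(m, n_sym, m) / lam).sum(axis=1)
    q_hat = np.ones(m)
    for _ in range(stage1_iters):
        p_hat = alpha / (K_hat @ q_hat)
        q_hat = beta / (K_hat.T @ p_hat)
    # Stage 2: warm-start the full Sinkhorn with n concatenated copies.
    p, q = np.tile(p_hat, n_sym), np.tile(q_hat, n_sym)
    K = np.exp(-C / lam)
    for _ in range(stage2_iters):
        p = a / (K @ q)
        q = b / (K.T @ p)
    return p[:, None] * K * q[None, :]   # diag(p) K diag(q)
```

When Assumption 1 holds strictly, the warm start already satisfies the full marginal constraints, so Stage 2 terminates almost immediately; the closer the data are to strict symmetry, the more work Stage 1 saves.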

7 Experiments

To validate the effectiveness of our algorithms, we conducted experiments on synthetic/real-world data that satisfy Assumption 1 strictly/approximately. In all experiments, we evaluated whether our algorithms, which solve small optimization problems instead of the original OT, produce the same results as the original OT with faster computation. The experiments were performed on a Windows laptop with an Intel Core i7-10750H CPU and 32 GB of memory. All code was implemented in Python.

Table 1: Experimental results in synthetic data. “Obj. value” indicates objective function value.
Algorithm | $n$ | $d=5000$: Obj. value | Marginal error | Time (sec.) | $d=10000$: Obj. value | Marginal error | Time (sec.)
Network Simplex | -- | 6.034 ± 0.824 | 0.000 ± 0.000 | 6.523 ± 1.013 | 6.526 ± 0.917 | 0.000 ± 0.000 | 33.660 ± 3.238
Cyclic Network Simplex | 2 | 6.034 ± 0.824 | 0.000 ± 0.000 | 1.477 ± 0.235 | 6.526 ± 0.917 | 0.000 ± 0.000 | 7.084 ± 0.728
  | 5 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.300 ± 0.030 | 6.526 ± 0.917 | 0.000 ± 0.000 | 1.391 ± 0.155
  | 10 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.136 ± 0.026 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.618 ± 0.073
  | 25 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.080 ± 0.019 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.381 ± 0.044
  | 50 | 6.034 ± 0.824 | 0.000 ± 0.000 | 0.056 ± 0.015 | 6.526 ± 0.917 | 0.000 ± 0.000 | 0.329 ± 0.034
Sinkhorn | -- | 6.233 ± 0.821 | 0.000 ± 0.000 | 3.271 ± 1.445 | 6.745 ± 0.916 | 0.000 ± 0.000 | 14.589 ± 4.213
Cyclic Sinkhorn | 2 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.918 ± 0.463 | 6.745 ± 0.916 | 0.000 ± 0.000 | 3.973 ± 0.922
  | 5 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.207 ± 0.170 | 6.745 ± 0.916 | 0.000 ± 0.000 | 1.262 ± 0.324
  | 10 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.116 ± 0.036 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.636 ± 0.259
  | 25 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.093 ± 0.036 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.381 ± 0.127
  | 50 | 6.233 ± 0.821 | 0.000 ± 0.000 | 0.067 ± 0.034 | 6.745 ± 0.916 | 0.000 ± 0.000 | 0.320 ± 0.053
Table 2: Experimental results in real-world data.
Algorithm | $(h,w)=(64,64)$, $d=4096$: Obj. value | Marginal error | Time (sec.) | $(h,w)=(96,96)$, $d=9216$: Obj. value | Marginal error | Time (sec.)
Sinkhorn | 4.320 ± 2.056 | 0.000 ± 0.000 | 16.610 ± 6.502 | 6.296 ± 3.100 | 0.000 ± 0.000 | 117.152 ± 53.442
Cyclic Sinkhorn | 4.289 ± 2.048 | 0.001 ± 0.001 | 3.837 ± 1.286 | 6.250 ± 3.089 | 0.001 ± 0.001 | 26.087 ± 11.985
Two-Stage Sinkhorn | 4.320 ± 2.056 | 0.000 ± 0.000 | 13.877 ± 6.244 | 6.296 ± 3.100 | 0.000 ± 0.000 | 91.790 ± 43.000

7.1 Synthetic Data w/ Strict Cyclic Symmetry

We created $20$ synthetic random data for each of the two dimensions, $d\in\{5000,10000\}$, that satisfy Assumption 1 strictly with $n=50$ (for details, see Appendix D). For validation, we evaluated the average and standard deviation over the $20$ data of the objective function values, the marginal constraint errors defined by $||\mathbf{T}^{\top}\mathbf{1}_{d}-\mathbf{b}||_{2}$, and the computation time for four algorithms: the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993), Algorithm 1 using the network simplex algorithm in line 2 (we call it the cyclic network simplex algorithm), the Sinkhorn algorithm (Cuturi 2013), and the cyclic Sinkhorn algorithm (Algorithm 2). We set $\lambda=0.5$ for the regularizer 3. The computation time was recorded from inputting the data to outputting the optimal solution. Because these synthetic data also satisfy Assumption 1 for every divisor $n$ of $50$, namely $n\in\{2,5,10,25,50\}$, we conducted experiments for each such $n$; larger $n$ leads to smaller problems that output the same result. The network simplex algorithm was implemented using LEMON (Dezső, Jüttner, and Kovács 2011).
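For reference, the reported metrics can be computed directly from a returned transport plan $\mathbf{T}$. This is a minimal sketch; we take the objective value as the transport cost $\langle\mathbf{C},\mathbf{T}\rangle$, without the regularization term.

```python
import numpy as np

def evaluate_plan(T, C, b):
    """Metrics of Section 7.1: the transport cost <C, T> and the
    marginal constraint error || T^T 1_d - b ||_2."""
    obj = float(np.sum(C * T))
    marginal_err = float(np.linalg.norm(T.T @ np.ones(T.shape[0]) - b))
    return obj, marginal_err
```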

Table 1 lists the results. The network simplex algorithm and the cyclic one attained the same optimal objective function value, but the latter became faster as $n$ grew. The same held for the Sinkhorn algorithm and the cyclic one. These results support the effectiveness of our proposed algorithms: higher cyclic symmetry (i.e., larger $n$) results in faster computation.

7.2 Real Data w/ Approximate Cyclic Symmetry

For real-world data, we tested the mirror-symmetry case ($n=2$) of Example 1 with the NYU Symmetry Database (Cicconet et al. 2017). From this dataset, we selected $20$ images with mirror symmetry along the vertical axis (the images are shown in Appendix E). These images were converted to gray-scale, resized to $64\times 64$ or $96\times 96$ pixels, and normalized so that the sum of the intensities is $1$. We then obtained $\mathbf{a},\mathbf{b}$ by 7 and $\mathbf{C}$ by the pixel-wise Euclidean distance. For validation, we evaluated the same metrics as in Section 7.1 over the $190\,(={}_{20}\mathrm{C}_{2})$ image pairs. Because EROT is commonly used in real applications (Peyré and Cuturi 2019), we focused on C-EROT and compared the Sinkhorn algorithm, the cyclic one (Algorithm 2), and the two-stage one (Algorithm 3). Note that, in the two-stage Sinkhorn algorithm, we stopped Stage 1 before convergence to prevent the solution from drifting far from the optimal one for real images (for details, see Appendix F). We set $\lambda$ as in Section 7.1 for the regularizer 3.
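The preprocessing described above can be sketched as follows. This is a minimal sketch under our own assumptions: the paper does not specify the resizing or gray-scale conversion methods, so we use block averaging and the ITU-R 601 luma weights as stand-ins.

```python
import numpy as np

def to_distribution(rgb, size=64):
    """Sketch of the Section 7.2 preprocessing: gray-scale, resize to
    size x size (block averaging; assumes the original side lengths are
    multiples of `size`), and normalize so the intensities sum to 1."""
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    h, w = gray.shape
    gray = gray.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (gray / gray.sum()).ravel()
```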

Table 2 lists the results. The cyclic Sinkhorn algorithm showed the fastest computation time. However, because this algorithm assumes that Assumption 1 holds strictly, its objective function value differed from that of the original Sinkhorn algorithm, and a marginal error occurred. In contrast, the two-stage Sinkhorn algorithm showed the same objective function value as the original one and no marginal error, while being faster than the original. These results indicate that the cyclic Sinkhorn algorithm can be a good choice for real-world data thanks to its fastest computation time, provided that users tolerate the difference in the objective function value and the marginal error. Otherwise, the two-stage Sinkhorn algorithm is promising for real-world data: it solves C-EROT with approximate cyclic symmetry faster than the original Sinkhorn algorithm.

8 Discussions and Limitations

Through this paper, we confirmed that our algorithms can solve C-ROT faster. For further progress, we discuss the following future issues. (I) In Assumption 1, we assume that the cyclic order $n$ is known in advance. Because cyclic symmetry arises naturally from the physical structure of input data, this assumption is reasonable in some real-world cases. However, our algorithms must be extended to handle cyclic symmetry of unknown order. (II) It is unknown whether our algorithms can be generalized to other symmetries, e.g., dihedral symmetry (Gatermann and Parrilo 2004). Further development of our algorithms for general symmetries remains as future work. (III) The main contribution of this paper is showing how to utilize cyclic symmetry in OT with theoretical proofs, but we must test our algorithms on a wider variety of real-world data for further development.

9 Conclusion

We proposed novel fast algorithms for OT with cyclic symmetry. We showed that such OT can be reduced to a smaller optimization problem whose number of variables shrinks as higher cyclic symmetry exists in the input data. Our algorithms solve this small problem instead of the original OT and thus achieve fast computation. Through experiments, we confirmed the effectiveness of our algorithms on synthetic/real-world data with strict/approximate cyclic symmetry. This paper cultivates a new research direction, OT with symmetry, and paves the way for future research.

References

  • Ahuja, Magnanti, and Orlin (1993) Ahuja, R. K.; Magnanti, T. L.; and Orlin, J. B. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc.
  • Alaya et al. (2019) Alaya, M. Z.; Berar, M.; Gasso, G.; and Rakotomamonjy, A. 2019. Screening Sinkhorn Algorithm for Regularized Optimal Transport. In Advances in Neural Information Processing Systems.
  • Altschuler et al. (2019) Altschuler, J.; Bach, F.; Rudi, A.; and Niles-Weed, J. 2019. Massively scalable Sinkhorn distances via the Nyström method. In Advances in Neural Information Processing Systems.
  • Altschuler, Niles-Weed, and Rigollet (2017) Altschuler, J.; Niles-Weed, J.; and Rigollet, P. 2017. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In Advances in Neural Information Processing Systems.
  • Beck (2017) Beck, A. 2017. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics.
  • Beck and Tetruashvili (2013) Beck, A.; and Tetruashvili, L. 2013. On the convergence of block coordinate descent type methods. SIAM journal on Optimization, 23(4): 2037–2060.
  • Blondel, Seguy, and Rolet (2018) Blondel, M.; Seguy, V.; and Rolet, A. 2018. Smooth and Sparse Optimal Transport. In International Conference on Artificial Intelligence and Statistics.
  • Bonchev (1991) Bonchev, D. 1991. Chemical graph theory: introduction and fundamentals, volume 1. CRC Press.
  • Bonneel, Peyré, and Cuturi (2016) Bonneel, N.; Peyré, G.; and Cuturi, M. 2016. Wasserstein Barycentric Coordinates: Histogram Regression Using Optimal Transport. ACM Transactions on Graphics, 35(4).
  • Boyd and Vandenberghe (2004) Boyd, S.; and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
  • Cicconet et al. (2017) Cicconet, M.; Birodkar, V.; Lund, M.; Werman, M.; and Geiger, D. 2017. A convolutional approach to reflection symmetry. Pattern Recognition Letters, 95: 44–50.
  • Courty et al. (2017) Courty, N.; Flamary, R.; Tuia, D.; and Rakotomamonjy, A. 2017. Optimal Transport for Domain Adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9): 1853–1865.
  • Cuturi (2013) Cuturi, M. 2013. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Advances in Neural Information Processing Systems.
  • Dezső, Jüttner, and Kovács (2011) Dezső, B.; Jüttner, A.; and Kovács, P. 2011. LEMON – an Open Source C++ Graph Template Library. Electronic Notes in Theoretical Computer Science, 264(5): 23–45. Second Workshop on Generative Technologies.
  • Dieleman, Willett, and Dambre (2015) Dieleman, S.; Willett, K. W.; and Dambre, J. 2015. Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2): 1441–1459.
  • Dvurechensky, Gasnikov, and Kroshnin (2018) Dvurechensky, P.; Gasnikov, A.; and Kroshnin, A. 2018. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn’s Algorithm. In International Conference on Machine Learning.
  • Gatermann and Parrilo (2004) Gatermann, K.; and Parrilo, P. A. 2004. Symmetry groups, semidefinite programs, and sums of squares. Journal of Pure and Applied Algebra, 192(1-3): 95–128.
  • Getreuer (2013) Getreuer, P. 2013. A Survey of Gaussian Convolution Algorithms. Image Processing On Line, 3: 286–310.
  • Guillaume (2012) Guillaume, C. 2012. Optimal transportation and economic applications. Lecture Notes.
  • Guo, Ho, and Jordan (2020) Guo, W.; Ho, N.; and Jordan, M. 2020. Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter. In International Conference on Artificial Intelligence and Statistics.
  • Howard (1965) Howard, E. 1965. Garden Cities of To-Morrow, volume 23. MIT Press.
  • Jaffé and Orchin (2002) Jaffé, H. H.; and Orchin, M. 2002. Symmetry in chemistry. Courier Corporation.
  • Kantorovich (1942) Kantorovich, L. 1942. On the transfer of masses (in Russian). In Doklady Akademii Nauk, volume 37, 227.
  • Kusner et al. (2015) Kusner, M.; Sun, Y.; Kolkin, N.; and Weinberger, K. 2015. From Word Embeddings To Document Distances. In International Conference on Machine Learning.
  • Ladd (2014) Ladd, M. F. C. 2014. Symmetry of crystals and molecules. Oxford University Press, USA.
  • Lin, Ho, and Jordan (2019a) Lin, T.; Ho, N.; and Jordan, M. 2019a. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms. In International Conference on Machine Learning.
  • Lin, Ho, and Jordan (2019b) Lin, T.; Ho, N.; and Jordan, M. I. 2019b. On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport. arXiv preprint arXiv:1906.01437v1.
  • Liu et al. (2020) Liu, Y.; Zhu, L.; Yamada, M.; and Yang, Y. 2020. Semantic Correspondence as an Optimal Transport Problem. In Computer Vision and Pattern Recognition.
  • Monge (1781) Monge, G. 1781. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale.
  • Nikolentzos, Meladianos, and Vazirgiannis (2017) Nikolentzos, G.; Meladianos, P.; and Vazirgiannis, M. 2017. Matching Node Embeddings for Graph Similarity. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
  • Pang et al. (2022) Pang, S.; Du, A.; Orgun, M. A.; Wang, Y.; Sheng, Q. Z.; Wang, S.; Huang, X.; and Yu, Z. 2022. Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation. IEEE Transactions on Cybernetics, 1–12.
  • Petric Maretic et al. (2019) Petric Maretic, H.; El Gheche, M.; Chierchia, G.; and Frossard, P. 2019. GOT: An Optimal Transport framework for Graph comparison. In Advances in Neural Information Processing Systems.
  • Peyré and Cuturi (2019) Peyré, G.; and Cuturi, M. 2019. Computational Optimal Transport: With Applications to Data Science. Now Publishers.
  • Pham et al. (2020) Pham, K.; Le, K.; Ho, N.; Pham, T.; and Bui, H. 2020. On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm. In International Conference on Machine Learning.
  • Rockafellar (1970) Rockafellar, R. T. 1970. Convex Analysis, volume 18 of Princeton Mathematical Series. New Jersey: Princeton University Press.
  • Shamai et al. (2015) Shamai, G.; Aflalo, Y.; Zibulevsky, M.; and Kimmel, R. 2015. Classical Scaling Revisited. In International Conference on Computer Vision.
  • Sinkhorn (1967) Sinkhorn, R. 1967. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4): 402–405.
  • Solomon et al. (2015) Solomon, J.; de Goes, F.; Peyré, G.; Cuturi, M.; Butscher, A.; Nguyen, A.; Du, T.; and Guibas, L. 2015. Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains. ACM Transactions on Graphics, 34(4).
  • Tarjan (1997) Tarjan, R. E. 1997. Dynamic trees as search trees via euler tours, applied to the network simplex algorithm. Mathematical Programming, 78(2): 169–177.
  • Tenetov, Wolansky, and Kimmel (2018) Tenetov, E.; Wolansky, G.; and Kimmel, R. 2018. Fast Entropic Regularized Optimal Transport Using Semidiscrete Cost Approximation. SIAM Journal on Scientific Computing, 40(5): A3400–A3422.
  • Wu, Rupprecht, and Vedaldi (2023) Wu, S.; Rupprecht, C.; and Vedaldi, A. 2023. Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (Invited Paper). IEEE Trans. Pattern Anal. Mach. Intell., 45(4): 5268–5281.
  • Xie and Grossman (2018) Xie, T.; and Grossman, J. C. 2018. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett., 120: 145301.
  • Zhao et al. (2003) Zhao, W.; Chellappa, R.; Phillips, P. J.; and Rosenfeld, A. 2003. Face Recognition: A Literature Survey. ACM Comput. Surv., 35(4): 399–458.

Appendices: Optimal Transport with Cyclic Symmetry

Appendix A Simple Counter-Example to the Intuitive Utilization of Cyclic Symmetry in OT

As explained in Section 1, the intuitive approach, which solves OT for only one of the symmetric components of the input data and concatenates $n$ copies of the obtained solution, does not work. To explain the reason clearly, we here present a simple counter-example to this intuitive utilization of cyclic symmetry in OT.

We consider the $90^{\circ}$ rotational symmetry case ($n=4$) of Example 1 in Section 4 with the following gray-scale images.

Figure A1: Images with $90^{\circ}$ rotational symmetry ($n=4$).

These images clearly have $90^{\circ}$ rotational symmetry.

Intuitively, to utilize the $90^{\circ}$ rotational symmetry in the above images, it seems sufficient to consider OT for only one of the symmetric parts of the images, i.e., only the red rectangular areas in Figure A1. However, when the cost matrix $\mathbf{C}$ is given by the pixel-wise Euclidean distance matrix, the optimal transportation plan for the $(0,1)$-th entry (whose value is $0.25$) in the left image is to transport it toward the $(0,2)$-th entry (whose value is $0.25$) in the right image, beyond the red rectangular area. Thus, OT restricted to only one of the symmetric parts (the red rectangular areas) of the above images gives an incorrect transportation plan.

Therefore, the intuitive utilization of cyclic symmetry, i.e., solving OT for only one of the symmetric components of the input data and concatenating $n$ copies of the obtained solution, does not work. Consequently, we must consider the interaction between the symmetric components and develop novel fast algorithms that appropriately utilize cyclic symmetry with theoretical proofs, which leads to our algorithms in the main paper.

Appendix B Proof of Lemma 1

Proof.

Let $\mathbf{T}^{\prime}$ be an optimal solution to 1. We define $\mathbf{T}^{*}$ as follows:

\[
\mathbf{T}^{*}\coloneqq\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top},
\tag{A1}
\]

where

\[
\mathbf{P}\coloneqq\begin{pmatrix}\mathbf{O}_{m}&\mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{I}_{m}\\ \mathbf{I}_{m}&\mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{O}_{m}\\ \mathbf{O}_{m}&\ddots&\ddots&\vdots&\vdots\\ \vdots&\ddots&\ddots&\mathbf{O}_{m}&\mathbf{O}_{m}\\ \mathbf{O}_{m}&\cdots&\mathbf{O}_{m}&\mathbf{I}_{m}&\mathbf{O}_{m}\end{pmatrix}
\tag{A2}
\]

is the block-circulant permutation matrix, $\mathbf{I}_{m}$ denotes the $m\times m$ identity matrix, and $\mathbf{O}_{m}$ denotes the $m\times m$ zero matrix.

First, we show that $\mathbf{T}^{*}$ is a feasible solution to 1. $\mathbf{T}^{*}$ satisfies the row-sum constraints because

\[
\mathbf{T}^{*}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\mathbf{1}_{d}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{a}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{a}=\mathbf{a}.
\tag{A3}
\]

Similarly, we can show that $\mathbf{T}^{*}$ satisfies the column-sum constraints (i.e., $(\mathbf{T}^{*})^{\top}\mathbf{1}_{d}=\mathbf{b}$). Thus, $\mathbf{T}^{*}$ is a feasible solution to 1.

Next, we show that $\mathbf{T}^{*}$ is an optimal solution to 1. To this end, we check the objective function value of 1 at $\mathbf{T}=\mathbf{T}^{*}$. Letting $f(\mathbf{T})$ be the objective function of 1, we get

\[
f(\mathbf{T}^{*})=f\left(\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\right)\leq\frac{1}{n}\sum_{k=0}^{n-1}f\left(\mathbf{P}^{k}\mathbf{T}^{\prime}\left(\mathbf{P}^{k}\right)^{\top}\right)=\frac{1}{n}\sum_{k=0}^{n-1}f(\mathbf{T}^{\prime})=f(\mathbf{T}^{\prime}).
\tag{A4}
\]

Here, the inequality follows from the convexity of $f$ and Jensen's inequality. Thus, $\mathbf{T}^{*}$ is an optimal solution to 1.

Finally, for $l=0,\ldots,n-1$, we get

\[
\mathbf{P}^{l}\mathbf{T}^{*}(\mathbf{P}^{l})^{\top}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k+l}\mathbf{T}^{\prime}(\mathbf{P}^{k+l})^{\top}=\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{P}^{k}\mathbf{T}^{\prime}(\mathbf{P}^{k})^{\top}=\mathbf{T}^{*},
\tag{A5}
\]

where the second equality holds because $\mathbf{P}^{n}=\mathbf{I}_{d}$ and the sum runs over a full cycle. Therefore, $\mathbf{T}^{*}$ has the block-circulant structure of 10. ∎
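The construction in this proof can also be checked numerically. The following is a small sketch: `P` is built as in (A2), and the symmetrization (A1) is verified to preserve the marginals and to produce a $\mathbf{P}$-invariant plan.

```python
import numpy as np

def block_circulant_perm(m, n):
    """The d x d block-circulant permutation matrix P of (A2), d = m n."""
    d = m * n
    P = np.zeros((d, d))
    for k in range(n):
        r = (k + 1) % n   # block column k carries I_m in block row (k+1) mod n
        P[r * m:(r + 1) * m, k * m:(k + 1) * m] = np.eye(m)
    return P

def symmetrize(T, m, n):
    """T* = (1/n) sum_k P^k T' (P^k)^T, as in (A1)."""
    P = block_circulant_perm(m, n)
    Pk = np.eye(m * n)
    acc = np.zeros_like(T)
    for _ in range(n):
        acc += Pk @ T @ Pk.T
        Pk = P @ Pk
    return acc / n
```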

Appendix C Proof of Theorem 2

Proof.

We rewrite 11 with Lagrange multipliers $\mathbf{w}$ and $\mathbf{z}$ for the two equality constraints as follows:

\[
\begin{aligned}
&\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})+\left\langle\mathbf{w},\boldsymbol{\alpha}-\sum_{k=0}^{n-1}\mathbf{T}_{k}\mathbf{1}_{m}\right\rangle+\left\langle\mathbf{z},\boldsymbol{\beta}-\sum_{k=0}^{n-1}\mathbf{T}_{k}^{\top}\mathbf{1}_{m}\right\rangle\\
=\ &\min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle+\sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k}-\mathbf{w}\mathbf{1}^{\top}-\mathbf{1}\mathbf{z}^{\top},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk}).
\end{aligned}
\tag{A6}
\]

Note that Problem 11 is convex, its constraints are linear, and Slater's constraint qualification holds. Hence, strong duality holds (see, e.g., (Boyd and Vandenberghe 2004, Section 5.2.3)), and we can swap the $\min$- and $\max$-operations in (A6):

\[
\begin{aligned}
=\ &\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \min_{\mathbf{T}_{0},\dots,\mathbf{T}_{n-1}\in\mathbb{R}^{m\times m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle+\sum_{k=0}^{n-1}\left\langle\mathbf{C}_{k}-\mathbf{w}\mathbf{1}^{\top}-\mathbf{1}\mathbf{z}^{\top},\mathbf{T}_{k}\right\rangle+\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi(T_{ijk})\tag{A7}\\
=\ &\max_{\mathbf{w},\mathbf{z}\in\mathbb{R}^{m}}\ \langle\mathbf{w},\boldsymbol{\alpha}\rangle+\langle\mathbf{z},\boldsymbol{\beta}\rangle-\sum_{k=0}^{n-1}\sum_{i,j=0}^{m-1}\phi^{\star}(w_{i}+z_{j}-C_{ijk}).\tag{A8}
\end{aligned}
\]

One of the optimality conditions is

\[
T_{ijk}^{*}=(\phi^{\star})^{\prime}(w_{i}+z_{j}-C_{ijk}).
\tag{A9}
\]
∎

Appendix D Further Details of Synthetic Data in Section 7.1

We created the synthetic data using the “Random Generator” class in Python, with the random seed set to $0$. We considered input data that have $n(=50)$-order cyclic symmetry.

To create the synthetic $d$-dimensional input probability vectors $\mathbf{a}$ and $\mathbf{b}$, we first sampled $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ from the $m(=\frac{d}{n})$-dimensional uniform distribution on the half-open interval $[0.0,1.0)$. We then created $\mathbf{a}$ and $\mathbf{b}$ by concatenating $n$ copies of $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, respectively, as in 4, and normalized them so that each sums to $1$.

To create the synthetic input cost matrix $\mathbf{C}$, we first sampled $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ from the $m\times m$-dimensional Gaussian distribution with mean $3.0$ and standard deviation $5.0$; note that we selected these parameters to keep the same order of magnitude of the metrics, namely the objective function value and the marginal error, across all experiments in Section 7. We then added the absolute minimum value, namely $\left|\min_{k=0,\dots,n-1}\left(\min_{i,j=0,\dots,m-1}C_{ijk}\right)\right|$, to all entries of $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ to ensure their non-negativity. After that, we created $\mathbf{C}$ by concatenating $n$ copies of $\mathbf{C}_{0},\dots,\mathbf{C}_{n-1}$ as in 5.
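The construction above can be sketched as follows. This is a sketch, not the authors' generator; in particular, the block-circulant assembly of the full $\mathbf{C}$ reflects our assumed reading of 5 (block $(r,c)$ is $\mathbf{C}_{(c-r)\bmod n}$).

```python
import numpy as np

def make_synthetic(d, n, seed=0):
    """Sketch of the synthetic data of Appendix D."""
    rng = np.random.default_rng(seed)
    m = d // n
    # Marginals: tile m-dimensional uniform samples n times and normalize.
    a = np.tile(rng.uniform(0.0, 1.0, m), n)
    b = np.tile(rng.uniform(0.0, 1.0, m), n)
    a, b = a / a.sum(), b / b.sum()
    # Cost blocks: Gaussian (mean 3.0, std 5.0), shifted by the absolute
    # minimum so that every entry is non-negative.
    blocks = rng.normal(3.0, 5.0, (n, m, m))
    blocks = blocks + np.abs(blocks.min())
    # Full cost: block-circulant tiling of C_0, ..., C_{n-1} (our assumed
    # reading of 5).
    C = np.block([[blocks[(c - r) % n] for c in range(n)] for r in range(n)])
    return a, b, C, blocks
```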

Appendix E Selected Images for Real-World Data in Section 7.2

In Section 7.2, we selected $20$ images with mirror symmetry ($n=2$) from the NYU Symmetry Database (Cicconet et al. 2017). We show them in Figure A2; as can be seen, the images are of various kinds.

Figure A2: Images with mirror symmetry, which were used in Section 7.2. Note that we show the images after being resized to a square as in Section 7.2.

Appendix F Further Details of the Two-Stage Sinkhorn Algorithm used in Section 7.2

In Section 7.2, we stopped Stage 1 of the two-stage Sinkhorn algorithm before convergence to prevent the solution from being far from optimal for real images. In detail, we first ran the cyclic Sinkhorn algorithm until the marginal error $||\left(\mathrm{diag}(\widehat{\mathbf{p}})\mathbf{K}\mathrm{diag}(\widehat{\mathbf{q}})\right)^{\top}\mathbf{1}_{m}-\boldsymbol{\beta}||_{2}$ fell below $1.0\times 10^{-3}$. We then ran the original Sinkhorn algorithm until the difference between its objective function value and the value obtained by directly solving C-SROT with the original Sinkhorn algorithm fell below $1.0\times 10^{-4}$.
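The Stage-1 stopping test can be computed cheaply from the current scalings without forming the full plan's marginals twice. A minimal sketch:

```python
import numpy as np

def stage1_marginal_error(p_hat, K, q_hat, beta):
    """The Stage-1 stopping criterion of Appendix F:
    || (diag(p) K diag(q))^T 1_m - beta ||_2."""
    plan = p_hat[:, None] * K * q_hat[None, :]   # diag(p) K diag(q)
    return float(np.linalg.norm(plan.sum(axis=0) - beta))
```

Right after a $\widehat{\mathbf{q}}$-update the column marginal is exact by construction, so the criterion is naturally checked after the $\widehat{\mathbf{p}}$-update of each iteration.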
