
Submodular Hypergraph Partitioning: Metric Relaxations and Fast Algorithms via an Improved Cut-Matching Game

Antares Chen
[email protected]
University of Chicago
   Lorenzo Orecchia
[email protected]
University of Chicago
   Erasmo Tani
[email protected]
University of Chicago
Abstract

Despite there being significant work on developing spectral [16, 39, 38] and metric embedding [48] based approximation algorithms for hypergraph generalizations of conductance, little is known regarding the approximability of hypergraph partitioning objectives beyond this.

This work proposes algorithms for a general model of hypergraph partitioning that unifies both undirected and directed versions of many well-studied partitioning objectives. The first contribution of this paper introduces polymatroidal cut functions, a large class of cut functions amenable to approximation algorithms via metric embeddings and routing multicommodity flows. We demonstrate an $O(\sqrt{\log n})$-approximation, where $n$ is the number of vertices in the hypergraph, for these problems by rounding relaxations to metrics of negative type.

The second contribution of this paper generalizes the cut-matching game framework of Khandekar et al. [35] to tackle polymatroidal cut functions. This yields the first almost-linear time $O(\log n)$-approximation algorithm for standard versions of undirected and directed hypergraph partitioning [38]. A technical consequence of our construction is that a cut-matching game which greatly relaxes the set of allowed actions for both players can be used to partition hypergraphs with negligible impact on the approximation ratio. We believe this to be of independent interest.

1 Introduction

In the past 20 years, increasing complexity in real-world data has necessitated generalizing graph models to represent higher order relations [3, 15]. Hypergraphs have served as the quintessential object to model relations involving multiple objects, and, as a consequence, the study of hypergraph algorithms has featured prominently in recent practical [12, 57] and theoretical developments [16, 17, 47, 48].

Hypergraph partitioning is a fundamental algorithmic problem in this broader landscape. From an unsupervised machine learning perspective, hypergraph partitioning enables detecting significant clusters in complex higher-order networks, such as social [57, 61] or biological networks [28, 37]. From a theoretical perspective, partitioning algorithms are a fundamental primitive to be exploited in the development of divide-and-conquer methods for other hypergraph problems. In contrast to graphs, where there is only one way for a partition to intersect an edge, hypergraphs admit multiple definitions for the cost of cutting a hyperedge. This yields a broad spectrum of partitioning objectives, and allows the end-user to design partitioning problems that better fit specific applications. While there has been significant work on developing spectral approximation algorithms for hypergraph generalizations of conductance [16, 39, 38], with few exceptions [48], little is known about the approximability of large classes of hypergraph partitioning objectives beyond this. Even less is known regarding fast algorithms for these tasks.

This paper proposes a general model for partitioning hypergraphs. The first half of this paper identifies a large class of hypergraph partitioning objectives, which we call polymatroidal cut functions. This class is a subset of the submodular cut functions previously considered by [43, 44] and captures both undirected and directed versions of almost all partitioning measures found in prior theoretical work. We show that these objectives possess properties that make them particularly well-suited for approximation methods based on metric embeddings and routing multi-commodity flows. This leads us to polynomial-time $O(\sqrt{\log n})$-approximations for these objectives based on rounding $\ell_{2}^{2}$-metric relaxations. The second half of this paper constructs fast algorithms that achieve similar $O(\log n)$-approximations by generalizing the cut-matching framework of Khandekar et al. [35]. This produces the first almost-linear time polylogarithmic approximation for both undirected and directed hypergraph conductance.

Let us now define concepts relevant to hypergraph partitioning. A hypergraph is a tuple $G=(V,E)$, where $E\subseteq 2^{V}$ is a set of hyperedges. The rank of a hyperedge $h\in E$ is its cardinality $|h|$. If every $h\in E$ satisfies $|h|=k$, the hypergraph $G$ is called $k$-uniform. Hence, an undirected graph is a $2$-uniform hypergraph. A key step in formalizing hypergraph cut problems is to define a suitable cut function $\delta_{h}:2^{h}\to[0,1]$ on each hyperedge $h\in E$. This expresses the cost incurred when $h$ is cut by $(A,\bar{A})$ into $A\cap h$ and $\bar{A}\cap h$. Central to our work are submodular cut functions, defined by [43, 44]:

Definition 1 (Submodular Cut Function).

A function $\delta_{h}:2^{h}\to[0,1]$ is a submodular cut function if it is submodular and satisfies $\delta_{h}(\varnothing)=\delta_{h}(h)=0$.

The space of cut functions for an edge $\{i,j\}$ is spanned by the undirected edge cut function $\delta_{\{i,j\}}(S)=|\mathbf{1}^{S}_{i}-\mathbf{1}^{S}_{j}|$ and the directed edge cut function $\delta_{(i,j)}(S)=(\mathbf{1}^{S}_{i}-\mathbf{1}^{S}_{j})_{+}$, giving a complete classification of cut functions for rank-$2$ hyperedges. Higher-rank hyperedges support broader choices of cut functions. The most common example is the standard hypergraph cut function [16, 47]: $\delta^{\mathsf{cut}}_{h}(S)\stackrel{\mathrm{def}}{=}\min\{1,|S|,|h\setminus S|\}$, which takes value $1$ if $h$ is cut by $S$ and $0$ otherwise. If the hypergraph $G=(V,E)$ is associated with positive hyperedge weights $\mathbf{w}\in\mathbb{Z}^{E}_{>0}$, we can succinctly write the cut function for the entire graph as $\delta_{G}(S)\stackrel{\mathrm{def}}{=}\sum_{h\in E}w_{h}\cdot\delta_{h}(S)$.
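The cut functions above can be evaluated directly on explicit vertex subsets. A minimal Python sketch (the function names are ours, chosen for illustration):

```python
def undirected_edge_cut(i, j, S):
    # delta_{i,j}(S) = |1_i^S - 1_j^S|: 1 iff the edge {i, j} is cut by S.
    return abs((i in S) - (j in S))

def directed_edge_cut(i, j, S):
    # delta_(i,j)(S) = (1_i^S - 1_j^S)_+: 1 iff i is in S and j is not.
    return max((i in S) - (j in S), 0)

def standard_hyperedge_cut(h, S):
    # delta_h^cut(S) = min{1, |S ∩ h|, |h \ S|}: 1 iff h is cut by S.
    inside = len(h & S)
    return min(1, inside, len(h) - inside)
```

Note how the directed edge cut is asymmetric (it charges only crossings from $i$ to $j$), while the other two functions are invariant under complementing $S$.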

With these definitions, we are interested in minimizing the submodular hypergraph ratio-cut objective.

Definition 2 (Submodular Hypergraph Ratio-Cut).

Given a hypergraph $G=(V,E)$ with positive hyperedge weights $\mathbf{w}\in\mathbb{Z}^{E}_{>0}$, non-negative vertex weights $\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}$, and a collection of submodular cut functions $\{\delta_{h}\}_{h\in E}$, the ratio-cut objective on $G$ is defined for all $S\subseteq V$ as:

$$\Psi_{G}(S)\stackrel{\mathrm{def}}{=}\frac{\delta_{G}(S)}{\min\{\mu(S),\mu(\bar{S})\}}=\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S\cap h)}{\min\{\mu(S),\mu(\bar{S})\}}\,.$$

The ratio-cut of $G$ is $\Psi^{*}_{G}\stackrel{\mathrm{def}}{=}\min_{S\subset V}\Psi_{G}(S)$.
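Definition 2 can be made concrete by brute force on a toy instance. The sketch below assumes the standard hypergraph cut function; it is exponential in $|V|$ and intended only to illustrate the objective, not as an algorithm:

```python
from itertools import combinations

def ratio_cut(V, edges, w, mu, S):
    # Psi_G(S): standard-cut numerator over the smaller side's mu-mass.
    num = sum(w[h] * min(1, len(set(h) & S), len(set(h) - S)) for h in edges)
    den = min(sum(mu[i] for i in S), sum(mu[i] for i in set(V) - S))
    return num / den

def min_ratio_cut(V, edges, w, mu):
    # Psi*_G: minimum over all proper non-empty subsets S of V.
    return min(
        ratio_cut(V, edges, w, mu, set(S))
        for r in range(1, len(V))
        for S in combinations(sorted(V), r)
    )
```

For example, on $V=\{1,2,3,4\}$ with hyperedges $\{1,2,3\}$ and $\{3,4\}$ and unit weights, the cut $S=\{1,2\}$ severs only the first hyperedge and attains the optimum $\Psi_{G}(S)=1/2$.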

When specialized to graphs,¹ ratio-cut objectives include both graph expansion ($\bm{\mu}=\mathbf{1}$) and graph conductance ($\mu_{i}=\sum_{h\in E:i\in h}w_{h}$ for all $i\in V$). They also capture the directed counterparts of these objectives, obtained by replacing the undirected cut function $\delta_{\{i,j\}}$ with its directed analogue $\delta_{(i,j)}$.

¹In the rest of the paper, for any undirected (resp. directed) graph $G$, unless otherwise specified, we assume that $\delta_{G}$ and $\Psi_{G}$ are formed using the undirected (resp. directed) cut functions.

Previous Work on Submodular Hypergraph Partitioning

The standard hypergraph partitioning problem was first studied in the contexts of parallel numerical algorithms [15] and scientific computing [25].

Since then, many high-quality approximation algorithms have been developed for approximating hypergraph expansion and conductance. Louis and Makarychev [48] gave the first randomized polynomial-time $O\big(\sqrt{\log n}\big)$-approximation algorithm for hypergraph expansion. This matches the best known approximation for graph expansion [10], and uses the same relaxation to metrics of negative type. Louis [47] and Chan et al. [16] developed a spectral approach towards approximating hypergraph conductance, achieving Cheeger-like guarantees. Their algorithms require solving semidefinite programs (SDPs) to approximate the hypergraph spectral gap, returning a partition with conductance at most $O\big(\sqrt{\Psi^{*}_{G}\log\max_{h\in E}|h|}\big)$. Their results are known to be tight under the small-set expansion conjecture. In more recent work, Lau et al. used a similar approach to obtain an $O\big(\sqrt{\Psi^{*}_{G}\log(1/\Psi^{*}_{G})}\big)$ guarantee for directed graph conductance [39].

Much less is known about the approximability of hypergraph ratio-cut objectives for general submodular cut functions. Following the work of Yoshida [62], Li and Milenkovic [43, 44] proposed submodular cut functions. For submodular hypergraph conductance, rounding an SDP relaxation yields algorithms with conductance guarantees $O\big(\sqrt{\Psi^{*}_{G}\cdot\log|V|}\big)$ [62] and $O\big(\sqrt{\Psi^{*}_{G}\cdot\max_{h\in E}|h|}\big)$ [44]. Li and Milenkovic [44] further conjecture that improving the dependence on hypergraph rank is $\mathsf{NP}$-hard. The work of Yoshida [62] and the survey by Veldt et al. [60] provide excellent discussions of submodular cut functions and their applications.

Previous Work on Cut-Matching Games

Algorithms for the above settings require solving SDP relaxations over $|V|\times|V|$ symmetric matrices, with at least $|E|$ constraints. No faster algorithms are currently known beyond generic SDP solvers [32], which run in $\Omega\big(|V|^{2}|E|\big)$ time. Even for the simplest form of standard hypergraph partitioning, the resulting SDP relaxation is a mixed packing-and-covering program for which no almost-linear time algorithm is currently known [31].

In order to develop fast approximation algorithms, one then considers primal-dual methods that solve the dual to metric relaxations for ratio-cut problems. This poses an immediate issue: the duals to these relaxations are multi-commodity flow problems whose demands are encoded by a $\bm{\mu}$-weighted complete graph. Such problems may require quadratic time to solve [8] due to the large number of demands. The cut-matching game [36] was originally a method for approximately reducing this multi-commodity flow problem to a polylogarithmic number of exact single-commodity flow computations. Each single-commodity flow problem routes a perfect matching in the graph, and the process continues until the union of the matchings approximates the desired demand graph.

Through the cut-matching game, Khandekar et al. [36] obtained an $O(\log^{2}|V|)$-approximation to undirected graph expansion by exactly solving $O(\log^{3}|V|)$ maximum flow problems. Later, Orecchia et al. [54] reduced both of these measures by a factor of $O(\log n)$. Subsequent works have vastly generalized the applicability of the cut-matching game: Louis [46] extends [36] to approximate directed graph expansion, Long and Saranurak extend [35] to approximate vertex and hypergraph expansion, and [49] demonstrate how to run the cut-matching game using approximate single-commodity flow solves. The cut-matching game has since become ubiquitous in designing fast deterministic algorithms for various static and dynamic graph problems [13, 21, 22, 23].

Our Contributions

In this paper, we achieve the following.

  • We identify the class of polymatroidal cut functions, a subset of submodular cut functions amenable to metric and flow techniques. For this class, we develop notions of hypergraph flows and flow-embeddings of graphs into hypergraphs which certify lower bounds on the ratio-cut objective.

  • Using metric embedding techniques, we give polynomial-time $O(\sqrt{\log n})$-approximation algorithms for the minimum ratio-cut problem over submodular hypergraphs with polymatroidal cut functions. Our algorithms generalize the seminal result of Arora, Rao and Vazirani [8] for sparsest cut to the largest class of problems to date.

  • We extend the cut-matching game framework to approximate minimum ratio-cuts on submodular hypergraphs with polymatroidal cut functions. Our approach is based on approximating a continuous non-convex formulation of the minimum ratio-cut problem via a family of “local” convex submodular minimization problems. We then use the cut-matching framework to “boost” solutions to problems in this family, producing an $O(\log n)$-approximation algorithm for minimum ratio-cut that only requires a polylogarithmic number of approximate submodular minimization solves. This amounts to the first almost-linear time polylogarithmic approximation for both undirected and directed hypergraph conductance.

A technical byproduct of our cut-matching game is that it allows the cut player to play arbitrary disjoint vertex sets, and requires only a single $O(1)$-approximate submodular minimization solve per round. This may impact the deployment of cut-matching games in practice by lowering the required precision for the current running-time bottleneck of algorithms based on this framework [52, 58], as previous applications required either a single $o_{n}(1)$-approximate solve or a polylogarithmic number of $O(1)$-approximate solves.

Our cut-matching game additionally avoids combinatorial restrictions found in previous applications of the framework. For example, the submodular minimization problem arising when approximating hypergraph conductance is equivalent to a single-commodity flow whose demand graph is neither a perfect matching in the undirected case, nor Eulerian in the directed case. This may simplify the analyses of downstream applications of the cut-matching game such as those found in [13, 21, 22, 23, 45, 49].

Concurrent Work

Subsequent to initial versions of this paper [6], Veldt [58] provided a generalization of the cut-matching framework to approximate minimum ratio-cut problems specified by cardinality-based symmetric submodular cut functions, a subset of polymatroidal cut functions. Their approach utilizes exact maximum flow solves to compute hypergraph cut preservers, a gadget which lower bounds the ratio-cut objective only when the cut function is cardinality-based [60, 59]. Unlike this work, [58] additionally provides compelling empirical evaluations.

Independent of this work, Lau et al. [40] give an almost-linear time $O(\sqrt{\log n})$-approximation for standard undirected and directed hypergraph cut functions. Their algorithm solves the dual to a relaxation of hypergraph conductance given by adding $\ell_{2}^{2}$-metric constraints to a minimum reweighted eigenvalue problem [39]. This yields a multi-commodity flow problem, which they demonstrate to be approximable using a sequence of single-commodity flows via an interesting generalization of Sherman's algorithmic chaining [55]. Their algorithm is not based on the cut-matching game, and requires combinatorial restrictions similar to those found in previous works when routing the single-commodity flows.

Paper Organization

The paper is organized as follows:

  • Section 2 provides a technical overview of the results and proof techniques found in this paper.

  • Section 3 outlines preliminary tools and notation.

  • Section 4.1 introduces polymatroidal cut functions and provides key examples of functions in this class.

  • Sections 4.2 and 4.3 introduce hypergraph flows and hypergraph flow embeddings. These serve as fundamental objects in constructing certificates of approximate optimality in the cut-matching game.

  • Section 5 details our $O(\sqrt{\log n})$-approximation for minimum ratio-cut given polymatroidal cut functions, describing the $\ell_{2}^{2}$-metric relaxation and the embedding results required to round to an integral solution.

  • Section 6 describes our non-convex approach to the minimum ratio-cut problem and its connection to the cut-matching game.

  • Section 7 describes our cut-matching game framework and uses it to construct an $O(\log n)$-approximation algorithm for the minimum ratio-cut problem with polymatroidal cut functions.

The proofs of several technical statements encountered throughout the paper are postponed to the appendix.

2 Our Results

2.1 The Metric Approach to Submodular Hypergraph Ratio-Cut

The best polynomial-time approximation algorithms for graph ratio-cut problems are obtained by combining metric relaxations, either to general metrics [42] or to $\ell_{2}^{2}$-metrics [10], with metric embedding results for rounding. They achieve, respectively, an $O(\log n)$- and an $O(\sqrt{\log n})$-approximation ratio. We study the applicability of this approach to the submodular hypergraph ratio-cut problem. At the outset, we remark that we should not hope to obtain polylogarithmic approximations for general submodular cut functions without further assumptions, as Svitkina and Fleischer (Section 3 in [56]) exhibit a submodular cut function for which any $o(\sqrt{n})$-approximation requires exponentially many queries to a value oracle.

Given this negative result, it is necessary to restrict the problem to a smaller class of cut functions. At a high level, for a metric relaxation to work, the cut function must exhibit some kind of monotonicity, so that its value is preserved under low-distortion metric embeddings. However, a cut function cannot be monotone without being equal to 0, so the monotone structure must appear in some other way. To solve this issue, we introduce the class of polymatroidal cut functions, which are the infimal symmetrization of monotone non-decreasing submodular functions (see below). We believe that this new class captures most cut functions that are amenable to approximation via metric relaxation. As partial evidence to this statement, we show in Section 4 that the fractional dual objects associated with polymatroidal cut functions are closely related to polymatroidal flows, a well-studied generalization of network flows with many of the same favorable flow-cut gap properties [19].

Polymatroidal Cut Functions and Metric Embeddings

Our first contribution is to define polymatroidal cut functions, a subclass of submodular cut functions given by the infimal convolutions of non-decreasing submodular functions.

Definition 3 (Polymatroidal Cut Function).

A cut function $\delta_{h}:2^{h}\to\mathbb{R}_{\geq 0}$ is polymatroidal if, for all $S\subseteq h$, it can be expressed as

$$\delta_{h}(S)=\min\big\{F^{-}_{h}(S),\;F^{+}_{h}(h\setminus S)\big\},$$

where the associated functions $F^{-}_{h},F^{+}_{h}:2^{h}\rightarrow\mathbb{R}$ are non-decreasing submodular functions such that $F^{-}_{h}(\varnothing)=F^{+}_{h}(\varnothing)=0$. When the associated functions $F^{-}_{h}$ and $F^{+}_{h}$ are identical, we refer to the cut function $\delta_{h}$ as symmetric polymatroidal.

Note that $\delta_{h}$ penalizes a cut $(A,\overline{A})$ intersecting $h$ in a directed manner, with $F^{-}_{h}$ penalizing $A\cap h$ and $F^{+}_{h}$ penalizing $\overline{A}\cap h$. For a symmetric polymatroidal cut function $\delta_{h}$, the penalty is the same on both sides, so that $\delta_{h}(A\cap h)=\delta_{h}(\overline{A}\cap h)$, generalizing the cut function of undirected graphs and hypergraphs. We show in Section 4 that Definition 3 captures most of the cut functions proposed in previous work, including all directed and undirected standard hypergraph cut functions. Moreover, our setup also captures arbitrary combinations of all these types of cut functions within a single framework. We highlight two additional examples that are relevant in applications:

  • $F_{h}$ is a cardinality-based non-decreasing submodular function [59], i.e., a non-decreasing concave function of the cardinality. This is useful when we wish to modify the star-expansion cut function by diminishing the penalty on balanced partitions of $h$, e.g., by taking $\delta_{h}(S)=\min\big\{|S|^{p},|h\setminus S|^{p}\big\}$ for $p\in(0,1)$.

  • $F_{h}$ is the entropy of a subset of random variables associated with the vertices of $h$. When the hypergraph $G$ is the factor graph of an undirected graphical model, this yields a variant of the minimum information partition [30, 50].
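The first bullet's cardinality-based example can be sketched in a few lines. The sketch below assumes $F^{-}_{h}=F^{+}_{h}=|\cdot|^{p}$, a concave non-decreasing function of the cardinality (hence non-decreasing submodular with $F_{h}(\varnothing)=0$), giving a symmetric polymatroidal cut function per Definition 3:

```python
def polymatroidal_cut(h, S, p=0.5):
    # Symmetric polymatroidal: delta_h(S) = min{F_h(S), F_h(h \ S)}
    # with F_h(T) = |T|^p for a fixed p in (0, 1).
    return min(len(S & h) ** p, len(h - S) ** p)
```

A quick sanity check is the symmetry property $\delta_{h}(A\cap h)=\delta_{h}(\overline{A}\cap h)$ noted above: swapping a subset of $h$ for its complement within $h$ leaves the value unchanged.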

In Section 5, we investigate the polynomial-time approximability of ratio-cut problems over submodular hypergraphs with polymatroidal cut functions. In particular, we prove the following result.

Theorem 1.

There exists a randomized polynomial-time $O\big(\sqrt{\log|V|}\big)$-approximation algorithm for solving the minimum ratio-cut problem on weighted submodular hypergraphs equipped with polymatroidal cut functions.

This result extends the $O\big(\sqrt{\log n}\big)$-approximation of Louis and Makarychev [48] for standard hypergraph sparsest cut to all ratio-cut problems for the more general class of polymatroidal cut functions. Our algorithm for Theorem 1 solves a novel $\ell_{2}^{2}$-metric relaxation of the ratio-cut problem for weighted submodular hypergraphs (see the program RC-Directed-Semimetric in Section 5). This relaxation crucially exploits the form of the Lovász extension of polymatroidal cut functions. To the best of our knowledge, our result constitutes the broadest known generalization of the seminal result of Arora, Rao and Vazirani for sparsest cut, capturing a wealth of partitioning objectives, including directed and undirected graph ratio-cut, vertex-based ratio-cut and hypergraph ratio-cut, within a single simple analysis. It also improves on the best known approximation ratio for many other objectives in this class, such as those given by cardinality-based cut functions [59].

Polymatroidal Cut Functions and Hypergraph Flows

In the second part of the paper, we explore the efficient solution of the minimum submodular hypergraph ratio-cut problem via flow methods based on the cut-matching game. We start by developing the notions of flow-cut duality and flow embeddings over hypergraphs. For this purpose, in Section 4.2, we define a novel notion of hypergraph flows for weighted submodular hypergraphs equipped with polymatroidal cut functions. Specifically, for a weighted submodular hypergraph $G=(V,E,\mathbf{w},\bm{\mu})$, we show that the subgradients of polymatroidal cut functions behave like network flows over the factor graph, a graph analogue of the input submodular hypergraph that includes an auxiliary vertex for each hyperedge $h\in E$ and an edge $\{i,h\}$ for each $i\in h$. For each hyperedge $h\in E$, the monotone functions $F_{h}^{-}$ and $F_{h}^{+}$ define capacity constraints on the flows entering and leaving the corresponding auxiliary node. In this process, we highlight new connections between submodular hypergraphs and the polymatroidal flows described by Lawler and Martel [41] and extended to the multi-commodity flow setting by Chekuri et al. [19]. The main algorithmic result in this part of the paper is a flow decomposition result for hypergraph flows (Theorem 6) which allows us to construct flow embeddings of directed and undirected graphs into hypergraphs while lower bounding the size of the hypergraph cut by the size of the corresponding graph cut. This will become a fundamental building block for our cut-matching game construction.

2.2 A Non-Convex Optimization Approach To Minimum Ratio-Cut

To generalize the cut-matching game framework to the setting of submodular hypergraphs with polymatroidal cut functions, we devise a novel optimization approach based on a continuous non-convex formulation (RC-NonCvx) of the submodular hypergraph ratio-cut problem. Given a weighted submodular hypergraph $G=(V,E,\mathbf{w},\bm{\mu})$ and cut functions $\{\delta_{h}\}_{h\in E}$ with Lovász extensions $\{\bar{\delta}_{h}\}_{h\in E}$, we define (RC-NonCvx) as the following non-convex optimization problem over $\mathbb{R}^{V}$:

$$\bar{\Psi}_{G}(\mathbf{x})\stackrel{\mathrm{def}}{=}\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\min_{u\in\mathbb{R}}\|\mathbf{x}-u\mathbf{1}\|_{\bm{\mu},1}}\,,\qquad \bar{\Psi}_{G}^{*}\stackrel{\mathrm{def}}{=}\min_{\mathbf{x}\in\mathbb{R}^{V}}\bar{\Psi}_{G}(\mathbf{x})\,. \tag{RC-NonCvx}$$

The fact that solving (RC-NonCvx) is equivalent to solving the minimum ratio-cut problem (expressed in the following lemma) is a direct consequence of the submodularity of the cut functions. It is proved in full in Appendix A.

Lemma 1.

Given a weighted hypergraph $G=(V,E,\bm{\mu},\mathbf{w})$, we have $\Psi_{G}^{*}=\bar{\Psi}_{G}^{*}$. Furthermore, there exists an algorithm that, given any $\mathbf{x}\in\mathbb{R}^{V}$, recovers a cut $S\subseteq V$ satisfying

$$\Psi_{G}(S)\leq\bar{\Psi}_{G}(\mathbf{x})$$

in time $O\big(|V|\log|V|+\mathsf{sp}(G)\big)$, where the sparsity of $G$ is defined as $\mathsf{sp}(G)\stackrel{\mathrm{def}}{=}\sum_{h\in E}|h|$.
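The recovery algorithm behind Lemma 1 is deferred to the appendix; a standard way to round a vector $\mathbf{x}$ to a cut is a sweep over its level sets. The sketch below is our own illustration, assuming the standard hypergraph cut function; it naively recomputes each prefix cut from scratch, so it runs slower than the lemma's stated bound:

```python
def sweep_cut(V, edges, w, mu, x):
    # Scan the level sets S_k = {k vertices with largest x_i} and return
    # the prefix minimizing the ratio-cut objective (standard cut function).
    order = sorted(V, key=lambda i: -x[i])
    best_S, best_val = None, float("inf")
    for k in range(1, len(order)):
        S = set(order[:k])
        num = sum(w[h] * min(1, len(set(h) & S), len(set(h) - S)) for h in edges)
        den = min(sum(mu[i] for i in S), sum(mu[i] for i in set(V) - S))
        val = num / den
        if val < best_val:
            best_S, best_val = S, val
    return best_S, best_val
```

Since only $|V|-1$ candidate cuts are examined, the submodularity argument of Lemma 1 guarantees that the best of them already satisfies $\Psi_{G}(S)\leq\bar{\Psi}_{G}(\mathbf{x})$.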

The main idea in our approach is to address the non-convexity of the $\bar{\Psi}_{G}$ objective by replacing the non-concave denominator with a linear lower bound. In particular, we can exploit the dual characterization of the $\ell_{1}$-norm in the lower bound to write:

$$\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\mathbf{1}\rVert_{\bm{\mu},1}=\min_{u\in\mathbb{R}}\max_{\lVert\mathbf{y}\rVert_{\infty}\leq 1}\langle\mathbf{y},\mathbf{x}-u\mathbf{1}\rangle_{\bm{\mu}}=\max_{\substack{\lVert\mathbf{y}\rVert_{\infty}\leq 1\\ \langle\mathbf{y},\mathbf{1}\rangle_{\bm{\mu}}=0}}\langle\mathbf{y},\mathbf{x}\rangle_{\bm{\mu}} \tag{2}$$
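As an aside, the inner minimization over $u$ on the left-hand side is a one-dimensional convex problem whose minimizer is a $\bm{\mu}$-weighted median of $\mathbf{x}$. A minimal numeric sketch (the function name is ours):

```python
def centered_l1(x, mu):
    # min_u sum_i mu_i * |x_i - u|, attained at a mu-weighted median of x:
    # the first breakpoint where cumulative weight reaches half the total.
    pts = sorted(zip(x, mu))
    half, acc = sum(mu) / 2, 0.0
    for u, m in pts:
        acc += m
        if acc >= half:
            break
    return sum(m * abs(xi - u) for xi, m in zip(x, mu))
```

Because the objective is piecewise linear and convex in $u$, its minimum is attained at one of the breakpoints $x_{i}$, which is why scanning the sorted entries suffices.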

This calculation directly leads us to define a novel localized, convex formulation of the (RC-NonCvx) problem for each vector $\mathbf{s}\in\mathbb{R}^{V}$ with $\|\mathbf{s}\|_{\infty}\leq 1$ and $\langle\mathbf{s},\mathbf{1}\rangle_{\bm{\mu}}=0$, which we call a seed:

$$\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\stackrel{\mathrm{def}}{=}\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\max\{0,\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}\}}\,,\qquad \bar{\Psi}_{G,\mathbf{s}}^{*}\stackrel{\mathrm{def}}{=}\min_{\mathbf{x}\in\mathbb{R}^{V}}\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\,. \tag{Local-RC}$$

Intuitively, the program (Local-RC) seeks a distribution over cuts $\mathbf{x}$ with small expected cut size and high correlation with the seed $\mathbf{s}$. In Section 6, we investigate the properties of this formulation, including its equivalence with an integral cut problem, the ratio-cut improvement problem (Definition 10 in Section 6), which generalizes the cut improvement problem of Andersen and Lang [7] to submodular hypergraphs. For polymatroidal cut functions, we show that the natural notion of a dual solution for the convex problem (Local-RC) is exactly a hypergraph flow with demand vector $\operatorname{diag}(\bm{\mu})\mathbf{s}$. By applying our flow decomposition result, we can turn such a hypergraph flow into a dual graph certificate, a graph $D$ such that $\Psi^{*}_{G,\mathbf{s}}\cdot\delta_{D}(S)\leq\delta_{G}(S)$ for all $S\subseteq V$. To construct these objects, we prove the following algorithmic result:

Theorem 2 (Informal. See Theorems 9 and 10).

An approximate primal-dual solution to (Local-RC) can be computed by solving $O(\log|V|)$-many decomposable submodular minimization problems for general polymatroidal cut functions, or $O(\log|V|)$-many $\nicefrac{1}{2}$-approximate maximum flow problems over a graph of size $O(\mathsf{sp}(G))$.

Finally, and crucially for our purposes, for all $\mathbf{x}\in\mathbb{R}^{V}$, we have $\bar{\Psi}_{G}(\mathbf{x})\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$ by Equation 2. Hence, any (not necessarily optimal) solution $\mathbf{x}$ to (Local-RC) for a seed $\mathbf{s}$ yields a solution to (RC-NonCvx) of value at most $\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$ and, via Lemma 1, a cut $S$ with $\Psi_{G}(S)\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$. In other words, if we can somehow find a seed $\mathbf{s}$ such that $\bar{\Psi}^{*}_{G,\mathbf{s}}\leq\alpha\Psi^{*}_{G}$, then solving (Local-RC) will yield an $\alpha$-approximation algorithm for $\Psi^{*}_{G}$.

This reduction may not seem very useful, as finding the required seed may be as hard as solving the original problem. However, we can now exploit the dual feedback obtained when solving (Local-RC) for a sequence of seeds $\mathbf{S}=(\mathbf{s}_{1},\mathbf{s}_{2},\ldots,\mathbf{s}_{T})$ to effectively guide our search for a good seed and boost our approximation ratio. In particular, suppose we can adaptively construct a sequence of $T$ seeds $\mathbf{S}$ such that the sum $H=\sum_{t=1}^{T}D_{t}$ of the corresponding dual demand graphs $D_{t}$ has large ratio-cut, i.e., $\Psi_{H}^{*}\geq\beta$. Then, it is easy to show (see Theorem 11 and its proof) that we must have $\min_{t=1}^{T}\Psi^{*}_{G,\mathbf{s}_{t}}\leq\nicefrac{T}{\beta}\cdot\Psi^{*}_{G}$. Hence, solving (Local-RC) for one of the seeds in the sequence $\mathbf{S}$ must yield a $\nicefrac{T}{\beta}$-approximation to the minimum ratio-cut problem. It turns out that the problem of constructing such a sequence $\mathbf{S}$ is precisely the task addressed by the cut-matching framework of Khandekar et al. [36], which we generalize and strengthen in our next contribution.

2.3 An Improved Cut-Matching Game

We now describe our new definition of the cut-matching game. A key distinction between our setup and previous ones lies in what constitutes admissible actions played by the cut and matching players.

Definition 4.

Let $V$ be a set of vertices with vertex weights $\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}$. A cut action is a pair $(A,B)$ of non-empty disjoint subsets $A,B\subseteq V$ such that $\bm{\mu}(A)\leq\bm{\mu}(B)$.

The cut player is no longer restricted to returning a partition of the vertex set, let alone a bisection. The set of valid actions for a matching player is similarly relaxed, as the edges of the response do not need to form a perfect matching. It is only required that enough of the available degree is routed across the cut action.

Definition 5.

An approximate matching response to a cut action $(A,B)$ is a weighted bipartite graph $D=\big(A\cup B,E_{D},\mathbf{w}^{D},\bm{\mu}\big)$ satisfying the following conditions.

  1. Bounded degree: for all $i\in V$,

$$\deg_{D}(i)\leq\begin{cases}\mu_{i}&i\in A\\ \frac{\bm{\mu}(A)}{\bm{\mu}(B)}\cdot\mu_{i}&i\in B\end{cases}\,.$$

  2. Largeness: $\mathbf{w}^{D}\big(E(A,B)\big)\geq\frac{\bm{\mu}(A)}{2}$.

We emphasize that $D$ is a directed approximate matching response if the edges of $D$ are directed from $A$ to $B$.
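Definition 5 can be checked mechanically for a candidate response. The validator below is our own illustration, assuming the response is given as a weight dictionary over pairs $(i,j)$ with $i\in A$, $j\in B$, and $\deg_{D}(v)$ taken as the sum of incident edge weights:

```python
def is_approx_matching_response(A, B, mu, edges):
    # Verify the two conditions of Definition 5 for D = (A ∪ B, edges).
    muA, muB = sum(mu[i] for i in A), sum(mu[j] for j in B)
    deg = {v: 0.0 for v in A | B}
    for (i, j), wij in edges.items():
        if i not in A or j not in B:
            return False          # edge does not cross from A to B
        deg[i] += wij
        deg[j] += wij
    eps = 1e-12
    # Condition 1 (bounded degree): mu_i on A, (mu(A)/mu(B)) * mu_i on B.
    if any(deg[i] > mu[i] + eps for i in A):
        return False
    if any(deg[j] > (muA / muB) * mu[j] + eps for j in B):
        return False
    # Condition 2 (largeness): at least mu(A)/2 weight routed across.
    return sum(edges.values()) >= muA / 2 - eps
```

Note that the degree bound on the $B$-side is scaled down by $\bm{\mu}(A)/\bm{\mu}(B)\leq 1$, so the larger side absorbs the routed weight proportionally to its $\bm{\mu}$-mass.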

This definition is motivated by the fact that an approximate matching response naturally arises from the dual graph certificate obtained by approximately solving the (Local-RC) program above. The matching response will be undirected when we wish to approximate the minimum ratio-cut of a hypergraph with symmetric cut functions and directed in the general case of asymmetric cut functions. Let us now define the cut-matching game.

Definition 6.

Let n>0n>0, and 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0} be a weighting over a set of vertices VV where |V|=n\lvert V\rvert=n. An (n,𝝁)(n,\bm{\mu})-generalized cut-matching game is a multi-round, two-player game between a cut player 𝒞\mathcal{C} and a matching player \mathcal{M} that proceeds as follows. In the tt-th round of the game, following definitions 4 and 5:

  1. 𝒞\mathcal{C} plays a cut action (At,Bt)(A_{t},B_{t}),

  2. \mathcal{M} responds with an approximate matching response DtD_{t} to (At,Bt)(A_{t},B_{t}).

The state graph after tt rounds is Ht=s=1tDsH_{t}=\sum_{s=1}^{t}D_{s}, the edge-wise sum of the matching responses seen thus far.

For comparison, the original cut-matching game of [36] is captured by the above definition when one specifically considers graph expansion 𝝁=𝟏\bm{\mu}=\boldsymbol{1}, restricts cut actions to be exact bisections (A,A¯)(A,\bar{A}) where |A|=|A¯|\lvert A\rvert=\lvert\bar{A}\rvert, and requires exact matching responses: 𝐰D(E(A,A¯))=𝝁(A)\mathbf{w}^{D}\big{(}E(A,\bar{A})\big{)}=\bm{\mu}(A).

To approximate minimum ratio-cut using our cut-matching games, we seek a cut player 𝒞\mathcal{C} that outputs a sequence of cuts which, in as few cuts as possible, forces the matching player \mathcal{M} to place edges that make the minimum ratio-cut objective for the state graph HTH_{T} large. We call a cut player good if it can achieve this.
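The round structure of Definition 6 can be summarized in a short schematic driver. In the sketch below (our own illustrative code), `cut_strategy` and `matching_player` stand in for arbitrary implementations of the two players, and matching responses are represented simply as dictionaries from edges to weights:

```python
# Schematic driver for the generalized cut-matching game (Definition 6).
# `cut_strategy` and `matching_player` are placeholders for the two players.
def play_cut_matching_game(cut_strategy, matching_player, rounds):
    responses = []                       # matching responses D_1, ..., D_t
    for t in range(rounds):
        A, B = cut_strategy(responses)   # cut action (A_t, B_t)
        D = matching_player(A, B)        # approximate matching response D_t
        responses.append(D)
    # State graph H_t = edge-wise sum of the responses seen thus far.
    H = {}
    for D in responses:
        for e, w in D.items():
            H[e] = H.get(e, 0.0) + w
    return H
```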

Definition 7.

A cut strategy is a randomized algorithm 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} that takes as input the current state graph HH, and outputs a cut action (A,B)(A,B). A cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (f(n),g(n))\big{(}f(n),g(n)\big{)}-good if a cut player 𝒞\mathcal{C} using 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} at every iteration achieves:

ΨHg(n)f(n)\Psi_{H_{g(n)}}\geq f(n)\,\,

with constant probability, for any (potentially adaptive) sequence of approximate matching responses D1,,Dg(n)D_{1},\ldots,D_{g(n)}.

Note that, when dealing with symmetric cut functions, HH will be undirected, so that ΨH\Psi_{H} refers to the undirected ratio-cut objective. If the cut functions are asymmetric, then ΨH\Psi_{H} is naturally replaced with the directed ratio-cut objective.

These definitions reveal the mechanism by which the cut-matching framework adaptively constructs a sequence of seeds to (Local-RC): the seed we choose at every iteration of the cut-matching game is just a 𝝁\bm{\mu}-weighted indicator vector for the cut action of that round. A good strategy guarantees that, no matter what dual graph certificates we receive for each seed, after T=g(n)T=g(n) iterations the state graph HTH_{T} has ratio-cut objective at least β=f(n)\beta=f(n). This shows that a good strategy yields an approximation ratio of O(g(n)/f(n))O(\nicefrac{{g(n)}}{{f(n)}}) for minimum ratio cut. In Section 7, we formalize this result as Theorem 11. Our main technical theorem (Theorem 12) regarding the new cut-matching game shows the existence of good strategies for the cut-player that yield O(logn)O(\log n)-approximation algorithms for the minimum ratio-cut problem over submodular hypergraphs, for both the cases of symmetric and general polymatroidal cut functions. The proof of this theorem is based on a new geometric result about separated sets in vector embeddings, which we discuss below. Together with Theorem 2, our results on good cut-player strategies directly imply the following results for the solution of the minimum ratio-cut problem. Their short proofs appear in Appendix A.5.

Corollary 1.

There exists an O(logn)O(\log n)-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs with polymatroidal cut functions whose running time is dominated by the solution of a polylogarithmic number of decomposable submodular minimization problems over the hypergraph cut function.

Corollary 2.

There exists an O(logn)O(\log n)-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs equipped with the directed or standard hypergraph cut function whose running time is almost linear in the hypergraph sparsity.

Separated Sets

In our new cut-matching game, the cut player must take considerably more care in choosing a cut action (A,B)(A,B) than in the original version, as the approximate nature of the matching response means that the matching player may easily keep some small subset of vertices disconnected from the rest of the graph. Suppose for instance that the cut player restricted itself to playing μ\mu-bisections. Then, the matching player could select a small set TT, with μ(T)1/10μ(V)\mu(T)\leq\nicefrac{{1}}{{10}}\cdot\mu(V), and never place any edge across the boundary of T.T. The only way the cut player can force progress on TT is to depart from playing bisections and play a cut action (A,B)(A,B) where AA is close to or equal to T.T.

Let us see how this intuition is formalized in our design of cut-player strategies. As in the analysis of Orecchia et al.  [54], our cut strategy maintains at each iteration tt a spectral embedding for the current state graph HtH_{t}. We make progress if we can force the matching player to connect vertices that are far away in this embedding. We show that we can achieve this, even against an approximate matching player, by choosing a cut action coming from the following notion of separated sets:

Definition 8 (σ\sigma-separated sets).

Given an embedding {𝐯i}i[n]\{\mathbf{v}_{i}\}_{i\in[n]} and a measure 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0}, we say that two sets S,T[n]S,T\subseteq[n] are σ\sigma-separated if:

𝝁(S)𝝁(T)miniSjT𝐯i𝐯j2σi,j(V2)μiμj𝐯i𝐯j2.\bm{\mu}(S)\cdot\bm{\mu}(T)\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}\geq\sigma\cdot\sum_{i,j\in\binom{V}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}.

This definition generalizes the idea of well-separated sets [10], which was applied only to large balanced sets, to possibly unbalanced sets, as long as SS and TT contribute a σ\sigma-fraction to the total variance of the embedding. In Section 8, we show how to efficiently find separated sets and prove the following result.
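Definition 8 can be evaluated directly. The following sketch (our own illustrative code, not from the paper) returns the largest σ\sigma for which a given pair of sets is σ\sigma-separated under a given embedding and measure:

```python
import numpy as np

# Largest sigma for which S and T are sigma-separated (Definition 8).
# V_embed is an (n x d) array of embedding vectors, mu a list of vertex weights.
def sigma_separation(V_embed, mu, S, T):
    n = len(V_embed)
    sq = lambda i, j: float(np.sum((V_embed[i] - V_embed[j]) ** 2))
    # Total variance of the embedding: sum over unordered pairs.
    total_var = sum(mu[i] * mu[j] * sq(i, j)
                    for i in range(n) for j in range(i + 1, n))
    muS = sum(mu[i] for i in S)
    muT = sum(mu[i] for i in T)
    min_dist = min(sq(i, j) for i in S for j in T)
    return muS * muT * min_dist / total_var
```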

Theorem 3.

Given an embedding {𝐯i}i[n]d\{\mathbf{v}_{i}\}_{i\in[n]}\subseteq\mathbb{R}^{d} and a measure 𝛍0n\bm{\mu}\in\mathbb{Z}^{n}_{\geq 0}, with μ(V)=𝗉𝗈𝗅𝗒(n)\mu(V)=\mathsf{poly}(n), there exists a randomized algorithm running in time O~(nd)\tilde{O}(nd) that outputs σ\sigma-separated sets SS and TT for σ=Ω(1/logn)\sigma=\Omega(\nicefrac{{1}}{{\log n}}) with high probability.

3 Preliminaries

Hypergraphs

The sparsity 𝗌𝗉(G)\mathsf{sp}(G) of a hypergraph is defined as 𝗌𝗉(G)=defhE|h|.\mathsf{sp}(G)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{h\in E}|h|. Throughout the paper, we assume that the edge weights 𝐰\mathbf{w} and vertex weights 𝝁\bm{\mu} of our instance submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}) are polynomially bounded in |V|.|V|. In particular, we have 𝝁(V)𝗉𝗈𝗅𝗒(|V|).{\bm{\mu}}(V)\in\mathsf{poly}(|V|).

Linear Algebra Notation

Throughout this paper, we adopt the convention of representing scalars xx in unbolded lowercase, vectors 𝐱\mathbf{x} in boldface lowercase, and matrices 𝐗\mathbf{X} as boldface uppercase. Given vectors 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}, we will use 𝐱𝐲\mathbf{x}\circ\mathbf{y} to denote the Hadamard (entry-wise) product between 𝐱\mathbf{x} and 𝐲\mathbf{y}. We will use |𝐱|\lvert\mathbf{x}\rvert to denote the vector whose entries are the absolute value of the entries in 𝐱\mathbf{x}. We let 𝐱,𝐲\langle\mathbf{x},\mathbf{y}\rangle denote the standard Euclidean inner product, \lVert\cdot\rVert the Euclidean norm, and p\lVert\cdot\rVert_{p} the p\ell_{p}-norm.

We define an inner product between 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n} weighted by a non-negative vector 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0} as 𝐱,𝐲𝝁=defi=1nμixiyi.\langle\mathbf{x},\mathbf{y}\rangle_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{i=1}^{n}\mu_{i}\cdot x_{i}\cdot y_{i}\,. The quantity 𝐱𝝁=def𝐱,𝐱𝝁\lVert\mathbf{x}\rVert_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sqrt{\langle\mathbf{x},\mathbf{x}\rangle_{\bm{\mu}}} will denote the norm induced by the inner product ,𝝁\langle\cdot,\cdot\rangle_{\bm{\mu}}, while for any p>0p>0, we use 𝐱𝝁,p\lVert\mathbf{x}\rVert_{\bm{\mu},p} to denote the p\ell_{p}-norm weighted by 𝝁\bm{\mu} 𝐱𝝁,p=def(i=1nμi|xi|p)1/p.\lVert\mathbf{x}\rVert_{\bm{\mu},p}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\left(\sum_{i=1}^{n}\mu_{i}\cdot\lvert x_{i}\rvert^{p}\right)^{1/p}\,. Note consequently that 𝐱𝝁=𝐱𝝁,2\lVert\mathbf{x}\rVert_{\bm{\mu}}=\lVert\mathbf{x}\rVert_{\bm{\mu},2}. For 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}, we write 𝐱𝐲\mathbf{x}\perp\mathbf{y} to indicate the relation: 𝐱,𝐲=0\langle\mathbf{x},\mathbf{y}\rangle=0.
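These weighted inner products and norms transcribe directly into code; the following is a minimal sketch of the definitions above (our own function names):

```python
import numpy as np

# Direct transcription of the mu-weighted inner product and norms defined above.
def inner_mu(x, y, mu):
    return float(np.sum(mu * x * y))

def norm_mu(x, mu):
    return inner_mu(x, x, mu) ** 0.5

def norm_mu_p(x, mu, p):
    return float(np.sum(mu * np.abs(x) ** p)) ** (1.0 / p)
```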

Given any vector 𝐱n\mathbf{x}\in\mathbb{R}^{n}, we define the positive and negative parts of 𝐱\mathbf{x} as the vectors 𝐱+,𝐱n\mathbf{x}_{+},\mathbf{x}_{-}\in\mathbb{R}^{n} with 𝐱+(i)=defmax{0,𝐱(i)}\mathbf{x}_{+}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max\{0,\mathbf{x}(i)\} and 𝐱(i)=defmax{0,𝐱(i)}.\mathbf{x}_{-}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max\{0,-\mathbf{x}(i)\}.

If 𝐗,𝐘n×n\mathbf{X},\mathbf{Y}\in\mathbb{R}^{n\times n}, then 𝐗,𝐘=i,j=1nXijYij\langle\mathbf{X},\mathbf{Y}\rangle=\sum_{i,j=1}^{n}X_{ij}Y_{ij} denotes the Frobenius inner product. We let 𝒮n×n\mathcal{S}^{n\times n} be the set of nn by nn symmetric matrices. Given 𝐗,𝐘𝒮n×n\mathbf{X},\mathbf{Y}\in\mathcal{S}^{n\times n} we denote by 𝐗0\mathbf{X}\succeq 0 the statement that 𝐗\mathbf{X} is positive semidefinite, and by 𝐗𝐘\mathbf{X}\preceq\mathbf{Y} the relation 0𝐘𝐗0\preceq\mathbf{Y}-\mathbf{X}. We use diag(𝐱)n×n\operatorname{diag}(\mathbf{x})\in\mathbb{R}^{n\times n} to denote the real matrix with the coordinates of 𝐱\mathbf{x} placed along its diagonal. We will reserve specifically the matrix 𝐌=defdiag(𝝁)\mathbf{M}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\operatorname{diag}(\bm{\mu}) for the non-negative vector 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0}.

Given a finite set VV, a vector 𝐱V\mathbf{x}\in\mathbb{R}^{V}, and a subset SVS\subseteq V, we let 𝐱(S)=vSxv\mathbf{x}(S)=\sum_{v\in S}x_{v}. We also denote by 𝐱SV\mathbf{x}_{S}\in\mathbb{R}^{V}, the restriction of 𝐱\mathbf{x} to coordinates in SS. In this way, the vector 𝟏S\boldsymbol{1}_{S} denotes the {0,1}\{0,1\}-indicator of set SS.

Preliminaries on Graphs

Given a graph H=(V,E,𝐰)H=(V,E,\mathbf{w}), denote its combinatorial Laplacian as 𝐋(H)\mathbf{L}(H):

𝐋(H)=def𝐃(H)𝐀(H),\mathbf{L}(H)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\mathbf{D}(H)-\mathbf{A}(H),

where 𝐃(H)\mathbf{D}(H) is the diagonal matrix of weighted degrees and 𝐀(H)\mathbf{A}(H) is the weighted adjacency matrix of HH.

If HH is directed, then we will associate with it a directed Laplacian 𝐋+\mathbf{L}_{+} given by

𝐋+(H)=(i,j)Ewij(𝐋ij+𝐄ii𝐄jj)\mathbf{L}_{+}(H)=\sum_{(i,j)\in E}w_{ij}\cdot\big{(}\mathbf{L}_{ij}+\mathbf{E}_{ii}-\mathbf{E}_{jj}\big{)}

where 𝐋ij\mathbf{L}_{ij} is the Laplacian of an undirected graph (V,{{i,j}})\big{(}V,\big{\{}\{i,j\}\big{\}}\big{)}, and 𝐄ii\mathbf{E}_{ii} is 11 on the iiii-th entry and 0 everywhere else. We will denote by H~\tilde{H} the symmetrization of HH, i.e. the weighted undirected graph with adjacency matrix 𝐀(H)+𝐀(H){\mathbf{A}(H)+\mathbf{A}(H)^{\top}}, the edge-wise sum of HH and its reverse graph.
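The directed Laplacian 𝐋+\mathbf{L}_{+} can be assembled term by term from its definition; the following sketch (our own illustrative code) does so for a dictionary of weighted arcs:

```python
import numpy as np

# Build L_+(H) = sum_{(i,j)} w_ij * (L_ij + E_ii - E_jj) as defined above.
def directed_laplacian(n, edges):
    """edges: dict {(i, j): w_ij} of directed arcs on vertices 0..n-1."""
    L = np.zeros((n, n))
    for (i, j), w in edges.items():
        # L_ij: Laplacian of the single undirected edge {i, j}.
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
        # E_ii - E_jj correction accounting for the arc direction.
        L[i, i] += w; L[j, j] -= w
    return L
```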

Factor Graphs

A hypergraph G=(V,EG,𝐰,𝝁)G=(V,E_{G},\mathbf{w},\bm{\mu}) is naturally associated with its factor graph, an unweighted graph G^=(VEG,E^G)\hat{G}=(V\cup E_{G},\hat{E}_{G}) that is bipartite between the variable vertices representing vertices iVi\in V, and factor vertices representing hyperedges hEGh\in E_{G}. For iVi\in V and hEGh\in E_{G}, the edge {i,h}\{i,h\} belongs to E^G\hat{E}_{G} if and only if ihi\in h, i.e.:

E^G={{i,h}V×EG:ih}.\hat{E}_{G}=\{\{i,h\}\in V\times E_{G}:i\in h\}.
Figure 1: A hypergraph (left) and its factor graph (right). In the factor graph image, the variable vertices are shown in solid black, while the factor vertices are shown as empty circles.
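Constructing the factor graph is a direct transcription of the definition of E^G\hat{E}_{G}; the sketch below (representation choices are our own) indexes factor vertices by hyperedge position:

```python
# Factor graph of a hypergraph: bipartite between variable vertices and factor
# vertices, with an edge (i, h) whenever vertex i belongs to hyperedge h.
def factor_graph(vertices, hyperedges):
    """hyperedges: list of frozensets of vertices; returns the bipartite edge
    list, with factor vertices identified by their index in `hyperedges`."""
    return [(i, h_idx)
            for h_idx, h in enumerate(hyperedges)
            for i in sorted(h)]
```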
Submodular Functions

A set function F:2VF:2^{V}\rightarrow\mathbb{R} is submodular if for all A,BVA,B\subseteq V, we have:

F(AB)F(A)+F(B)F(AB).F(A\cup B)\leq F(A)+F(B)-F(A\cap B).

From the above, one can easily see that the set of submodular functions is closed under taking conic combinations. Following standard conventions, we will always assume that F()=0F(\varnothing)=0 unless specified otherwise.
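The defining inequality can be verified exhaustively on small ground sets; the following brute-force sketch (our own code, exponential in the ground set size) is useful only as a sanity check:

```python
from itertools import combinations

# Exhaustive check of the submodular inequality F(A|B) <= F(A)+F(B)-F(A&B);
# exponential in |V|, so intended only for tiny ground sets.
def is_submodular(ground, F):
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(F(A | B) <= F(A) + F(B) - F(A & B) + 1e-9
               for A in subsets for B in subsets)
```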

Any submodular set function FF can be associated with a convex body (F)\mathcal{B}(F), known as its base polytope:

(F)=def{𝐱V:𝐱,𝟏=F(V)SV,𝐱,𝟏SF(S)}.\mathcal{B}(F)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{x}\in\mathbb{R}^{V}\,:\,\langle\mathbf{x},\boldsymbol{1}\rangle=F(V)\,\wedge\,\forall S\subseteq V,\,\langle\mathbf{x},\boldsymbol{1}_{S}\rangle\leq F(S)\big{\}}. (4)

When FF is monotone non-decreasing and F()=0,F(\emptyset)=0, it is convenient to employ the positive submodular polyhedron:

𝒫+(F)=def{𝐱0V:SV,𝐱,𝟏SF(S)}.\mathcal{P}_{+}(F)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{x}\in\mathbb{R}^{V}_{\geq 0}\,:\,\forall S\subseteq V,\,\langle\mathbf{x},\boldsymbol{1}_{S}\rangle\leq F(S)\big{\}}.

Crucially, a submodular set function FF can be extended to its convex closure, known as its Lovász extension and denoted by F¯:V.\bar{F}:\mathbb{R}^{V}\rightarrow\mathbb{R}. There are a number of equivalent characterizations of the Lovász extension and its connection to the base polytope \mathcal{B} and the positive polyhedron 𝒫+\mathcal{P}_{+}. Rather than listing them here, we will refer directly to the results in Chapters 3 and 4 of the excellent reference by Bach [11]. We make an exception for the following fact, which plays a crucial role in the analysis of our approximation algorithms based on metric embeddings. See Proposition 4.8 in [11] for a proof.
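One standard characterization from Bach [11] is the greedy formula: sort the coordinates of 𝐱\mathbf{x} in decreasing order and take a telescoping sum of FF along the resulting prefix chain. A minimal sketch (our own code):

```python
# Greedy/sorting formula for the Lovász extension (see Bach [11], Ch. 3):
# sort coordinates in decreasing order and telescope F over the prefix sets.
def lovasz_extension(F, x):
    """x: dict {element: value}; F: set function on frozensets, F(empty) = 0."""
    order = sorted(x, key=lambda i: -x[i])
    val, prefix = 0.0, set()
    prev = F(frozenset())
    for i in order:
        prefix.add(i)
        cur = F(frozenset(prefix))
        val += x[i] * (cur - prev)
        prev = cur
    return val
```

For instance, with F(S)=min{1,|S|} and a non-negative input, the formula returns the maximum coordinate, matching the known extension of that function.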

Fact 1.

Let F:2V0F:2^{V}\to\mathbb{R}_{\geq 0} be a monotone non-decreasing submodular function with F()=0F(\emptyset)=0. Its Lovász extension F¯\bar{F} is monotone non-decreasing over V.\mathbb{R}^{V}.

Online Linear Optimization and Regret Minimization

Fix a convex set of actions 𝒳𝒮n×n\mathcal{X}\subseteq\mathcal{S}^{n\times n}. The setting of online linear optimization over symmetric matrices considers the following multi-round game: in each round tt, a player plays an action 𝐗t𝒳\mathbf{X}_{t}\in\mathcal{X}, receives a feedback matrix 𝐅t𝒮n×n\mathbf{F}_{t}\in\mathcal{S}^{n\times n}, and suffers a loss of 𝐅t,𝐗t\langle\mathbf{F}_{t},\mathbf{X}_{t}\rangle. The goal of the player is to play actions which minimize total loss incurred. In general, one cannot hope for a strategy that is competitive against an arbitrary, adaptive adversary. Thus, one instead seeks to minimize regret: the difference, up to a time horizon T>0T>0, between total loss and the loss suffered by the best fixed action in hindsight.

We will require a regret minimization strategy to produce a sequence of 𝐗t\mathbf{X}_{t} for this setting. Fortunately, many prior results [9, 5, 29, 34, 51, 53] have studied this, and a Matrix Multiplicative Weight Update algorithm works off-the-shelf.

Theorem 4 (Theorem 3.3.3 in [51]).

Fix T>0T>0, and a weight vector 𝛍0n\bm{\mu}\in\mathbb{Z}_{\geq 0}^{n}. Let 𝐅1,,𝐅T\mathbf{F}_{1},\ldots,\mathbf{F}_{T} be a sequence of feedback matrices such that there exists ρ>0\rho>0 where 𝐅t1ρ𝐌\mathbf{F}_{t}\succeq-\frac{1}{\rho}\cdot\mathbf{M} for every t>0t>0. For any step size 0<η<ρ0<\eta<\rho, define 𝐗t\mathbf{X}_{t} to be given by the matrix multiplicative weight update

𝐖t+1\displaystyle\mathbf{W}_{t+1} =MMWUη,𝝁(𝐅1,,𝐅t)\displaystyle=\textup{{MMWU}}_{\eta,\bm{\mu}}(\mathbf{F}_{1},\ldots,\mathbf{F}_{t}) (matrix-mwu)
=def𝐌1/2exp(ηk=1t𝐌1/2𝐅k𝐌1/2)𝐌1/2\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\mathbf{M}^{-1/2}\exp\bigg{(}-\eta\cdot\sum_{k=1}^{t}\mathbf{M}^{-1/2}\mathbf{F}_{k}\mathbf{M}^{-1/2}\bigg{)}\mathbf{M}^{-1/2}
𝐗t+1\displaystyle\mathbf{X}_{t+1} =𝐖t+1𝐌,𝐖t+1\displaystyle=\frac{\mathbf{W}_{t+1}}{\langle\mathbf{M},\mathbf{W}_{t+1}\rangle}

Then, the following inequalities hold for any 𝐔Δ𝛍n×n\mathbf{U}\in\Delta^{n\times n}_{\bm{\mu}}. We have:

1Tt=1T𝐅t,𝐔1Tt=1T𝐅t,𝐗tηTt=1T𝐌1/2𝐅t𝐌1/2𝗌𝗉𝖾𝖼2lognηT.\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{F}_{t},\mathbf{U}\rangle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{F}_{t},\mathbf{X}_{t}\rangle-\frac{\eta}{T}\sum_{t=1}^{T}\,\big{\lVert}\mathbf{M}^{-1/2}\mathbf{F}_{t}\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}^{2}-\frac{\log n}{\eta T}\,. (6)

Furthermore, if 𝐅t1ρ𝐌\mathbf{F}_{t}\succeq-\frac{1}{\rho}\cdot\mathbf{M} for every ρ>0\rho>0 (i.e. 𝐅t𝟎\mathbf{F}_{t}\succeq\mathbf{0}), and there exist ϵ>0\epsilon>0 such that 𝐅tϵ𝐌\mathbf{F}_{t}\preceq\epsilon\cdot\mathbf{M}, then:

1Tt=1T𝐅t,𝐔1ϵηTt=1T𝐅t,𝐗tlognηT.\frac{1}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{F}_{t},\mathbf{U}\big{\rangle}\geq\frac{1-\epsilon\eta}{T}\cdot\sum_{t=1}^{T}\big{\langle}\mathbf{F}_{t},\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}\,. (7)

We remark that equation (6) is the standard regret bound for the Matrix Multiplicative Weight Update algorithm, while (7) is the regret bound proven using local norm convergence.
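The update (matrix-mwu) transcribes directly; the sketch below (our own unoptimized code, using a dense eigendecomposition for the matrix exponential of a symmetric matrix) computes 𝐗t+1\mathbf{X}_{t+1} from the feedback matrices:

```python
import numpy as np

def _sym_expm(A):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

# Direct transcription of (matrix-mwu); a sketch, not an optimized solver.
def mmwu(feedbacks, mu, eta):
    """feedbacks: list of symmetric matrices F_1..F_t; returns X_{t+1}."""
    Mi = np.diag(1.0 / np.sqrt(mu))              # M^{-1/2} for M = diag(mu)
    S = sum(Mi @ F @ Mi for F in feedbacks)
    W = Mi @ _sym_expm(-eta * S) @ Mi
    return W / np.trace(np.diag(mu) @ W)         # normalize by <M, W>
```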

4 Polymatroidal Cut Functions and Hypergraph Flows

In this section, we establish the main properties of the polymatroidal cut functions in Definition 3. In the second part of the section, we introduce the dual notion of hypergraph flows as natural lower bounds to polymatroidal cut functions and highlight their connection with previously studied polymatroidal flows [19]. Finally, we define the notion of flow embedding of a graph into a hypergraph.

4.1 Properties and Examples

We start by verifying that any polymatroidal cut function is submodular. A proof of the following fact can be found in Appendix A:

Fact 2.

Let VV be an arbitrary finite set and F,G:2VF,G:2^{V}\to\mathbb{R} be monotone non-decreasing submodular functions. If δ:2V\delta:2^{V}\to\mathbb{R} is the function defined by:

δ(S)=defmin{F(S),G(VS)},\delta(S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\min\{F(S),G(V\setminus S)\},

for all SVS\subseteq V, then δ\delta is submodular.

Examples of Polymatroidal Cut Functions

We now show that the class of polymatroidal cut functions encompasses many popular cut objectives. Our first example is the directed edge cut function in graphs. This function is defined on a directed edge (i,j)(i,j) as:

δ(i,j)(S)={1if iS and jS¯,0otherwise.\delta_{(i,j)}(S)=\begin{cases}1&\text{if }i\in S\text{ and }j\in\overline{S},\\ 0&\text{otherwise}.\end{cases}

It is easy to check that this function is polymatroidal, since it can be rewritten as:

δ(i,j)(S)=min{|S{i}|,|(hS){j}|}\delta_{(i,j)}(S)=\min\{|S\cap\{i\}|,|(h\setminus S)\cap\{j\}|\}

for all ShS\subseteq h, where h={i,j}h=\{i,j\}. Next, we consider the standard hyperedge cut function. Given a hyperedge hEh\in E, this function is defined as:

δh𝖼𝗎𝗍(S)=def{0if Sh{,h},1otherwise.\delta^{\mathsf{cut}}_{h}(S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\begin{cases}0&\text{if }S\cap h\in\{\emptyset,h\},\\ 1&\text{otherwise}.\end{cases}

Note that the undirected cut function in graphs is a special case of δ𝖼𝗎𝗍\delta^{\mathsf{cut}}. This function is symmetric polymatroidal, as we can let Fh(S)=min{1,|hS|}=Gh(S)F_{h}(S)=\min\{1,|h\cap S|\}=G_{h}(S) for any hEh\in E to obtain:

δh𝖼𝗎𝗍(S)=min{1,|hS|,|hS¯|},\delta_{h}^{\mathsf{cut}}(S)=\min\{1,|h\cap S|,|h\cap\overline{S}|\},

which is 11 if and only if SS splits hh non-trivially (i.e., ShS\cap h is neither empty nor all of hh), and zero otherwise.

Moreover, the class of directed hypergraph cut functions recently introduced by Lau et al. (Section 6.2 of [40]) to define directed hypergraph expansion is also a sub-class of polymatroidal cut functions, as we now show. Given a hyperedge hEh\in E and a partition of hh into a tail set ThT_{h} and a head set HhH_{h}, we can capture the directed cut function of Lau et al. by letting Fh(S)=min{1,|ThS|}F_{h}(S)=\min\{1,|T_{h}\cap S|\} and Gh(S¯)=min{1,|HhS¯|}G_{h}(\overline{S})=\min\{1,|H_{h}\cap\overline{S}|\}.

Finally, Veldt et al. [59] consider a class of cut functions which are cardinality-based and submodular. In Appendix B.2 of their paper, they argue that every such cut function δh\delta_{h} can be written as:

δh(S)=gh(min{|Sh|,|hS¯|}),\delta_{h}(S)=g_{h}(\min\{|S\cap h|,|h\cap\overline{S}|\}),

for some concave and monotonically increasing function ghg_{h}. Since ghg_{h} is monotonically increasing, the above gives:

δh(S)=min{gh(|Sh|),gh(|hS¯|)},\delta_{h}(S)=\min\{g_{h}(|S\cap h|),g_{h}(|h\cap\overline{S}|)\},

and hence all functions in this class are also examples of polymatroidal cut functions (note that the composition of a monotone, concave function with the cardinality function is always monotone and submodular).
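The equivalence of the two forms above rests only on ghg_{h} being increasing, and can be checked directly; in the sketch below (our own code) we take gh()=g_{h}(\cdot)=\sqrt{\cdot} as an example:

```python
import math

# Cardinality-based submodular cut function of Veldt et al. [59] in the two
# equivalent forms above, for a concave increasing g (here g = sqrt).
def delta_composed(h, S, g):
    return g(min(len(S & h), len(h - S)))

def delta_min_form(h, S, g):
    # Valid because g increasing implies min(g(a), g(b)) = g(min(a, b)).
    return min(g(len(S & h)), g(len(h - S)))
```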

Lovász Extensions of Polymatroidal Cut Functions

We next establish the form of the Lovász extension of a polymatroidal cut function. We will make heavy use of this form in building our metric relaxations and in rounding them via low-distortion metric embeddings in Section 5. The proof of the following fact is found in Appendix A:

Fact 3.

Given a polymatroidal cut function δh\delta_{h}, we have, for all 𝐱h\mathbf{x}\in\mathbb{R}^{h}:

δ¯h(𝐱)=minνF¯h((𝐱ν𝟏h)+)+F¯h+((𝐱ν𝟏h)).\overline{\delta}_{h}(\mathbf{x})=\min_{\nu\in\mathbb{R}}\bar{F}^{-}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-}).

It is instructive to consider how the previous results are instantiated for the two canonical examples of the directed edge cut function and the standard hypergraph cut function. For the former on an arc (i,j)(i,j), we have F¯(𝐱)=𝐱i\bar{F}^{-}(\mathbf{x})=\mathbf{x}_{i} and F¯+(𝐱)=𝐱j,\bar{F}^{+}(\mathbf{x})=\mathbf{x}_{j}, so that:

𝐱{i,j},δ¯(i,j)(𝐱)=minν(𝐱(i)ν)++(𝐱(j)ν)=(𝐱(i)𝐱(j))+.\forall\mathbf{x}\in\mathbb{R}^{\{i,j\}},\;\overline{\delta}_{(i,j)}(\mathbf{x})=\min_{\nu\in\mathbb{R}}(\mathbf{x}(i)-\nu)_{+}+(\mathbf{x}(j)-\nu)_{-}=(\mathbf{x}(i)-\mathbf{x}(j))_{+}.

For the latter, on a hyperedge hh:

𝐱h,δ¯h𝖼𝗎𝗍(𝐱)=minν(𝐱ν𝟏)++(𝐱ν𝟏)=maxih𝐱(i)minjh𝐱(j).\forall\mathbf{x}\in\mathbb{R}^{h},\;\overline{\delta}^{\mathsf{cut}}_{h}(\mathbf{x})=\min_{\nu\in\mathbb{R}}\|(\mathbf{x}-\nu{\bf 1})_{+}\|_{\infty}+\|(\mathbf{x}-\nu{\bf 1})_{-}\|_{\infty}=\max_{i\in h}\mathbf{x}(i)-\min_{j\in h}\mathbf{x}(j).
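Both closed forms can be confirmed numerically by minimizing over ν\nu on a fine grid; in the sketch below (our own code) the directed-edge extension evaluates to (𝐱(i)𝐱(j))+(\mathbf{x}(i)-\mathbf{x}(j))_{+} and the standard-cut extension to the gap between the largest and smallest coordinates:

```python
import numpy as np

# Numerical check of the two Lovász-extension closed forms via a grid over nu.
def directed_edge_ext(xi, xj):
    nus = np.linspace(-10, 10, 20001)
    # (x_i - nu)_+ + (x_j - nu)_-  minimized over nu equals (x_i - x_j)_+.
    return float(np.min(np.maximum(xi - nus, 0) + np.maximum(nus - xj, 0)))

def standard_cut_ext(x):
    x = np.asarray(x, dtype=float)
    nus = np.linspace(-10, 10, 20001)
    vals = [np.max(np.maximum(x - nu, 0)) + np.max(np.maximum(nu - x, 0))
            for nu in nus]
    return float(min(vals))   # equals max(x) - min(x)
```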

In the case of a symmetric polymatroidal cut function, the characterization of Fact 3 can take a more compact form, as shown in Appendix A:

Fact 4.

Given a symmetric polymatroidal cut function δh\delta_{h} with Fh=defFh+=FhF_{h}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}F^{+}_{h}=F^{-}_{h}, we have, for all 𝐱h\mathbf{x}\in\mathbb{R}^{h}:

minνF¯h(|𝐱ν𝟏h|)δ¯h(𝐱)2minνF¯h(|𝐱ν𝟏h|).\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)\leq\overline{\delta}_{h}(\mathbf{x})\leq 2\cdot\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|).

4.2 Base Polytope, Hypergraph Flows and Polymatroidal Networks

It is natural to attempt to lower bound the cost of cuts in a submodular hypergraph G=(V,EG,𝐰,𝝁)G=(V,E_{G},\mathbf{w},\bm{\mu}) by considering collections of vectors 𝐘=(𝐲h(δh))hEG\mathbf{Y}=(\mathbf{y}_{h}\in\mathcal{B}(\delta_{h}))_{h\in E_{G}}, where each vector 𝐲h\mathbf{y}_{h} belongs to the base polytope of the polymatroidal cut function δh\delta_{h} under consideration. In this section, we interpret these vectors as hypergraph flows over GG and show that they take the form of polymatroidal flows [41, 19] over the factor graph G^\hat{G}.

Base Polytope of Polymatroidal Cut Function

We start by characterizing the base polytope (δh)\mathcal{B}(\delta_{h}) for a polymatroidal cut function δh.\delta_{h}. By the definition of the base polytope in Equation (4), we have

(δh)={𝐲h:𝐲,𝟏h=0Sh,𝐲,𝟏Smin{Fh(S),Fh+(hS)}}.\mathcal{B}(\delta_{h})=\{\mathbf{y}\in\mathbb{R}^{h}:\langle\mathbf{y},{\bf 1}_{h}\rangle=0\,\wedge\,\forall S\subset h,\,\langle\mathbf{y},{\bf 1}_{S}\rangle\leq\min\{F_{h}^{-}(S),F_{h}^{+}(h\setminus S)\}\}.

The first constraint implies that for all Sh,S\subset h, we must have 𝐲,𝟏S=𝐲,𝟏hS.\langle\mathbf{y},{\bf 1}_{S}\rangle=-\langle\mathbf{y},{\bf 1}_{h\setminus S}\rangle. Hence, we can rewrite:

(δh)\displaystyle\mathcal{B}(\delta_{h}) ={𝐲h𝟏h:Sh,𝐲,𝟏SFh(S)𝐲,𝟏hSFh+(hS)}\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:\forall S\subset h,\;\langle\mathbf{y},{\bf 1}_{S}\rangle\leq F_{h}^{-}(S)\,\wedge\,-\langle\mathbf{y},{\bf 1}_{h\setminus S}\rangle\leq F_{h}^{+}(h\setminus S)\}
={𝐲h𝟏h:Sh,Fh+(S)𝐲,𝟏SFh(S)}\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:\forall S\subset h,\;-F_{h}^{+}(S)\leq\langle\mathbf{y},{\bf 1}_{S}\rangle\leq F_{h}^{-}(S)\} (8)
={𝐲h𝟏h:(𝐲)+𝒫+(Fh)(𝐲)𝒫+(Fh+)}.\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:(\mathbf{y})_{+}\in\mathcal{P}_{+}(F^{-}_{h})\,\wedge\,(\mathbf{y})_{-}\in\mathcal{P}_{+}(F^{+}_{h})\}.

We will interpret an element 𝐲\mathbf{y} of (δh)\mathcal{B}(\delta_{h}) as a flow over the neighborhood of factor vertex hh in the factor graph G^\hat{G}, with 𝐲i>0\mathbf{y}_{i}>0 indicating flow going from ii to hh and 𝐲i<0\mathbf{y}_{i}<0 flow going from hh to ii. The constraint 𝐲𝟏h\mathbf{y}\perp{\bf 1}_{h} simply represents flow conservation at h.h. With this setup, for each set Sh,S\subset h, the function FhF_{h}^{-} (resp. Fh+F_{h}^{+}) constrains the amount of flow that can be routed from the set SS to hh (resp. to the set SS from hh).

Example: Base Polytope for Standard Hypergraph Cut Function

Consider the case of the standard hypergraph cut function δh𝖼𝗎𝗍(S)\delta^{\mathsf{cut}}_{h}(S) defined in the previous section. Then, it is easy to see that:

(δh𝖼𝗎𝗍)={𝐲h𝟏h𝐲12}.\mathcal{B}(\delta^{\mathsf{cut}}_{h})=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\mid\lVert\mathbf{y}\rVert_{1}\leq{2}\}.

As discussed above, we can think of each entry 𝐲(i)\mathbf{y}(i) of 𝐲\mathbf{y} for ihi\in h, as the amount of flow flowing into hh from vertex ii. Since 𝐲𝟏h\mathbf{y}\perp{\bf 1}_{h}, exactly half of the mass 𝐲1\lVert\mathbf{y}\rVert_{1} enters hh and half exits it, so the constraint 𝐲12\lVert\mathbf{y}\rVert_{1}\leq 2 enforces that the total amount of flow through the hyperedge hh is no more than 11. This is equivalent to a vertex capacity constraint on the factor vertex corresponding to the hyperedge hh in the factor graph G^\hat{G}.

Example: Base Polytope for Directed Hypergraph Cut Functions

Consider the class of directed hypergraph cut functions defined by Lau et al. in Section 6.2 of [40]. Each of these cut functions δh\delta_{h} has an associated head set HhhH_{h}\subseteq h and an associated tail set ThhT_{h}\subseteq h, and δh(S)\delta_{h}(S) has value one if and only if the tail set intersects SS, i.e., ThST_{h}\cap S\neq\emptyset, and the head set intersects S¯\overline{S}, i.e., HhS¯H_{h}\cap\overline{S}\neq\emptyset. Their base polytope is then given by:

(δh)={𝐲h𝟏h𝐲(Th)1iTh:𝐲(i)0iHh:𝐲(i)0}.\mathcal{B}(\delta_{h})=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\mid\mathbf{y}(T_{h})\leq 1\,\land\,\forall i\in T_{h}:\mathbf{y}(i)\geq 0\,\land\,\forall i\in H_{h}:\mathbf{y}(i)\leq 0\}.

The notion of hypergraph flow, which we introduce next, formalizes the network flow interpretation for all hyperedges hEG.h\in E_{G}.

Hypergraph Flows

Given a weighted submodular hypergraph G=(V,E,𝐰>0E,𝝁0V)G=(V,E,\mathbf{w}\in\mathbb{Z}^{E}_{>0},\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}), a hyperedge flow over a hyperedge hEh\in E is a vector 𝐲hh𝟏h.\mathbf{y}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}. A hypergraph flow 𝐘hEh𝟏h\mathbf{Y}\in\bigoplus_{h\in E}\mathbb{R}^{h}\perp{\bf 1}_{h} is a direct sum of hyperedge flows 𝐘=(𝐲h)hE.\mathbf{Y}=(\mathbf{y}_{h})_{h\in E}. Analogous to the case for graph flows, we define a notion of congestion based on the maximum violation of the polymatroidal constraints defining each (δh)\mathcal{B}(\delta_{h}):

congG(𝐘)=min{ρ0:hEG,𝐲hρ𝐰h(δh)}.\operatorname{cong}_{G}(\mathbf{Y})=\min\{\rho\geq 0:\forall h\in E_{G},\,\mathbf{y}_{h}\in\rho\cdot\mathbf{w}_{h}\cdot\mathcal{B}(\delta_{h})\}. (9)

The demand vector dem(𝐘)V𝟏\operatorname{dem}(\mathbf{Y})\in\mathbb{R}^{V}\perp{\bf 1} of a hypergraph flow 𝐘=(𝐲h)hE\mathbf{Y}=(\mathbf{y}_{h})_{h\in E} is

demi(𝐘)=hE:hi𝐲h(i).\operatorname{dem}_{i}(\mathbf{Y})=\sum_{h\in E:h\ni i}\mathbf{y}_{h}(i). (10)

We will make use of the vector space structure inherited from 𝐘h𝟏h\mathbf{Y}\in\bigoplus\mathbb{R}^{h}\perp{\bf 1}_{h} to define linear combinations of hypergraph flows c𝐘1+𝐘2c\mathbf{Y}_{1}+\mathbf{Y}_{2} for any c.c\in\mathbb{R}. The demand vector also behaves linearly as dem(c𝐘1+𝐘2)=cdem(𝐘1)+dem(𝐘2).\operatorname{dem}(c\mathbf{Y}_{1}+\mathbf{Y}_{2})=c\cdot\operatorname{dem}(\mathbf{Y}_{1})+\operatorname{dem}(\mathbf{Y}_{2}).
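The demand vector (10) and, for the standard hypergraph cut function, the congestion (9) reduce to short computations: since (δh𝖼𝗎𝗍)\mathcal{B}(\delta^{\mathsf{cut}}_{h}) is an 1\ell_{1}-ball of radius 22 intersected with 𝟏h{\bf 1}_{h}^{\perp}, the congestion is maxh𝐲h1/(2wh)\max_{h}\lVert\mathbf{y}_{h}\rVert_{1}/(2w_{h}). In the sketch below (our own data layout), a hypergraph flow is a dictionary mapping each hyperedge, given as a tuple of vertices, to its hyperedge flow vector:

```python
import numpy as np

# Demand vector (10) of a hypergraph flow, and congestion (9) specialized to
# the standard hypergraph cut function.
def demand(n, flow):
    """flow: dict {hyperedge (tuple of vertices): y_h (array summing to 0)}."""
    d = np.zeros(n)
    for h, y in flow.items():
        assert abs(float(np.sum(y))) < 1e-9   # y_h is orthogonal to 1_h
        for pos, i in enumerate(h):
            d[i] += y[pos]
    return d

def congestion_standard_cut(flow, w):
    return max(float(np.abs(y).sum()) / (2 * w[h]) for h, y in flow.items())
```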

Finally, notice that hypergraph flows over hyperedges of rank 22 are equivalent to graph flows as, for an edge {i,j}E\{i,j\}\in E, 𝐲ij(i)\mathbf{y}_{ij}(i) represents the flow from ii to jj, while 𝐲ij(j)=𝐲ij(i)\mathbf{y}_{ij}(j)=-\mathbf{y}_{ij}(i) is the flow from jj to ii.

Connection to Polymatroidal Flows

In the polymatroidal network flow model [19, 41], one has a directed (2-uniform) graph HH representing a flow network where the edges are not individually capacitated, but rather for every vertex vV(H)v\in V(H), there are monotone submodular functions ρ\rho^{-} and ρ+\rho^{+} constraining the amount of flow entering and exiting a vertex respectively.

Given a hypergraph flow 𝐘={𝐲h}hE\mathbf{Y}=\{\mathbf{y}_{h}\}_{h\in E} over GG, one can interpret it as a graph flow over the factor graph G^\hat{G}, where the flow from vertex ii to factor vertex hh over the edge {i,h}\{i,h\} is given by 𝐲h(i)\mathbf{y}_{h}(i). When the congestion (as defined in (9)) is required to be at most one, the constraints imposed on this flow on G^\hat{G} can be cast as constraints defining a flow in a polymatroidal network supported on G^\hat{G}. Here, each constraint on the hyperedge flow for hyperedge hE(G)h\in E(G) defined by its base polytope can be encoded as polymatroidal constraints on the graph flow into and out of the vertex hh in G^\hat{G}.

Despite this connection with polymatroidal networks, we remark that, to the best of our knowledge, the hypergraph partitioning problems studied in this paper cannot be expressed in the setup proposed by Chekuri [19] for defining sparsest cut problems over polymatroidal networks.

4.3 Hypergraph Flow Embeddings

To construct efficient flow-based algorithms for the minimum ratio-cut problem in the second part of this paper, we will need a way to generalize the notion of flow embedding of a demand graph into a graph to the setting of hypergraphs. In this section, we leverage the definition of hypergraph flows to provide this generalization. In particular, we define the flow embedding of a (directed) graph into a weighted submodular hypergraph with polymatroidal cut functions. We also show that these embeddings yield lower bounds on the hypergraph cut functions in terms of the graph cuts of the embedded graph. Analogously to graphs, we also show that any hypergraph flow can be converted into a flow embedding via flow-path decompositions.

Definition 9 (Hypergraph Flow Embedding).

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}} and let H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} be a weighted directed graph on the same vertex set. We say that the graph HH embeds into GG as a flow with congestion ρ0\rho\geq 0, denoted HρGH\preceq_{\rho}G if there exist hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} such that:

  1. For all e=(i,j)EHe=(i,j)\in E_{H}, the flow 𝐘e\mathbf{Y}^{e} routes weHw^{H}_{e} units of flow from vertex ii to vertex jj, i.e.,

    dem(𝐘e)=weH(𝟏i𝟏j).\operatorname{dem}(\mathbf{Y}^{e})=w^{H}_{e}\cdot({\bf 1}_{i}-{\bf 1}_{j})\,.

  2. The flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} respect the polymatroidal capacity constraints, i.e., for all hEGh\in E_{G}:

    eEH(𝐲he)+ρwh𝒫+(Fh),\sum_{e\in E_{H}}(\mathbf{y}_{h}^{e})_{+}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}^{-}),

    eEH(𝐲he)ρwh𝒫+(Fh+).\sum_{e\in E_{H}}(\mathbf{y}_{h}^{e})_{-}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}^{+}).

The capacity constraints can be understood by analogy with the capacity constraints for multi-commodity flows over a graph. As in that case, these constraints ensure that flows from different commodities along the same arc do not cancel out.
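To make the bookkeeping concrete, the net demand dem(𝐘) of a hypergraph flow is obtained by summing the per-hyperedge vectors 𝐲h. The following is a minimal Python sketch; the dictionary-based representation and the function name are our own illustration, not the paper's notation:

```python
def demand(flows):
    # flows: hyperedge id -> {vertex: signed flow value}; each y_h is
    # supported on the vertices of h and is orthogonal to 1_h, so the
    # entries of dem(Y) always sum to zero.
    dem = {}
    for y_h in flows.values():
        for v, val in y_h.items():
            dem[v] = dem.get(v, 0.0) + val
    return dem
```

For instance, routing two units from i to j through an intermediate vertex k via two hyperedges leaves zero net demand at k, matching dem(𝐘e)=weH(𝟏i−𝟏j) with weH=2.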

We will use this notion of embeddability to certify lower bounds on the ratio-cut objective for general polymatroidal cut functions. We will crucially make use of the following theorem in our development of the cut-matching game for solving ratio-cut problems over weighted submodular hypergraphs. Its proof is in Appendix A.

Theorem 5.

Let G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}}, and H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} be a weighted directed graph. If HρGH\preceq_{\rho}G, then

𝐱V,(i,j)EHwijH(𝐱i𝐱j)+ρhEGwhGδ¯h(𝐱) and SV,δH(S)ρδG(S).\forall\mathbf{x}\in\mathbb{R}^{V}\,,\sum_{(i,j)\in E_{H}}w^{H}_{ij}\cdot(\mathbf{x}_{i}-\mathbf{x}_{j})_{+}\leq\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\overline{\delta}_{h}(\mathbf{x})\;\textrm{ and }\;\forall S\subseteq V\,,\,\delta_{H}(S)\leq\rho\cdot\delta_{G}(S).

When the polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}} are symmetric, HρGH\preceq_{\rho}G implies the following bound on the undirected symmetrization H~=(V,EH~,𝐰H~)\tilde{H}=(V,E_{\tilde{H}},\mathbf{w}^{\tilde{H}}):

𝐱V,{i,j}EH~wijH~|𝐱i𝐱j|2ρhEGwhGδ¯h(𝐱) and SV,δH~(S)2ρδG(S).\forall\mathbf{x}\in\mathbb{R}^{V}\,,\sum_{\{i,j\}\in E_{\tilde{H}}}w^{\tilde{H}}_{ij}\cdot\lvert\mathbf{x}_{i}-\mathbf{x}_{j}\rvert\leq 2\cdot\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\overline{\delta}_{h}(\mathbf{x})\;\textrm{ and }\;\forall S\subseteq V\,,\,\delta_{\tilde{H}}(S)\leq 2\cdot\rho\cdot\delta_{G}(S).

We now state the main algorithmic result of this section, which allows us to construct a hypergraph flow embedding from any single hypergraph flow. Just like in the case of graphs, this result is based on constructing a flow-path decomposition of the original flow. The monotonicity of the functions F+F^{+} and FF^{-} in the definition of polymatroidal cut functions plays a crucial role in bounding the congestion of the embedding, which cannot be controlled in the same way for general submodular cut functions. The proof appears in Appendix A.

Theorem 6.

Let G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}}. Given a hypergraph flow 𝐘\mathbf{Y} over GG, there is an algorithm that, in time O~(𝗌𝗉(G))\tilde{O}\big{(}\mathsf{sp}(G)\big{)}, computes a weighted directed bipartite graph H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} such that HcongG(𝐘)GH\preceq_{\operatorname{cong}_{G}(\mathbf{Y})}G and |EH|=O~(𝗌𝗉(G))|E_{H}|=\tilde{O}\big{(}\mathsf{sp}(G)\big{)}. Furthermore, all vertices iVi\in V with non-negative (resp. negative) demand demi(𝐘)\operatorname{dem}_{i}(\mathbf{Y}) are source nodes (resp. sink nodes) in HH, each with out-degree (resp. in-degree) equal to demi(𝐘)\operatorname{dem}_{i}(\mathbf{Y}) in absolute value.

5 An O(logn)O(\sqrt{\log n})-Approximation via 22\ell_{2}^{2}-Metric Embeddings

In this section, we prove Theorem 1 by giving an 22\ell_{2}^{2}-metric relaxation and a rounding argument that yield a randomized polynomial-time O(logn)O(\sqrt{\log n})-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs equipped with polymatroidal cut functions. As a warm-up, we give the simpler argument for symmetric polymatroidal cut functions before proving the result for general polymatroidal cut functions in Section 5.2.

5.1 Warm-up: Symmetric Polymatroidal Cut Functions

We begin by providing a relaxation for the symmetric version of the problem, in which Fh+=FhF_{h}^{+}=F_{h}^{-} for every hEh\in E. All the results in this subsection will focus exclusively on this special case, but they will be extended to the general case of all polymatroidal cut functions in the following subsection.

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}), we construct a vector-program relaxation of the continuous non-convex formulation in Equation (RC-NonCvx) by associating to each vertex iVEi\in V\cup E of the factor graph G^\hat{G} a vector 𝐯i\mathbf{v}_{i}. For a hyperedge hEh\in E and ihi\in h, we can then relax the terms |𝐱iνh||\mathbf{x}_{i}-\nu_{h}|, which make up the argument of F¯h\bar{F}_{h} in Fact 4, with the distance 𝐝ih=def𝐯i𝐯h2.\mathbf{d}^{h}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2}. We obtain the following semidefinite program in its vector embedding form:

min𝐯i\displaystyle\underset{\mathbf{v}_{i}}{\min} hEwhF¯h(𝐝h)\displaystyle\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h}) (RC-Metric)
s.t. {i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j221\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2}\geq 1
𝐯i𝐯j22𝐯i𝐯k22+𝐯k𝐯j22\displaystyle\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}\leq\|\mathbf{v}_{i}-\mathbf{v}_{k}\|_{2}^{2}+\|\mathbf{v}_{k}-\mathbf{v}_{j}\|_{2}^{2} i,j,kVE\displaystyle\forall\,i,j,k\in V\cup E
𝐝ih=𝐯i𝐯h2\displaystyle\mathbf{d}^{h}_{i}=\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐯in,𝐝h0h\displaystyle\mathbf{v}_{i}\in\mathbb{R}^{n},\mathbf{d}^{h}\in\mathbb{R}^{h}_{\geq 0} iVE,hE\displaystyle\hskip 28.45274pt\forall\,i\in V\cup E,\,\forall\,h\in E

The following simple lemma is proved in Appendix A.

Lemma 2.

The (RC-Metric) vector program is, up to a constant, a relaxation of the minimum ratio-cut program for the hypergraph G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) with symmetric polymatroidal cut functions {δh}hE.\{\delta_{h}\}_{h\in E}.

Rounding via Metric Embedding

To round the convex program (RC-Metric), we apply the following well-known embedding result, which is implicit in the seminal work of Arora, Rao and Vazirani [10].

Theorem 7 ([10]).

Let {𝐯id}iVE\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V\cup E} be a feasible solution to the 22\ell_{2}^{2}-metric relaxation (RC-Metric). Then, there exists an efficiently computable 11-Lipschitz map ϕ:d\phi:\mathbb{R}^{d}\to\mathbb{R} (recall that, given a metric space (X,d)(X,d), a function f:Xf:X\to\mathbb{R} is said to be KK-Lipschitz for some constant KK if |f(x)f(y)|Kd(x,y)|f(x)-f(y)|\leq K\cdot d(x,y) for every x,yXx,y\in X) satisfying:

i,jVμ(i)μ(j)|ϕ(𝐯i)ϕ(𝐯j)|i,jVμ(i)μ(j)𝐯i𝐯j2Ω(1log|V|).\frac{\sum_{i,j\in V}\mu(i)\mu(j)\cdot|\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{j})|}{\sum_{i,j\in V}\mu(i)\mu(j)\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}}\geq\Omega\left(\frac{1}{\sqrt{\log|V|}}\right).

To complete the proof of Theorem 1 for the special case of symmetric polymatroidal cut functions, we make use of the embedding in Theorem 7. The proof crucially relies on the monotonicity of the Lovász extension F¯\bar{F} (Fact 1). The ability to perform this step can be taken as a justification for our definition of polymatroidal cut functions.
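Since the argument repeatedly evaluates Lovász extensions, it may help to recall the standard evaluation formula: sort the coordinates of 𝐱 in decreasing order via a permutation σ, let S_k be the top-k prefix, and compute F̄(𝐱) = Σ_k 𝐱_{σ(k)}·(F(S_k) − F(S_{k−1})). A minimal Python sketch (function names are ours, not the paper's):

```python
def lovasz_extension(F, x):
    # F: set function on frozensets with F(empty) = 0; x: dict vertex -> value.
    # Evaluates F-bar(x) = sum_k x[v_k] * (F(S_k) - F(S_{k-1})), where the
    # vertices v_1, v_2, ... are sorted by decreasing x-value and S_k is
    # the prefix {v_1, ..., v_k}.
    order = sorted(x, key=lambda v: -x[v])
    total, prev, S = 0.0, 0.0, set()
    for v in order:
        S.add(v)
        cur = F(frozenset(S))
        total += x[v] * (cur - prev)
        prev = cur
    return total
```

For the standard (symmetric) cut function of a single hyperedge, F(S) = 1 whenever S is a proper nonempty subset, this formula recovers F̄(𝐱) = max(𝐱) − min(𝐱).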

Lemma 3.

There is a polynomial-time algorithm that, given a solution {𝐯i}iVE\{\mathbf{v}_{i}\}_{i\in V\cup E} to (RC-Metric) of objective value κ:=hEwhF¯h(𝐝h)\kappa:=\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h}), returns a cut SVS\subseteq V such that:

ΨG(S)O(κlog|V|).\Psi_{G}(S)\leq O\big{(}\kappa\cdot\sqrt{\log|V|}\big{)}\,.
Proof.

Apply Theorem 7 to obtain 𝐱V\mathbf{x}\in\mathbb{R}^{V} and 𝜼E\bm{\eta}\in\mathbb{R}^{E} with 𝐱(i)=defϕ(𝐯i)\mathbf{x}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\phi(\mathbf{v}_{i}) for all iVi\in V and 𝜼(h)=defϕ(𝐯h)\bm{\eta}(h)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\phi(\mathbf{v}_{h}) for all hEh\in E. The 11-Lipschitzness of ϕ\phi guarantees, for every hEh\in E and ihi\in h:

|𝐱(i)𝜼(h)|=|ϕ(𝐯i)ϕ(𝐯h)|𝐯i𝐯h2=𝐝ih.\lvert\mathbf{x}(i)-\bm{\eta}(h)\rvert=\lvert\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{h})\rvert\leq\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2}=\mathbf{d}^{h}_{i}.

By the monotonicity of F¯h\bar{F}_{h} in Fact 1, we have:

hEwhminνhF¯h(|𝐱hνh𝟏h|)hEwhF¯h(|𝐱h𝜼(h)𝟏h|)hEwhF¯h(𝐝h)=κ.\sum_{h\in E}w_{h}\cdot\min_{\nu_{h}\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h}|)\leq\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(|\mathbf{x}_{h}-\bm{\eta}(h){\bf 1}_{h}|)\leq\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h})=\kappa. (12)

It now suffices to obtain a lower bound on the denominator in the continuous formulation (RC-NonCvx):

minγ𝐱γ𝟏1,𝝁12i,jV𝝁(i)𝝁(j)𝝁(V)|𝐱i𝐱j|Ω(1log|V|)i,jV𝝁(i)𝝁(j)𝝁(V)𝐯i𝐯j22Ω(1log|V|).{\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\geq{1\over 2}{\sum_{i,j\in V}{\bm{\mu}(i)\bm{\mu}(j)\over\bm{\mu}(V)}\cdot|\mathbf{x}_{i}-\mathbf{x}_{j}|}\geq\Omega\left({1\over\sqrt{\log|V|}}\right){\sum_{i,j\in V}{\bm{\mu}(i)\bm{\mu}(j)\over\bm{\mu}(V)}\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}}\geq\Omega\left({1\over\sqrt{\log|V|}}\right). (13)

Finally, Equations (12) and (13), yield, together with Fact 4:

hEwhδ¯h(𝐱)minγ𝐱γ𝟏1,𝝁2hEwhminνF¯h(|𝐱ν1|)minγ𝐱γ𝟏1,𝝁O(κlogn).{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq 2\cdot{\sum_{h\in E}w_{h}\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu 1|)\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq O(\kappa\cdot\sqrt{\log n}).

The vector 𝐱\mathbf{x} can then be efficiently rounded to a subset SVS\subseteq V by Lemma 1. ∎
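The final rounding step via Lemma 1 is a sweep cut: sort the vertices by their value in 𝐱 and return the best prefix cut under the ratio objective. The following is an illustrative Python sketch; the `boundary` callable standing in for Σ_h w_h·δ_h(S∩h) is a hypothetical helper, not the paper's notation:

```python
def sweep_cut(x, mu, boundary):
    # x: dict vertex -> embedding value; mu: dict vertex -> measure;
    # boundary: callable frozenset -> cut weight delta_G(S).
    # Scans the prefix cuts S_t = {v : x[v] >= t} and returns the one
    # minimizing boundary(S) / min(mu(S), mu(V \ S)).
    order = sorted(x, key=lambda v: -x[v])
    mu_total = sum(mu.values())
    best_score, best_S = float('inf'), None
    S, mu_S = set(), 0.0
    for v in order[:-1]:  # skip the trivial cut S = V
        S.add(v)
        mu_S += mu[v]
        score = boundary(frozenset(S)) / min(mu_S, mu_total - mu_S)
        if score < best_score:
            best_score, best_S = score, set(S)
    return best_S, best_score
```

On a 4-vertex path with unit weights and measures, the sweep correctly recovers the balanced middle cut.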

5.2 General Polymatroidal Cut Functions

For the case of general polymatroidal cut functions, our metric relaxation must be able to capture the signed terms (𝐱hνh𝟏h)+(\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{+} and (𝐱hνh𝟏h)(\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{-} that appear in the Lovász extension in Fact 3. For this purpose, we will use the directed 22\ell_{2}^{2}-semimetrics introduced by Charikar, Makarychev and Makarychev [18] and, for every hEh\in E and ihi\in h, relax (𝐱iνh)+(\mathbf{x}_{i}-\nu_{h})_{+} to 𝐝ih,=def𝐯i𝐯h2+𝐯i22𝐯h22\mathbf{d}^{h,-}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{h}\rVert_{2}^{2} and (𝐱iνh)(\mathbf{x}_{i}-\nu_{h})_{-} to 𝐝ih,+=def𝐯i𝐯h2+𝐯h22𝐯i22\mathbf{d}^{h,+}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{h}\rVert_{2}^{2}-\lVert\mathbf{v}_{i}\rVert_{2}^{2}. We obtain the following semidefinite program:

min𝐯i,𝐝h,±\displaystyle\underset{\mathbf{v}_{i},\mathbf{d}^{h,\pm}}{\min} hEwh(F¯h(𝐝h,)+F¯h+(𝐝h,+))\displaystyle\sum_{h\in E}w_{h}\cdot\left(\bar{F}_{h}^{-}(\mathbf{d}^{h,-})+\bar{F}_{h}^{+}(\mathbf{d}^{h,+})\right) (RC-Directed-Semimetric)
s.t. {i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j221\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2}\geq 1
𝐯i𝐯j22𝐯i𝐯k22+𝐯k𝐯j22\displaystyle\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}\leq\|\mathbf{v}_{i}-\mathbf{v}_{k}\|_{2}^{2}+\|\mathbf{v}_{k}-\mathbf{v}_{j}\|_{2}^{2} i,j,kVE\displaystyle\forall\,i,j,k\in V\cup E
𝐝ih,=𝐯i𝐯h2+𝐯i22𝐯h22\displaystyle\mathbf{d}^{h,-}_{i}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{h}\rVert_{2}^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐝ih,+=𝐯i𝐯h2+𝐯h22𝐯i22\displaystyle\mathbf{d}^{h,+}_{i}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{h}\rVert_{2}^{2}-\lVert\mathbf{v}_{i}\rVert_{2}^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐯in,𝐝h,+,𝐝h,0h\displaystyle\mathbf{v}_{i}\in\mathbb{R}^{n},\mathbf{d}^{h,+},\mathbf{d}^{h,-}\in\mathbb{R}^{h}_{\geq 0} iVE,hE\displaystyle\forall\,i\in V\cup E,\,\forall\,h\in E

We have the following lemma, which we prove in Appendix A.

Lemma 4.

The (RC-Directed-Semimetric) vector program is, up to a constant, a relaxation of the minimum submodular hypergraph ratio-cut problem for the hypergraph G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) with polymatroidal cut functions {δh}hE.\{\delta_{h}\}_{h\in E}.

Rounding via Metric Embedding

To round a solution to the program (RC-Directed-Semimetric) to a solution of the original ratio-cut problem (RC-NonCvx), we require a version of the embedding result of Theorem 7 for directed semimetrics. The following result is a simple consequence of the rounding algorithm of Agarwal et al.  [2] for a similar relaxation of directed expansion. We prove it for completeness in Appendix A.

Theorem 8.

Given a feasible solution {𝐯id}iVE\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V\cup E} to RC-Directed-Semimetric and a measure 𝛍0V\bm{\mu}\in\mathbb{R}_{\geq 0}^{V}, there exists a polynomial-time computable map ϕ:d\phi:\mathbb{R}^{d}\to\mathbb{R} such that:

  1. 1.

    (ϕ(𝐯)ϕ(𝐰))+𝐯𝐰2+𝐯2𝐰2(\phi(\mathbf{v})-\phi(\mathbf{w}))_{+}\leq\lVert\mathbf{v}-\mathbf{w}\rVert^{2}+\lVert\mathbf{v}\rVert^{2}-\lVert\mathbf{w}\rVert^{2} for all 𝐯,𝐰d\mathbf{v},\mathbf{w}\in\mathbb{R}^{d},

  2. 2.

    and:

    i,jVμ(i)μ(j)|ϕ(𝐯i)ϕ(𝐯j)|i,jVμ(i)μ(j)𝐯i𝐯j22Ω(1log|V|)\frac{\sum_{i,j\in V}\mu(i)\mu(j)|\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{j})|}{\sum_{i,j\in V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}}\geq\Omega\left(\frac{1}{\sqrt{\log|V|}}\right)

We can now complete the proof of Theorem 1 for the general polymatroidal case by showing that the map ϕ\phi in Theorem 8 yields an O(logn)O(\sqrt{\log n})-approximate solution to the minimum ratio-cut problem.

Lemma 5.

Given a solution {𝐯i}iVE\{\mathbf{v}_{i}\}_{i\in V\cup E} to (RC-Directed-Semimetric) of value κ\kappa, we can produce in polynomial time a set SVS\subseteq V such that:

ΨG(S)O(κlog|V|).\Psi_{G}(S)\leq O(\kappa\cdot\sqrt{\log|V|}).
Proof.

Let ϕ\phi be the map whose existence is guaranteed by Theorem 8. Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} and 𝜼E\bm{\eta}\in\mathbb{R}^{E} be the vectors given by 𝐱i=ϕ(𝐯i)\mathbf{x}_{i}=\phi(\mathbf{v}_{i}) for all iVi\in V and ηh=ϕ(𝐯h)\eta_{h}=\phi(\mathbf{v}_{h}) for all hE.h\in E. By the first condition in Theorem 8, we have, for all hEh\in E and ihi\in h:

(𝐱iηh)+𝐝ih, and (𝐱iηh)𝐝ih,+.(\mathbf{x}_{i}-\eta_{h})_{+}\leq\mathbf{d}^{h,-}_{i}\,\textrm{ and }\,(\mathbf{x}_{i}-\eta_{h})_{-}\leq\mathbf{d}^{h,+}_{i}.

We can now exploit the monotonicity of the functions FhF^{-}_{h} and Fh+F^{+}_{h} associated with the polymatroidal cut functions δh\delta_{h}. By Fact 3 and Fact 1, we have:

hEwhδ¯h(𝐱)\displaystyle\sum_{h\in E}w_{h}\overline{\delta}_{h}(\mathbf{x}) =hEwhminνh{F¯h((𝐱hνh𝟏h)+)+F¯h+((𝐱hνh𝟏h))}\displaystyle=\sum_{h\in E}w_{h}\min_{\nu_{h}\in\mathbb{R}}\left\{\bar{F}^{-}_{h}((\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{-})\right\}
hEwh{F¯h((𝐱hηh𝟏h)+)+F¯h+((𝐱hηh𝟏h))}\displaystyle\leq\sum_{h\in E}w_{h}\left\{\bar{F}^{-}_{h}((\mathbf{x}_{h}-\eta_{h}{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}_{h}-\eta_{h}{\bf 1}_{h})_{-})\right\}
hEwh(F¯h(𝐝h,)+F¯h+(𝐝h,+))=κ.\displaystyle\leq\sum_{h\in E}w_{h}\left(\bar{F}^{-}_{h}(\mathbf{d}^{h,-})+\bar{F}^{+}_{h}(\mathbf{d}^{h,+})\right)=\kappa.

We can now lower bound the denominator in the objective of Equation (RC-NonCvx) as in Equation 13. Finally, we have:

hEwhδ¯h(𝐱)minγ𝐱γ𝟏1,𝝁O(κlogn).{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq O(\kappa\cdot\sqrt{\log n}).

The vector 𝐱\mathbf{x} can then be efficiently rounded to a subset SVS\subseteq V by Lemma 1. ∎

6 Localized Convex Formulations and Ratio-Cut Improvement

In this section, we consider the localized formulation (Local-RC) introduced in Section 2.2, which we rewrite as a convex program. We show that the dual solutions to this problem are hypergraph flows, which can be used to obtain hypergraph flow embeddings into the input graph. Finally, we give algorithms that construct approximate primal-dual solutions for the (Local-RC) problem. These algorithms will play an important part in our cut-matching game.

We start by introducing the ratio-cut improvement objective, a local partitioning objective parametrized by a seed 𝐬V\mathbf{s}\in\mathbb{R}^{V}, which is the integral analogue of the (Local-RC) problem:

Definition 10 (Ratio-Cut Improvement Problem).

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}), cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and a seed vector 𝐬V\mathbf{s}\in\mathbb{R}^{V} with 𝐬,𝟏μ=0\langle\mathbf{s},\boldsymbol{1}\rangle_{\mu}=0 and 𝐬1,\lVert\mathbf{s}\rVert_{\infty}\leq 1, the ratio-cut improvement objective ΨG,s\Psi_{G,s} is defined for all SVS\subset V as:

ΨG,𝐬(S)\displaystyle\Psi_{G,\mathbf{s}}(S) =defhEwhδh(Sh)max{0,𝐬,𝟏Sμ}\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S\cap h)}{\max\{0,\langle\mathbf{s},\boldsymbol{1}^{S}\rangle_{\mu}\}}
ΨG,𝐬\displaystyle\Psi^{*}_{G,\mathbf{s}} =defminSVΨG,𝐬(S)\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\min_{S\subseteq V}\Psi_{G,\mathbf{s}}(S)

The ratio-cut improvement problem asks to find SVS\subseteq V achieving ΨG,𝐬=ΨG,𝐬(S)\Psi^{*}_{G,\mathbf{s}}=\Psi_{G,\mathbf{s}}(S).

The ratio-cut improvement problem can be thought of as attempting to find a cut (S,S¯)(S,\overline{S}) that simultaneously has a small boundary and a high correlation with the input seed 𝐬\mathbf{s}. In Appendix B, we demonstrate that the problem of minimizing ΨG,𝐬\Psi_{G,\mathbf{s}} is precisely the generalization of the cut-improvement problem of Andersen and Lang [7] to weighted submodular hypergraphs and fractional seeds.
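Evaluating the objective of Definition 10 for a candidate cut is direct. The sketch below uses a hypothetical `boundary` callable standing in for Σ_h w_h·δ_h(S∩h), and returns +∞ when the seed correlation is non-positive, mirroring the max{0, ·} in the denominator:

```python
def cut_improvement_objective(S, s, mu, boundary):
    # Psi_{G,s}(S) = boundary(S) / max{0, <s, 1^S>_mu}, with the
    # convention that the objective is infinite when the correlation
    # of S with the seed s is non-positive.
    corr = sum(mu[v] * s[v] for v in S)
    if corr <= 0:
        return float('inf')
    return boundary(S) / corr
```

With a two-vertex seed s = (1, −1), the cut {a} aligned with the seed gets a finite score while the anti-aligned cut {b} is rejected.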

The relation between the minimum ratio-cut problem, the ratio-cut improvement problem and their continuous formulations  (RC-NonCvx) and  (Local-RC) is formalized in the following lemma.

Lemma 6.

For all 𝐱V\mathbf{x}\in\mathbb{R}^{V}, and seeds 𝐬V\mathbf{s}\in\mathbb{R}^{V} satisfying 𝐬1\lVert\mathbf{s}\rVert_{\infty}\leq 1 and 𝐬,𝟏𝛍=0\langle\mathbf{s},\boldsymbol{1}\rangle_{\bm{\mu}}=0, we have:

Ψ¯G(𝐱)Ψ¯G,𝐬(𝐱).\bar{\Psi}_{G}(\mathbf{x})\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\,.

In particular, setting 𝐱=𝟏S\mathbf{x}=\boldsymbol{1}^{S} shows that ΨG,𝐬(S)\Psi_{G,\mathbf{s}}(S) upper-bounds ΨG(S)\Psi_{G}(S) for any SVS\subset V:

ΨG(S)ΨG,𝐬(S).\Psi_{G}(S)\leq\Psi_{G,\mathbf{s}}(S)\,.
Proof.

By the duality of conjugate norms, we have

minu𝐱u𝟏1,𝝁=minumax𝐲1𝐲,𝐱u𝟏𝝁=max𝐲1,𝐲,𝟏𝝁=0𝐲,𝐱𝝁𝐬,𝐱𝝁\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\boldsymbol{1}\rVert_{1,\bm{\mu}}=\min_{u\in\mathbb{R}}\max_{\lVert\mathbf{y}\rVert_{\infty}\leq 1}\langle\mathbf{y},\mathbf{x}-u\boldsymbol{1}\rangle_{\bm{\mu}}=\max_{\begin{subarray}{c}\lVert\mathbf{y}\rVert_{\infty}\leq 1\\ \langle\mathbf{y},\boldsymbol{1}\rangle_{\bm{\mu}}=0\end{subarray}}\langle\mathbf{y},\mathbf{x}\rangle_{\bm{\mu}}\geq\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}

for all 𝐱V\mathbf{x}\in\mathbb{R}^{V}. Consequently,

Ψ¯G(𝐱)=hEwhδ¯h(𝐱)minu𝐱u𝟏1,𝝁hEwhδ¯h(𝐱)max{0,𝐬,𝐱𝝁}=Ψ¯G,𝐬(𝐱)\bar{\Psi}_{G}(\mathbf{x})=\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\boldsymbol{1}\rVert_{1,\bm{\mu}}}\leq\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\max\{0,\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}\}}=\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})

as required. ∎

Of critical algorithmic consequence is the fact that the value of Ψ¯G,𝐬\bar{\Psi}_{G,\mathbf{s}}^{*} can be computed via a convex program. We refer to this program as (CI-Primal):

min𝐱V\displaystyle\underset{\mathbf{x}\in\mathbb{R}^{V}}{\min} hEwhδ¯h(𝐱)\displaystyle\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x}) (CI-Primal)
s.t. 𝐬,𝐱μ=1\displaystyle\langle\mathbf{s},\mathbf{x}\rangle_{\mu}=1

The following lemma is a simple consequence of the positive homogeneity of the Lovász extensions of submodular functions (Proposition 3.1 in [11]).

Lemma 7.

The optimum value of (CI-Primal) equals ΨG,𝐬\Psi^{*}_{G,\mathbf{s}}.

With this convex program, one can leverage convex duality to construct lower bounds for Ψ¯G,𝐬\bar{\Psi}^{*}_{G,\mathbf{s}}, and consequently ΨG,𝐬\Psi^{*}_{G,\mathbf{s}}. We consider a dual formulation of (CI-Primal) as a problem over hypergraph flows:

max\displaystyle{\max} α\displaystyle\alpha (CI-Dual)
s.t. dem(𝐘)=α(𝝁𝐬)\displaystyle\operatorname{dem}(\mathbf{Y})=\alpha\cdot\big{(}\bm{\mu}\circ\mathbf{s}\big{)}
congG(𝐘)1\displaystyle\operatorname{cong}_{G}(\mathbf{Y})\leq 1
α,𝐘={𝐲hh𝟏h}hE\displaystyle\alpha\in\mathbb{R},\mathbf{Y}=\{\mathbf{y}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\}_{h\in E}

In this setting, strong duality between (CI-Primal) and (CI-Dual) holds by virtue of (CI-Primal) having a finite number of polyhedral constraints. The full proof of the following lemma can be found in Appendix A.

Lemma 8.

The programs (CI-Primal) and (CI-Dual) are dual to each other. Strong duality holds.

From this point onwards, we restrict our attention to the kind of seed vectors that we will encounter in our application to the cut-matching game. For this purpose, we define the following notation for disjoint sets A,BVA,B\subseteq V with μ(A)μ(B)\mu(A)\leq\mu(B):

𝟏A,B=𝟏Aμ(A)μ(B)𝟏B.{\bf 1}^{A,B}={\bf 1}_{A}-\frac{\mu(A)}{\mu(B)}{\bf 1}_{B}.

Notice that 𝟏A,B{\bf 1}^{A,B} is a valid seed for the local formulation of the minimum ratio-cut problem as 𝟏A,B1\|{\bf 1}^{A,B}\|_{\infty}\leq 1 and 𝟏A,B,𝟏𝝁=0.\langle{\bf 1}^{A,B},{\bf 1}\rangle_{\bm{\mu}}=0. For this kind of seed, the dual problem can be interpreted as a maximum concurrent single-commodity flow over the hypergraph GG, in which each vertex iAi\in A is asked to route demand μi\mu_{i} to BB and each vertex jBj\in B is asked to receive demand μ(A)/μ(B)μj\nicefrac{{\mu(A)}}{{\mu(B)}}\cdot\mu_{j} from AA. For a solution of value α\alpha, each vertex routes an α\alpha fraction of its demand, and the total demand routed from AA to BB is αμ(A).\alpha\mu(A). Notice that, in the graph case, for B=A¯B=\bar{A}, this construction specializes exactly to the undirected maximum concurrent flow problem used by Andersen and Lang [7] to solve the ratio-cut improvement problem.
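The seed 𝟏A,B and its two defining properties (infinity norm at most 1 and μ-orthogonality to the all-ones vector) are easy to construct and verify; a small sketch with our own dictionary representation:

```python
def seed_vector(A, B, mu):
    # 1^{A,B} = 1_A - (mu(A)/mu(B)) * 1_B, and zero outside A and B.
    # Assumes mu(A) <= mu(B), which keeps the infinity norm at most 1.
    mu_A = sum(mu[v] for v in A)
    mu_B = sum(mu[v] for v in B)
    ratio = mu_A / mu_B
    return {v: 1.0 if v in A else (-ratio if v in B else 0.0) for v in mu}
```

For unit measures with A = {a} and B = {b, c}, the seed is (1, −1/2, −1/2, 0), which indeed has unit infinity norm and zero μ-weighted inner product with 𝟏.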

Approximate Solutions

In Section 6.2, we will discuss how, for many kinds of polymatroidal cut functions, approximately feasible dual solutions to (CI-Dual) can be obtained by approximately solving a maximum flow problem over the factor graph or an augmented version of it. For this purpose, it is convenient to introduce a notion of approximate dual solution of value α\alpha to (CI-Dual).

Definition 11 (Approximate dual solution).

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and A,BVA,B\subseteq V disjoint sets with μ(A)μ(B)\mu(A)\leq\mu(B). An approximate dual solution of value α0\alpha\geq 0 for (G,𝟏A,B)(G,{\bf 1}^{A,B}) is a hypergraph flow 𝐘=(𝐲hh𝟏h)\mathbf{Y}=\big{(}\mathbf{y}^{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\big{)} such that the following hold simultaneously:

  1.

    𝐘\mathbf{Y} has unit congestion, i.e., congG(𝐘)1\operatorname{cong}_{G}(\mathbf{Y})\leq 1 ,

  2.

    𝐘\mathbf{Y} routes a 1/2\nicefrac{{1}}{{2}}-fraction of the required flow from AA to BB, i.e., at least 1/2αμ(A)\nicefrac{{1}}{{2}}\cdot\alpha\cdot\mu(A) units of flow,

  3.

    𝐘\mathbf{Y} does not exceed the required demand at any vertex, i.e., |dem(𝐘)|α|𝝁𝟏A,B||\operatorname{dem}(\mathbf{Y})|\leq\alpha\cdot|\bm{\mu}\circ{\bf 1}^{A,B}|.
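Conditions 2 and 3 of this definition can be checked directly from the demand vector dem(𝐘); condition 1 (unit congestion) must instead be verified against the polymatroidal capacities and is not covered here. A sketch of the demand-side checks, with representation and names of our own choosing:

```python
def check_demand_conditions(dem, alpha, mu, A, B):
    # dem: dict vertex -> net demand of the flow Y; checks conditions 2
    # and 3 of the approximate-dual-solution definition for seed 1^{A,B}.
    mu_A = sum(mu[v] for v in A)
    ratio = mu_A / sum(mu[v] for v in B)
    # Condition 2: at least half of the alpha * mu(A) flow leaves A.
    routed = sum(dem.get(v, 0.0) for v in A)
    if routed < 0.5 * alpha * mu_A:
        return False
    # Condition 3: |dem_i| <= alpha * mu_i * |1^{A,B}_i| at every vertex.
    for v in mu:
        bound = alpha * mu[v] * (1.0 if v in A else (ratio if v in B else 0.0))
        if abs(dem.get(v, 0.0)) > bound + 1e-12:
            return False
    return True
```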

6.1 Graph Certificates from Approximate Dual Solutions

As discussed in Section 2.2, our plan is to apply the cut-matching game to combine local lower bounds given by approximately optimal solutions to (CI-Dual) into global lower bounds for the non-convex problem Ψ¯G\bar{\Psi}^{*}_{G}. To carry out this plan, we need to express our local lower bounds, i.e. dual solutions to (CI-Dual) in terms of hypergraph embeddings of graphs into the instance hypergraph.

Definition 12.

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and A,BVA,B\subseteq V disjoint sets with μ(A)μ(B).\mu(A)\leq\mu(B). Let α=ΨG,𝟏A,B.\alpha=\Psi^{*}_{G,{\bf 1}^{A,B}}. An approximate dual graph certificate of value α0\alpha\geq 0 for the (Local-RC) problem with seed 𝟏A,B{\bf 1}^{A,B} is a directed graph D=(V,ED,𝐰D)D=(V,E_{D},\mathbf{w}^{D}) with the following properties:

  1.

    DD is sparse, i.e., |ED|O~(𝗌𝗉(G)),|E_{D}|\leq\tilde{O}(\mathsf{sp}(G)),

  2.

    αD\alpha\cdot D embeds into GG with congestion 11, or equivalently D1/αG,D\preceq_{\nicefrac{{1}}{{\alpha}}}G,

  3.

    DD is bipartite from AA to BB, i.e. all arcs of DD go from AA to BB,

  4.

    for all iA,jBi\in A,j\in B, we have the degree bounds degD(i)μi\deg_{D}(i)\leq\mu_{i} and degD(j)μ(A)/μ(B)μj\deg_{D}(j)\leq\nicefrac{{\mu(A)}}{{\mu(B)}}\cdot\mu_{j},

  5.

    DD is large, i.e., wD(E(A,B))1/2μ(A)w^{D}(E(A,B))\geq\nicefrac{{1}}{{2}}\cdot\mu(A).

Following Theorem 5, we may assume that DD is undirected if all cut functions are symmetric.

In the cut-matching game framework, we will use dual graph certificates to iteratively construct lower bounds to ΨG\Psi^{*}_{G} through the application of Theorem 5. For example, given an approximate dual graph certificate DD of value α\alpha, Theorem 5 yields the lower bound:

SV,δG(S)αδD(S)=αwD(E(SA,S¯B)).\forall S\subseteq V\,,\delta_{G}(S)\geq\alpha\cdot\delta_{D}(S)=\alpha\cdot w^{D}(E(S\cap A,\bar{S}\cap B)).

As we built DD from a dual solution for (CI-Dual) with seed 𝟏A,B{\bf 1}^{A,B}, this lower bound is tighter for cuts SS that are well-correlated with 𝟏A,B{\bf 1}^{A,B}, reaching its maximum when ASB¯.A\subseteq S\subseteq\bar{B}.

We can now apply the flow-path decomposition of Theorem 6 to show how to efficiently construct an approximate dual graph certificate from an approximate dual solution. The straightforward proof is found in Appendix A.

Lemma 9.

Under the same assumptions as Definition 12, consider an approximate dual solution 𝐘\mathbf{Y} of value α\alpha to the ratio-cut improvement problem on (G,𝟏A,B).(G,{\bf 1}^{A,B}). Let HH be the directed graph obtained by applying the flow-path decomposition algorithm of Theorem 6 to the hypergraph flow 𝐘\mathbf{Y}. Then, the scaled graph 1αH\frac{1}{\alpha}\cdot H is an approximate dual graph certificate of value α.\alpha.

6.2 Approximately Solving (Local-RC)

In this section, we discuss algorithms that construct approximate solutions to (CI-Primal) and (CI-Dual), the primal and dual convex formulations of (Local-RC). Our specific focus is to describe algorithms that implement the following specification:

Definition 13.

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}) with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} and disjoint sets A,BVA,B\subset V with μ(A)μ(B)\mu(A)\leq\mu(B), an approximate primal-dual oracle is an algorithm 𝒜CI\mathcal{A}_{\textup{{CI}}} that takes as input GG and the seed vector 𝟏A,B{\bf 1}^{A,B}. For some α0,\alpha\geq 0, the algorithm 𝒜CI\mathcal{A}_{\textup{{CI}}} outputs both:

  1.

    a cut SVS\subseteq V with ΨG,𝟏A,B(S)3α,\Psi_{G,{\bf 1}^{A,B}}(S)\leq 3\alpha,

  2.

    an approximate dual graph certificate of value α.\alpha.

We prove the following two theorems in Appendix C by a simple application of binary search over α.\alpha. The first theorem bounds the complexity of constructing an approximate primal-dual oracle for general polymatroidal cut functions.

Theorem 9.

For submodular hypergraphs with general polymatroidal cut functions, an approximate primal-dual oracle can be implemented in time O~(|V|hEΘh),\tilde{O}\big{(}|V|\cdot\sum_{h\in E}\Theta_{h}\big{)}, where Θh\Theta_{h} is the running time for a quadratic optimization oracle for δh,\delta_{h}, by solving O(log|V|)O(\log|V|)-many decomposable submodular minimization problems.

Our second theorem shows how this result can be significantly improved in the case of the standard hypergraph cut functions by relying on almost-linear-time maximum flow solvers [13, 20].

Theorem 10.

For a submodular hypergraph GG equipped with the directed or standard hypergraph cut functions, an approximate primal-dual oracle can be implemented in almost-linear time by solving O(log|V|)O(\log|V|)-many 1/2\nicefrac{{1}}{{2}}-approximate maximum flow problems over a directed, vertex-capacitated version of the factor graph G^.\hat{G}.

7 An O(logn)O(\log n)-Approximation via Generalized Cut-Matching Games

We now describe how to use our cut-matching game to approximate the minimum ratio-cut objective. Let us begin by stating the precise approximation guarantee achieved by running the game between a good cut strategy and an approximate primal-dual solver for the ratio-cut improvement problem (Local-RC).

Theorem 11.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with polymatroidal cut functions. Consider an execution of the cut-matching game in which the cut player 𝒞\mathcal{C} plays an (f(n),g(n))\big{(}f(n),g(n)\big{)}-good cut strategy, while the matching player plays the dual graph certificates output by an approximate primal-dual oracle 𝒜CI\mathcal{A}_{\textup{{CI}}} (Definition 13) on GG. Let CtC_{t} be the cuts output by 𝒜CI\mathcal{A}_{\textup{{CI}}}. Then, with constant probability, we have the following approximation result:

min{ΨG(Ct)}t=1g(n)O(g(n)f(n))ΨG.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq O\bigg{(}\frac{g(n)}{f(n)}\bigg{)}\cdot\Psi^{*}_{G}.

This is a standard result for cut-matching games and its proof does not deviate too far from that in previous work. We give its proof in Appendix D for completeness.

Theorem 11 reduces the task of approximating minimum ratio-cut to constructing a good cut strategy. Hence, this section is devoted towards proving the following theorem.

Theorem 12.

Let H=(V,EH,𝐰H,𝛍)H=\big{(}V,E_{H},\mathbf{w}^{H},\bm{\mu}\big{)} be the state graph for an (n,𝛍)(n,\bm{\mu})-generalized cut-matching game. There exists a cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} satisfying the following:

  1.

    If HH is undirected, then 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (Ω(logn),O(log2n))\left(\Omega(\log n),O(\log^{2}n)\right)-good with constant probability.

  2.

    If HH is directed, then 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (Ω(log2n),O(log3n))\left(\Omega(\log^{2}n),O(\log^{3}n)\right)-good with constant probability.

Furthermore, 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} outputs a cut action in time O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)}.

To prove this theorem, we give an optimization-based analysis of the cut-matching game by using regret minimization techniques from online optimization to construct and analyze 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}. This approach is very natural: the goal of online optimization is to adaptively produce a sequence of actions that will lead to some terminal outcome despite the presence of adversarial responses. For the cut-matching game, the cut player’s goal is to produce a short sequence of cut actions such that, no matter what approximate dual responses are given by \mathcal{M}, the state graph HH will have a large ratio-cut objective.

This approach also provides insight into why earlier combinatorial restrictions on valid cut and matching responses are not necessary when using the cut-matching game to partition (hyper)graphs. An interesting byproduct of our construction, in the case where {δh}hE\{\delta_{h}\}_{h\in E} are asymmetric, is that \mathcal{M} does not necessarily route Eulerian demand graphs. Additionally, no further reductions are required to enforce these combinatorial restrictions when approximating ratio-cut objectives beyond expansion.

To prove Theorem 12, we first describe our construction of 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} for the case where the cut functions defining the ratio-cut objective are symmetric, and HH is undirected. Simply considering this case already demonstrates a large fraction of our technical contributions. We then consider asymmetric cut functions and directed HH, highlighting key differences from the symmetric case.

7.1 An (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}\log^{2}n\big{)}\right)-good Cut Strategy for Symmetric Cut Functions

We first consider the case where the polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} are symmetric.

Let us start by instantiating the online linear optimization setting in the context of our cut-matching game. For an instance of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game, fix the action set to be 𝒳=Δ𝝁n×n\mathcal{X}=\Delta^{n\times n}_{\bm{\mu}} where

Δ𝝁n×n=def{𝐗𝟎:𝐋(K𝝁),𝐗1}.\Delta^{n\times n}_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{X}\succeq\mathbf{0}:\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1\big{\}}\,.
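To make the action set concrete, here is a small numerical sketch (ours, with toy data): 𝐋(K_𝝁) is the Laplacian of the complete graph whose edge {i,j} carries weight μ_iμ_j/𝝁(V), and any nonzero PSD Gram matrix can be rescaled to satisfy the constraint ⟨𝐋(K_𝝁),𝐗⟩ ≥ 1 and hence to lie in Δ^{n×n}_𝝁.

```python
import numpy as np

def complete_graph_laplacian(mu):
    """Laplacian of K_mu: edge {i,j} has weight mu_i * mu_j / mu(V)."""
    mu = np.asarray(mu, dtype=float)
    W = np.outer(mu, mu) / mu.sum()
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, d = 6, 3
mu = np.array([1, 2, 1, 3, 1, 2])
V = rng.standard_normal((n, d))   # rows are embedding vectors v_i
X = V @ V.T                       # PSD Gram matrix

L_K = complete_graph_laplacian(mu)
lhs = np.trace(L_K @ X)           # <L(K_mu), X>
# The same quantity as a weighted sum of pairwise squared distances.
rhs = sum(mu[i] * mu[j] / mu.sum() * np.sum((V[i] - V[j]) ** 2)
          for i in range(n) for j in range(i + 1, n))
assert abs(lhs - rhs) < 1e-8

# Rescaling X by 1/lhs places it in the action set Delta_mu.
X_feasible = X / lhs
```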

At the end of round tt, the matching player \mathcal{M} will have produced approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Choose the tt-th feedback matrix to be 𝐅t=𝐋(Dt)\mathbf{F}_{t}=\mathbf{L}(D_{t}). The loss incurred during round tt is then

𝐋(Dt),𝐗.\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}\big{\rangle}\,.

We instantiate online linear optimization in this way because any bound on regret now translates to a lower bound on the ratio-cut objective for the state graph HH. When {δh}hE\{\delta_{h}\}_{h\in E} are symmetric, HH is an undirected graph, and ΨH\Psi_{H}^{*} is simply the undirected ratio-cut objective, which has the following trivial SDP relaxation:

Lemma 10.

Given any undirected graph G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)}, we have 𝗈𝗉𝗍2ΨG\frac{\mathsf{opt}}{2}\leq\Psi_{G}^{*} where 𝗈𝗉𝗍\mathsf{opt} is the optimal objective value for the following semidefinite program.

min\displaystyle{\min} 𝐋(G),𝐗\displaystyle\big{\langle}\mathbf{L}(G),\mathbf{X}\big{\rangle} (cond-sdp)
s.t. 𝐋(K𝝁),𝐗1\displaystyle\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1
𝐗𝟎\displaystyle\mathbf{X}\succeq\mathbf{0}

Our choice of action set thus coincides with the feasible set for (cond-sdp), while the feedback matrices are chosen so that the sum of the losses gives the SDP objective for the state graph. This is as intended; we want the cut player to force a state graph with large ratio-cut objective, so it is crucial to have a lower bound on the ratio-cut objective that is a function of both cut and matching responses. This is also to be expected; if we wanted regret-minimizing strategies to produce approximate solutions for (cond-sdp), we would have defined the action set and losses in exactly the same way.

Three issues remain if we wish to use a regret bound from Theorem 4 to analyze a good cut strategy:

  1. (1)

    Ensuring Small Width: We need to ensure that the width term 𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}} is small given any approximate matching response DtD_{t}.

  2. (2)

    Ensuring Large Loss: For our cut strategy to be good, we will need to specify how to produce a cut action such that the loss 𝐋(Dt),𝐗t\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle} is large regardless of which approximate matching response DtD_{t} is played.

  3. (3)

    Implementing MMWUη,𝛍\textup{{MMWU}}_{\eta,\bm{\mu}} in Nearly-linear Time: If we want our cut player to run in nearly-linear time, then we will (a) need to compute the update MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, and (b) ensure that queries on the vector embedding given by the Gram matrix of the update can be computed in sublinear time.

Let us address these issues now.

7.1.1 Ensuring Small Width

The width term is constant for any approximate matching response. This follows simply from the fact that the degree of any vertex ii is bounded by μi\mu_{i}.

Lemma 11.

Any approximate matching response D=(V,ED,𝐰D,𝛍)D=\big{(}V,E_{D},\mathbf{w}^{D},\bm{\mu}\big{)} satisfies 𝟎𝐋(D)2𝐌\mathbf{0}\preceq\mathbf{L}(D)\preceq 2\cdot\mathbf{M}.

Notice that, since 𝐋(D)𝟎\mathbf{L}(D)\succeq\mathbf{0}, the regret guarantee in Theorem 4 holds for any step-size η>0\eta>0.
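This spectral bound is easy to sanity-check numerically. The following sketch (ours, with a toy matching response whose weighted degrees are capped by μ_i) verifies that the eigenvalues of 𝐌^{-1/2}𝐋(D)𝐌^{-1/2} lie in [0, 2].

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

n = 8
mu = np.array([2.0, 1.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0])

# Toy "matching response": a bipartite graph across (A, B) whose
# weighted degree at vertex i never exceeds mu_i.
A, B = [0, 1, 2, 3], [4, 5, 6, 7]
W = np.zeros((n, n))
for i, j in zip(A, B):
    w = min(mu[i], mu[j])          # respects both degree caps
    W[i, j] = W[j, i] = w

L = laplacian(W)
M_inv_sqrt = np.diag(1.0 / np.sqrt(mu))
eigs = np.linalg.eigvalsh(M_inv_sqrt @ L @ M_inv_sqrt)
# 0 <= L(D) <= 2M, i.e. normalized eigenvalues in [0, 2]
assert eigs.min() >= -1e-10 and eigs.max() <= 2 + 1e-10
```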

7.1.2 Ensuring Large Loss

Let {𝐯i(t)}iVd\big{\{}\mathbf{v}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} be a vector embedding of VV whose Gram matrix is 𝐗t\mathbf{X}_{t}. If Dt=(V,EDt,𝐰Dt)D_{t}=\big{(}V,E_{D_{t}},\mathbf{w}^{D_{t}}\big{)} is the approximate matching response at time tt, then the loss 𝐋Dt,𝐗t\big{\langle}\mathbf{L}_{D_{t}},\mathbf{X}_{t}\big{\rangle} is equivalent to

𝐋Dt,𝐗t=(i,j)EDtwijDt𝐯i(t)𝐯j(t)2.\big{\langle}\mathbf{L}_{D_{t}},\mathbf{X}_{t}\big{\rangle}=\sum_{(i,j)\in E_{D_{t}}}w_{ij}^{D_{t}}\cdot\big{\lVert}\mathbf{v}^{(t)}_{i}-\mathbf{v}^{(t)}_{j}\big{\rVert}^{2}\,.
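This is the standard Laplacian quadratic-form identity; a quick numerical check (ours, on a toy matching response and random embedding):

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(5)
n, d = 6, 4
V = rng.standard_normal((n, d))    # embedding, Gram matrix X = V V^T
X = V @ V.T

# Toy matching response: a weighted matching across {0,1,2} and {3,4,5}.
W = np.zeros((n, n))
for i, j, w in [(0, 3, 1.0), (1, 4, 0.5), (2, 5, 2.0)]:
    W[i, j] = W[j, i] = w

lhs = np.trace(laplacian(W) @ X)   # <L(D), X>
rhs = sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
          for i in range(n) for j in range(i + 1, n))
assert abs(lhs - rhs) < 1e-8
```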

For our cut strategy to be good, we need to produce a cut action (At,Bt)(A_{t},B_{t}) such that, no matter how DtD_{t} is played, the edges of DtD_{t} will always be placed across pairs of embedding vectors 𝐯i(t)\mathbf{v}^{(t)}_{i}, 𝐯j(t)\mathbf{v}^{(t)}_{j} that are well-separated under the 22\ell_{2}^{2}-norm. This can be difficult to guarantee if the flow routed is approximate. If the primal-dual solver can only route a constant factor approximate flow, then an adversarially chosen constant fraction of the demand may be dropped. If one naively rounds (At,Bt)(A_{t},B_{t}) using a random 1-dimensional projection, as in [9, 53], then the rounding can produce an unbalanced cut, upon which the adversary can choose to drop a fraction of demand that routes between all vertex pairs whose embedding vectors are far apart in 22\ell_{2}^{2}-distance.

Previous works [23, 22, 49] have addressed this issue by routing a polylogarithmic number of O(1)O(1)-factor approximate maximum flows during a single iteration of the cut-matching game. However, a technical contribution of this construction is that a single approximate primal-dual solve of (Local-RC) suffices, provided (At,Bt)(A_{t},B_{t}) are σ\sigma-separated in the sense of Definition 8.

The following lemma shows that any approximate matching response routing demand across σ\sigma-separated (At,Bt)(A_{t},B_{t}) must always route a large amount of flow between embedding vectors that are well separated in 22\ell_{2}^{2}-distance. Using (At,Bt)(A_{t},B_{t}) as the cut action will thus always result in large loss.

Lemma 12.

Given 𝛍0V\bm{\mu}\in\mathbb{Z}_{\geq 0}^{V}, let 𝐗Δ𝛍n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n} and {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} be the vector embedding whose Gram matrix is 𝐗\mathbf{X}. If (A,B)(A,B) is σ\sigma-separated with respect to 𝛍\bm{\mu} and {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V}, then for any approximate matching response D=(AB,ED,𝐰D)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D}\big{)} with respect to (A,B)(A,B), we have:

{i,j}EDwijD𝐯i𝐯j22σ2.\sum_{\{i,j\}\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq{\sigma\over 2}\,.
Proof.

Since (A,B)(A,B) are σ\sigma-separated we have for all iAi\in A and jBj\in B:

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j22σi,j(V2)μiμj𝝁(V)𝐯i𝐯j22=σ𝐋(K𝝁),𝐗σ.\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\sigma\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}=\sigma\cdot\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq\sigma\,.

Here, we use 𝐗Δ𝝁n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n} to deduce 𝐋(K𝝁),𝐗1\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1. Rearranging the above inequality then implies

𝐯i𝐯j22σ𝝁(A)𝝁(V)𝝁(B)σ𝝁(A)iA,jB\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{\bm{\mu}(A)}\cdot\frac{\bm{\mu}(V)}{\bm{\mu}(B)}\geq\frac{\sigma}{\bm{\mu}(A)}\qquad\qquad\forall\,i\in A,j\in B

where the last inequality follows by 𝝁(B)𝝁(V)\bm{\mu}(B)\leq\bm{\mu}(V). Finally, any approximate matching response is bipartite across (A,B)(A,B), and has large total weight. Thus we can conclude

{i,j}EDwijD𝐯i𝐯j22σ𝝁(A)iA,jBwijD=σ𝝁(A)𝐰(e(A,B))σ2\sum_{\{i,j\}\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{\bm{\mu}(A)}\cdot\sum_{i\in A,j\in B}w_{ij}^{D}=\frac{\sigma}{\bm{\mu}(A)}\cdot\mathbf{w}\big{(}e(A,B)\big{)}\geq{\sigma\over 2}

as required. ∎

In Section 8, we give an algorithm round-cut that produces Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated (At,Bt)(A_{t},B_{t}). This algorithm will subsequently be used in our cut strategy construction for the symmetric case.

7.1.3 Implementing MMWU in Nearly-linear Time

Computing MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, and 22\ell_{2}^{2}-distances and projections in logarithmic time, relies on standard techniques, which we briefly sketch here.

Lemma 6 in [9] states one can compute a matrix-vector product with a matrix exponential by truncating its Taylor expansion. Given any 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} and 𝐯n\mathbf{v}\in\mathbb{R}^{n}, one can compute 𝐮n\mathbf{u}\in\mathbb{R}^{n} such that

𝐮exp(𝐀)𝐯ϵ\big{\lVert}\mathbf{u}-\exp(\mathbf{A})\mathbf{v}\big{\rVert}\leq\epsilon

in time O(𝒯mvmax{𝐀,log(1ϵ)})O\big{(}\mathcal{T}_{\textsf{mv}}\cdot\max\big{\{}\lVert\mathbf{A}\rVert,\log\big{(}\frac{1}{\epsilon}\big{)}\big{\}}\big{)} where 𝒯mv\mathcal{T}_{\textsf{mv}} is matrix-vector product time. To compute MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, we take advantage of two details.

First, each loss matrix provided to MMWU is given by the Laplacian of a graph DtD_{t}. Consequently, the sparsity of the matrix being exponentiated in MMWU is bounded by 𝗌𝗉(Ht)\mathsf{sp}(H_{t}), and the matrix-vector product time satisfies 𝒯mv=O(𝗌𝗉(HT))\mathcal{T}_{\textsf{mv}}=O\big{(}\mathsf{sp}(H_{T})\big{)} where TT is the total number of cut-matching game rounds. Second, Lemma 11 implies that 𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼O(1)\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq O(1) for every t[T]t\in[T], and thus

k=1t𝐌1/2𝐋(Dk)𝐌1/2𝗌𝗉𝖾𝖼O(t).\Big{\lVert}\sum_{k=1}^{t}\mathbf{M}^{-1/2}\mathbf{L}(D_{k})\mathbf{M}^{-1/2}\Big{\rVert}_{\mathsf{spec}}\leq O(t)\,.

If we run at most 𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)\operatorname*{\mathsf{polylog}}(n) iterations of the cut-matching game, we can compute an O(1𝗉𝗈𝗅𝗒(n))O\big{(}\frac{1}{\mathsf{poly}(n)}\big{)}-additive approximation to MMWU by truncating the Taylor expansion up to a 𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)\operatorname*{\mathsf{polylog}}(n) number of terms.
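A minimal sketch of this primitive (ours, not the implementation from [9]): approximate exp(𝐀)𝐯 by summing the leading terms of the Taylor series, which touches 𝐀 only through matrix-vector products and hence only through its sparsity.

```python
import numpy as np

def expm_vec_taylor(matvec, v, num_terms):
    """Approximate exp(A) @ v with the first `num_terms` Taylor terms.

    `matvec` computes A @ x, so only matrix-vector products with A
    (and hence only the sparsity of A) are ever used.
    """
    result = v.copy()
    term = v.copy()
    for k in range(1, num_terms):
        term = matvec(term) / k       # now holds A^k v / k!
        result = result + term
    return result

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 10                    # small symmetric test matrix
v = rng.standard_normal(5)

# Dense reference value via the eigendecomposition of the symmetric A.
lam, U = np.linalg.eigh(A)
exact = (U * np.exp(lam)) @ U.T @ v

approx = expm_vec_taylor(lambda x: A @ x, v, num_terms=30)
assert np.linalg.norm(approx - exact) < 1e-10
```

Since each term costs one matvec, the truncation depth (polylogarithmic here, because the accumulated spectral norm grows only linearly in the round count) directly controls the running time.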

To compute 22\ell_{2}^{2}-distances and projections in logarithmic time, one can use the Johnson–Lindenstrauss transform [33] to project the embedding into O(lognδ2)O\big{(}\frac{\log n}{\delta^{2}}\big{)} dimensions for δ=O(1)\delta=O(1) (see, e.g., [24, 1, 14]). Since the error is multiplicative in terms of pairwise 22\ell_{2}^{2}-distances, picking δ\delta to be constant only costs a constant factor in the number of cut strategy plays and in the final approximation ratio. We ignore this constant in the subsequent calculations. (Picking δ>110\delta>\frac{1}{10} loses a factor 22 in the final bound on g(n)g(n) when proving Theorem 12.)
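The dimension-reduction step can be sketched as follows (our toy illustration; the constant in the target dimension is arbitrary): projecting onto d = O(log n/δ²) random Gaussian directions preserves each pairwise ℓ₂² distance up to a 1 ± δ factor with high probability.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, delta = 200, 500, 0.5
d = int(8 * np.log(n) / delta ** 2)   # O(log n / delta^2) target dimension

V = rng.standard_normal((n, D))                # original embedding vectors
P = rng.standard_normal((D, d)) / np.sqrt(d)   # Gaussian JL projection
V_hat = V @ P

# Each squared distance is preserved up to 1 +/- delta w.h.p.; check the
# empirical distortion over random pairs.
ratios = []
for _ in range(100):
    i, j = rng.integers(0, n, size=2)
    if i == j:
        continue
    ratios.append(np.sum((V_hat[i] - V_hat[j]) ** 2)
                  / np.sum((V[i] - V[j]) ** 2))
assert abs(np.mean(ratios) - 1.0) < delta / 2
```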

7.1.4 Completing the Analysis for Symmetric Cut Functions

We are now ready to describe the cut strategy which we call 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}}. Fix a step size η>0\eta>0 to be chosen subsequently, and consider round tt of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game. Up to this point, the matching player \mathcal{M} will have produced approximate matching responses D1,,DtD_{1},\ldots,D_{t}.

The cut strategy is straightforward: use Johnson–Lindenstrauss and the truncated Taylor expansion, as in Lemma 6 of [9], to compute a low-dimensional projection of the embedding whose Gram matrix is given by MMWUη,𝝁(𝐋(D1),,𝐋(Dt))\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}(D_{1}),\ldots,\mathbf{L}(D_{t})\big{)}, execute round-cut on the resulting embedding to produce σ\sigma-separated sets (At,Bt)(A_{t},B_{t}), and output (At,Bt)(A_{t},B_{t}) as the cut action. Our strategy is summarized in Figure 2.

Algorithm 1 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}}. Input: vertex measure 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}, and approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Parameters: step size η>0\eta>0.

  1. Sample d=O(logn)d=O\big{(}\log n\big{)} random unit vectors 𝐮1,,𝐮d𝒮n1\mathbf{u}_{1},\ldots,\mathbf{u}_{d}\sim\mathcal{S}^{n-1}.

  2. Denote 𝐖t=MMWUη,𝝁(𝐋(D1),,𝐋(Dt))\mathbf{W}_{t}=\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}(D_{1}),\ldots,\mathbf{L}(D_{t})\big{)}, and compute {𝐯^i(t)}iVd\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} given by (𝐯^i(t))j=𝐖t𝐮j,𝐞i\big{(}\hat{\mathbf{v}}^{(t)}_{i}\big{)}_{j}=\langle\mathbf{W}_{t}\mathbf{u}_{j},\mathbf{e}_{i}\rangle for all iVi\in V, j[d]j\in[d], using the truncated Taylor series via [9].

  3. Execute round-cut(d,{𝐯^i(t)}iV)\textup{{round-cut}}\big{(}d,\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (At,Bt)(A_{t},B_{t}).

Output: The cut (At,Bt)(A_{t},B_{t}).

Figure 2: The cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\leftrightarrow} for symmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}.

Let us now demonstrate that 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is a (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}{\log^{2}n}\big{)}\right)-good cut strategy.

Lemma 13.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with symmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} such that |V|=n\lvert V\rvert=n. Then 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is (Ω(logn),O(log2n))\Big{(}\Omega(\log n),O\big{(}\log^{2}n\big{)}\Big{)}-good with probability 1o(1)1-o(1).

Proof.

Let T=g(n)T=g(n), and for any sequence of weighted undirected approximate matching responses D1,,DTD_{1},\ldots,D_{T}, suppose 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} plays cut actions (A1,B1),,(AT,BT)(A_{1},B_{1}),\ldots,(A_{T},B_{T}) in response. To show that 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is a (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}{\log^{2}n}\big{)}\right)-good cut strategy, we first show that ΨHΩ(logn)\Psi^{*}_{H}\geq\Omega(\log n) after T=O(log2n)T=O\big{(}\log^{2}n\big{)} rounds of the cut-matching game.

Fix H=t=1TDtH=\sum_{t=1}^{T}D_{t} to be the state graph of the cut-matching game. For every t>0t>0, the feedback matrix satisfies 𝐋(Dt)𝟎\mathbf{L}(D_{t})\succeq\mathbf{0}, hence we can use the regret bound given by (7) in Theorem 4. Lemma 11 ensures

𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼2\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq 2

Consequently, we have for every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n} and η>0\eta>0:

1T𝐋(H),𝐔=1Tt=1T𝐋(Dt),𝐔12ηTt=1T𝐋(Dt),𝐗tlognηT.\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}(H),\mathbf{U}\big{\rangle}=\frac{1}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{U}\big{\rangle}\geq\frac{1-2\eta}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}\,.

Next, for fixed t>0t>0, let {𝐯i(t)}iV\big{\{}\mathbf{v}_{i}^{(t)}\big{\}}_{i\in V} be the vector embedding whose Gram matrix is 𝐗t\mathbf{X}_{t}. Theorem 3 implies round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets (At,Bt)(A_{t},B_{t}). Lemma 12 then implies

12ηTt=1T𝐋(Dt),𝐗tlognηT=12ηTt=1TijE(Dt)𝐰ijDt𝐯i(t)𝐯j(t)22lognηTC1(12η)lognlognηT.\frac{1-2\eta}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}=\frac{1-2\eta}{T}\sum_{t=1}^{T}\sum_{ij\in E(D_{t})}\mathbf{w}_{ij}^{D_{t}}\cdot\lVert\mathbf{v}^{(t)}_{i}-\mathbf{v}^{(t)}_{j}\rVert^{2}_{2}-\frac{\log n}{\eta T}\geq\frac{C_{1}(1-2\eta)}{\log n}-\frac{\log n}{\eta T}\,.

for some absolute constant C1>0C_{1}>0. Let C2C_{2} be a constant satisfying 22C2<C12\sqrt{\frac{2}{C_{2}}}<\sqrt{C_{1}}. Then, with the choice of η=12C1C2\eta={1\over\sqrt{2C_{1}C_{2}}} and T=C2log2nT=C_{2}\cdot{\log^{2}n}, we have that:

C1(12η)lognlognηT\displaystyle\frac{C_{1}(1-2\eta)}{\log n}-\frac{\log n}{\eta T} =C1logn(22C1C2)1lognΩ(1logn),\displaystyle={C_{1}\over\log n}-\left(2\sqrt{2C_{1}\over C_{2}}\right){1\over\log n}\geq\Omega\left({1\over\log n}\right),

thus establishing, for every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n}, that

𝐋(H),𝐔Ω(Tlogn)=Ω(logn)\big{\langle}\mathbf{L}(H),\mathbf{U}\big{\rangle}\geq\Omega\left({T\over\log n}\right)=\Omega(\log n) (18)

Now choose 𝐔=𝐗\mathbf{U}=\mathbf{X}^{*}, the minimizer to the semidefinite relaxation (cond-sdp) for ΨH\Psi_{H}^{*}. Certainly 𝐗Δ𝝁n×n\mathbf{X}^{*}\in\Delta_{\bm{\mu}}^{n\times n}, since 𝐗\mathbf{X}^{*} is feasible. If 𝗈𝗉𝗍H\mathsf{opt}_{H} is the optimal objective value of (cond-sdp), then by inequality (18)

𝗈𝗉𝗍H=𝐋(H),𝐗Ω(logn).\mathsf{opt}_{H}=\big{\langle}\mathbf{L}(H),\mathbf{X}^{*}\big{\rangle}\geq\Omega(\log n)\,.

Finally, Lemma 10 implies the required

ΨH𝗈𝗉𝗍H2Ω(logn)\Psi_{H}^{*}\geq\frac{\mathsf{opt}_{H}}{2}\geq\Omega(\log n)

To ensure that we obtain the above guarantee with good probability, note that the randomized Johnson–Lindenstrauss transform fails to produce an embedding with the necessary distortion guarantees with probability O(1n)O\left({1\over n}\right). By the union bound, the probability that this procedure fails in any iteration is O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)n)=on(1)O\left({\operatorname*{\mathsf{polylog}}(n)\over n}\right)=o_{n}(1). The only other randomized part of the algorithm is the balanced case of Algorithm 4. We know from Theorem 3 that one obtains an Ω(1/logn)\Omega\left(1/\log n\right)-separated set with probability 1O(1/n)1-O(1/n) at each iteration; again by the union bound, this procedure fails at some point during the algorithm with probability O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)n)O\left({\operatorname*{\mathsf{polylog}}(n)\over n}\right). Hence the algorithm produces the desired output with probability 1o(1)1-o(1). ∎

7.2 An (Ω(log2n),O(log3n))\left(\Omega(\log^{2}n),O(\log^{3}n)\right)-good Cut Strategy for Asymmetric Cut Functions

Let us now consider the case where {δh}hE\{\delta_{h}\}_{h\in E} are asymmetric. We describe our setup for the online linear optimization setting.

Given an instance of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game, we again fix the set of actions to be 𝒳=Δ𝝁n×n\mathcal{X}=\Delta_{\bm{\mu}}^{n\times n}. For round tt of the online optimization setting, we now define the feedback matrix to be the directed Laplacian of the approximate matching response DtD_{t}: 𝐅t=𝐋+(Dt)\mathbf{F}_{t}=\mathbf{L}_{+}(D_{t}) so that the losses become

𝐋+(Dt),𝐗\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}\big{\rangle}

Similar to the symmetric case, our choices of action set and loss functions are guided by the trivial spectral relaxation for the directed ratio-cut of a graph.

Lemma 14.

Given any weighted, directed graph G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)}, we have 𝗈𝗉𝗍2ΨG\frac{\mathsf{opt}}{2}\leq\Psi_{G}^{*} where 𝗈𝗉𝗍\mathsf{opt} is the optimal objective value for the following semidefinite program.

min\displaystyle{\min} 𝐋+(G),𝐗\displaystyle\big{\langle}\mathbf{L}_{+}(G),\mathbf{X}\big{\rangle} (dir-cond-sdp)
s.t. 𝐋(K𝝁),𝐗1\displaystyle\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1
𝐗𝟎\displaystyle\mathbf{X}\succeq\mathbf{0}

In order to use a regret bound from Theorem 4, we need only address the first two issues presented in the symmetric case:

  1. (1)

    Ensuring Small Width: We need to ensure that the width 𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}} is small given any approximate matching response DtD_{t}. Similar to the symmetric case, we show this is bounded by a constant.

  2. (2)

    Ensuring Large Loss: We now need to ensure that 𝐋+(Dt),𝐗t\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\big{\rangle} is large. A key difference in the asymmetric case is that ensuring large loss requires an additional step in the rounding algorithm to control the terms in the loss that correspond to the embedding vector lengths.

To compute the matrix exponential update MMWUη,𝝁(𝐋+(D1),,𝐋+(Dt))\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}_{+}(D_{1}),\ldots,\mathbf{L}_{+}(D_{t})\big{)} in nearly-linear time, we can reuse the sketch provided in Section 7.1.3.

7.2.1 Ensuring Small Width

The width in the asymmetric case is still bounded by an absolute constant for any directed approximate matching response. We prove the following lemma in Appendix A.5.

Lemma 15.

Suppose D=(AB,ED,𝐰D,𝛍)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D},\bm{\mu}\big{)} is an approximate matching response with respect to cut action (A,B)(A,B). Then 𝐌𝐋+(D)3𝐌-\mathbf{M}\preceq\mathbf{L}_{+}(D)\preceq 3\cdot\mathbf{M}.

This lemma implies that the regret guarantee of Theorem 4 holds only for step sizes 0<η<10<\eta<1 when the losses are 𝐋+(D),𝐗\big{\langle}\mathbf{L}_{+}(D),\mathbf{X}\big{\rangle}. We will subsequently choose an η\eta satisfying this.

7.2.2 Ensuring Large Loss

In the asymmetric case, we wish to produce a cut action (At,Bt)(A_{t},B_{t}) such that no matter how the matching player \mathcal{M} plays DtD_{t}, the edges of DtD_{t} will be weighted such that the loss

𝐋+(Dt),𝐗=(i,j)EDtwijDt(𝐯i𝐯j22+𝐯i22𝐯j22)\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}\big{\rangle}=\sum_{(i,j)\in E_{D_{t}}}w^{D_{t}}_{ij}\cdot\Big{(}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}

is large. The critical distinction from the symmetric case is the presence of the embedding vector lengths.
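Under one common convention for the directed Laplacian (ours here, used only for illustration: a directed edge (i,j) of weight w contributes w(2eᵢeᵢᵀ − eᵢeⱼᵀ − eⱼeᵢᵀ)), the loss above expands into exactly these distance and length terms, which can be checked numerically:

```python
import numpy as np

def directed_laplacian(n, edges):
    """Hypothetical convention for L_+: edge (i, j) with weight w
    contributes w * (2 e_i e_i^T - e_i e_j^T - e_j e_i^T)."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += 2 * w
        L[i, j] -= w
        L[j, i] -= w
    return L

rng = np.random.default_rng(4)
n, d = 6, 3
V = rng.standard_normal((n, d))   # embedding, Gram matrix X = V V^T
X = V @ V.T
edges = [(0, 3, 1.5), (1, 4, 0.5), (2, 5, 2.0)]

lhs = np.trace(directed_laplacian(n, edges) @ X)
# Distance term plus the two vector-length terms, per directed edge.
rhs = sum(w * (np.sum((V[i] - V[j]) ** 2)
               + np.sum(V[i] ** 2) - np.sum(V[j] ** 2))
          for i, j, w in edges)
assert abs(lhs - rhs) < 1e-8
```

Note that, unlike the symmetric case, the per-edge contribution is no longer sign-invariant under swapping i and j, which is precisely why the length terms must be controlled.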

This is where one might be inclined to require that valid matching responses be Eulerian directed graphs. If the matching response DD is Eulerian, then the graph can be split into two parts where the edges are all oriented from AtA_{t} to BtB_{t}, and from BtB_{t} to AtA_{t}. One can then cancel the embedding vector lengths in the loss using the fact that the in- and out-degrees are identical for each vertex. This reduces the problem of constructing a good cut strategy back to that present in the symmetric case: one need only round a cut (At,Bt)(A_{t},B_{t}) such that a large amount of flow is always routed between vertex pairs whose embedding vectors are far in 22\ell_{2}^{2}-distance.

Absent this trick, one needs to control the length of the embedding vectors. A priori this seems difficult. To ensure that an approximation algorithm based on cut-matching games runs in nearly-linear time, some low-dimensional projection needs to be applied to the embedding. One cannot hope to simultaneously control both the distortion of the projected embedding and the length of the projected vectors, seemingly forcing the matching response to be an Eulerian graph. We remark that if one tries to directly adapt the potential analysis of [36] or [54], the same terms in the loss appear when lower bounding the potential progress made when the matching player routes a large flow. This may explain why current works generalizing the cut-matching game to directed settings [46, 40] require the matching player to play Eulerian graphs.

However, if one is cognizant of the underlying SDP being used to certify lower bounds on the directed conductance of the state graph, then it is natural to consider rounding algorithms which directly target the SDP. Indeed, applying the rounding algorithm present in [2] (Algorithm 4) augments a σ\sigma-separated cut (S,T)(S,T) into a cut action (A,B)(A,B) such that any matching response across (A,B)(A,B) must have large loss. We describe this rounding procedure, directed-round-cut, in Algorithm 2.

Algorithm 2 directed-round-cut. Input: A cut (S,T)(S,T) of VV, and vector embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d}.

  1. Find r>0r>0 such that 𝝁(Sr(0))𝝁(S)2\bm{\mu}\big{(}S\cap\mathcal{B}_{r}(0)\big{)}\geq\frac{\bm{\mu}(S)}{2} and 𝝁(Sr(0)¯)𝝁(S)2\bm{\mu}\big{(}S\cap\overline{\mathcal{B}_{r}(0)}\big{)}\geq\frac{\bm{\mu}(S)}{2}.

  2. Define the following sets: S+=def{iS:𝐯i22r2}S^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in S:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq r^{2}\big{\}}, S=def{iS:𝐯i22r2}S^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in S:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq r^{2}\big{\}}, T+=def{iT:𝐯i22r2}T^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in T:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq r^{2}\big{\}}, T=def{iT:𝐯i22r2}T^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in T:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq r^{2}\big{\}}.

  3. If 𝝁(T+)𝝁(T)\bm{\mu}(T^{+})\leq\bm{\mu}(T^{-}), then let A=S+A=S^{+} and B=TB=T^{-}. Else, let A=T+A=T^{+} and B=SB=S^{-}.

Output: the cut (A,B)(A,B).
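The three steps above admit a direct transcription. The following Python sketch (ours) implements them, computing the 𝝁-weighted median radius by sorting, which matches the O(nd + n log n) bound of Lemma 17.

```python
import numpy as np

def directed_round_cut(S, T, V, mu):
    """Sketch of directed-round-cut.

    S, T: lists of vertex indices forming the input cut.
    V:    (n, d) array of embedding vectors.
    mu:   (n,) array of vertex measures.
    Returns the cut action (A, B).
    """
    norms = np.sum(V ** 2, axis=1)
    # Step 1: r^2 is the mu-weighted median of {||v_i||^2 : i in S}.
    order = sorted(S, key=lambda i: norms[i])
    total, acc = sum(mu[i] for i in S), 0.0
    for i in order:
        acc += mu[i]
        if acc >= total / 2:
            r2 = norms[i]
            break
    # Step 2: split S and T by the threshold r^2.
    S_plus  = [i for i in S if norms[i] <= r2]
    S_minus = [i for i in S if norms[i] >= r2]
    T_plus  = [i for i in T if norms[i] <= r2]
    T_minus = [i for i in T if norms[i] >= r2]
    # Step 3: choose the orientation based on the heavier half of T.
    if sum(mu[i] for i in T_plus) <= sum(mu[i] for i in T_minus):
        return S_plus, T_minus
    return T_plus, S_minus

# Toy usage: S has mostly short vectors, T has mostly long ones.
V = np.array([[0.1], [0.2], [2.0], [0.1], [3.0], [4.0]])
mu = np.ones(6)
A, B = directed_round_cut([0, 1, 2], [3, 4, 5], V, mu)
```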

The intuition for directed-round-cut is this: one does not need to route the flow bidirectionally across (S,T)(S,T), and subsequently produce an Eulerian demand graph. The SDP solution, and specifically the position of the median embedding vector, should dictate which direction flow needs to be routed in a given round of the cut-matching game. We now prove our lower bound on the loss.

Lemma 16.

Let 𝛍0V\bm{\mu}\in\mathbb{Z}_{\geq 0}^{V}, 𝐗Δ𝛍n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n}, and {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} be a vector embedding whose Gram matrix is 𝐗\mathbf{X}. For σ>0\sigma>0, suppose directed-round-cut is given (S,T)(S,T) that are σ\sigma-separated with respect to 𝛍\bm{\mu} and {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} as input, and outputs (A,B)(A,B). Then, any approximate directed matching response D=(AB,ED,𝐰D)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D}\big{)} with respect to A,BA,B satisfies the following:

(i,j)EDwijD(𝐯i𝐯j22+𝐯i22𝐯j22)σ8\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\Big{(}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}\geq\frac{\sigma}{8}
Proof.

We assume, without loss of generality, that directed-round-cut outputs (A,B)(A,B) such that A=S+A=S^{+} and B=TB=T^{-} as the analysis for the case where A=T+A=T^{+} and B=SB=S^{-} is identical.

In step (1) of directed-round-cut, r>0r>0 is chosen to be the length of the median element of SS with respect to the measure 𝝁\bm{\mu}, hence 𝝁(A)𝝁(S)2\bm{\mu}(A)\geq\frac{\bm{\mu}(S)}{2}. Furthermore, if A=S+A=S^{+} and B=TB=T^{-}, then 𝝁(T)𝝁(T+)\bm{\mu}(T^{-})\geq\bm{\mu}(T^{+}) and hence 𝝁(T)𝝁(T)2\bm{\mu}(T^{-})\geq\frac{\bm{\mu}(T)}{2}. Consequently,

𝝁(A)𝝁(B)14𝝁(S)𝝁(T).\bm{\mu}(A)\bm{\mu}(B)\geq\frac{1}{4}\cdot\bm{\mu}(S)\bm{\mu}(T)\,.

Using the fact that (S,T)(S,T) are σ\sigma-separated, we derive for all iA,jBi\in A,j\in B:

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j2214𝝁(S)𝝁(T)𝝁(V)𝐯i𝐯j22σ4i,j(V2)μiμj𝝁(V)𝐯i𝐯j22\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{1}{4}\cdot\frac{\bm{\mu}(S)\bm{\mu}(T)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4}\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}

Because 𝐗Δ𝝁n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n}, we have 𝐋(K𝝁),𝐗1\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1 and

σ4i,j(V2)μiμj𝝁(V)𝐯i𝐯j22=σ4𝐋(K𝝁),𝐗σ4\frac{\sigma}{4}\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}=\frac{\sigma}{4}\cdot\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq\frac{\sigma}{4}

Combining the above two calculations, we have the inequality

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j22σ4𝐯i𝐯j22σ4𝝁(A)𝝁(V)𝝁(B)σ4𝝁(A)\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4}\qquad\Longleftrightarrow\qquad\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4\bm{\mu}(A)}\cdot\frac{\bm{\mu}(V)}{\bm{\mu}(B)}\geq\frac{\sigma}{4\bm{\mu}(A)}

where we have used the fact that 𝝁(B)𝝁(V)\bm{\mu}(B)\leq\bm{\mu}(V). Finally, because any approximate directed matching response is bipartite across (A,B)(A,B), and the total weight is large, we conclude

(i,j)EDwijD𝐯i𝐯j22σ4𝝁(A)(i,j)EDwijD=σ4𝝁(A)𝐰(e(A,B))σ8\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{4\bm{\mu}(A)}\sum_{(i,j)\in E_{D}}w_{ij}^{D}=\frac{\sigma}{4\bm{\mu}(A)}\cdot\mathbf{w}\big{(}e(A,B)\big{)}\geq\frac{\sigma}{8}

as required. ∎

We also bound the running time of directed-round-cut.

Lemma 17.

directed-round-cut runs in time O(nd+nlogn)O(nd+n\log n) for n=|V|n=\lvert V\rvert.

Proof.

Computing the norm of a vector takes time O(d)O(d), so all norm computations take time O(nd)O(nd) overall. The vectors then need to be sorted by norm value, which takes time O(nlogn)O(n\log n). After that, completing the algorithm takes O(n)O(n) time. This yields the desired runtime. ∎

7.2.3 Completing the Analysis for Asymmetric Cut Functions

We are now ready to give our good cut strategy.

Algorithm 3 𝒜𝖼𝗎𝗍\mathcal{A}^{\rightarrow}_{\mathsf{cut}}. Input: vertex measure 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}, and directed approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Parameters: step size η>0\eta>0.

  1. Sample d=O(logn)d=O\big{(}\log n\big{)} random unit vectors 𝐮1,,𝐮d𝒮n1\mathbf{u}_{1},\ldots,\mathbf{u}_{d}\sim\mathcal{S}^{n-1}.

  2. Denote 𝐖t=MMWUη,𝝁(𝐋+(D1),,𝐋+(Dt))\mathbf{W}_{t}=\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}_{+}(D_{1}),\ldots,\mathbf{L}_{+}(D_{t})\big{)}, and compute {𝐯^i(t)}iVd\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} given by (𝐯^i(t))j=𝐖t𝐮j,𝐞i\big{(}\hat{\mathbf{v}}^{(t)}_{i}\big{)}_{j}=\langle\mathbf{W}_{t}\mathbf{u}_{j},\mathbf{e}_{i}\rangle for all iVi\in V, j[d]j\in[d], using the truncated Taylor series via [9].

  3. Execute round-cut(d,{𝐯^i(t)}iV)\textup{{round-cut}}\big{(}d,\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (S,T)(S,T).

  4. Execute directed-round-cut((S,T),{𝐯^i(t)}iV)\textup{{directed-round-cut}}\big{(}(S,T),\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (At,Bt)(A_{t},B_{t}).

Output: The cut (At,Bt)(A_{t},B_{t}).

Figure 3: The cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\rightarrow} for asymmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}.
Lemma 18.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with asymmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} such that |V|=n\lvert V\rvert=n. Then 𝒜𝖼𝗎𝗍\mathcal{A}^{\rightarrow}_{\mathsf{cut}} is (Ω(log2n),O(log3n))\Big{(}\Omega\big{(}\log^{2}n\big{)},O\big{(}\log^{3}n\big{)}\Big{)}-good with probability 1o(1)1-o(1).

Proof.

We demonstrate that ΨHΩ(log2n)\Psi^{*}_{H}\geq\Omega(\log^{2}n) after T=O(log3n)T=O\big{(}{\log^{3}n}\big{)} rounds of the cut-matching game. Since the feedback matrices 𝐋+(Dt)\mathbf{L}_{+}(D_{t}) are not positive semidefinite, we use regret bound (6) from Theorem 4. Lemma 15 guarantees 𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼3\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq 3 for every t>0t>0, thus

1T𝐋+(H),𝐔\displaystyle\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}_{+}(H),\mathbf{U}\big{\rangle} 1Tt=1T𝐋+(Dt),𝐗tηTt=1T𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼2lognηT\displaystyle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-\frac{\eta}{T}\sum_{t=1}^{T}\,\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}^{2}-\frac{\log n}{\eta T}
1Tt=1T𝐋+(Dt),𝐗t9ηlognηT\displaystyle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-9\eta-\frac{\log n}{\eta T}

holding for every 0<η<10<\eta<1.

Next, consider the tt-th round, and let {𝐯i(t)}iV\big{\{}\mathbf{v}_{i}^{(t)}\}_{i\in V} be the vector embedding whose Gram matrix is 𝐗t\mathbf{X}_{t}. Theorem 3 implies round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets (S,T)(S,T). Consequently, Lemma 16 implies that directed-round-cut produces a cut action (At,Bt)(A_{t},B_{t}) such that DtD_{t} satisfies

𝐋+(Dt),𝐗t=(i,j)EDtwijDt(𝐯i(t)𝐯j(t)22+𝐯i(t)22𝐯j(t)22)C18logn\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle=\sum_{(i,j)\in E_{D_{t}}}w_{ij}^{D_{t}}\cdot\Big{(}\big{\lVert}\mathbf{v}_{i}^{(t)}-\mathbf{v}_{j}^{(t)}\big{\rVert}_{2}^{2}+\big{\lVert}\mathbf{v}_{i}^{(t)}\big{\rVert}_{2}^{2}-\big{\lVert}\mathbf{v}_{j}^{(t)}\big{\rVert}_{2}^{2}\Big{)}\geq\frac{C_{1}}{8\cdot\log n}

for some absolute constant C1>0C_{1}>0. Consequently, the regret bound becomes

1Tt=1T𝐋+(Dt),𝐗t9ηlognηTC18logn9ηlognηT.\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-9\eta-\frac{\log n}{\eta T}\geq\frac{C_{1}}{8\cdot\log n}-9\eta-\frac{\log n}{\eta T}\,.

Let C2C_{2} be a constant satisfying C18>6/C2\frac{C_{1}}{8}>6/\sqrt{C_{2}}. Choosing η=13C2logn\eta=\frac{1}{3\sqrt{C_{2}}\log n} and T=C2log3nT=C_{2}\cdot\log^{3}n, we have that:

C18logn9ηlognηT=1logn(C186C2)Ω(1logn)\frac{C_{1}}{8\cdot\log n}-9\eta-\frac{\log n}{\eta T}={1\over\log n}\left({C_{1}\over 8}-{6\over\sqrt{C_{2}}}\right)\geq\Omega\left({1\over\log n}\right)
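Indeed, with this choice of parameters the two error terms coincide:

```latex
9\eta \;=\; \frac{9}{3\sqrt{C_2}\,\log n} \;=\; \frac{3}{\sqrt{C_2}\,\log n},
\qquad\qquad
\frac{\log n}{\eta T} \;=\; \frac{3\sqrt{C_2}\,\log n\cdot\log n}{C_2\,\log^3 n} \;=\; \frac{3}{\sqrt{C_2}\,\log n},
```

so together they contribute 6/(√C₂·log n), which the condition C₁/8 > 6/√C₂ keeps strictly below C₁/(8·log n).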

This implies that every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n} satisfies:

1T𝐋+(H),𝐔Ω(1logn)\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}_{+}(H),\mathbf{U}\big{\rangle}\geq\Omega\left({1\over\log n}\right)

Now, choosing 𝐔=𝐗\mathbf{U}=\mathbf{X}^{*} to be the minimizer to (dir-cond-sdp), and 𝗈𝗉𝗍H\mathsf{opt}_{H} the optimal objective value for (dir-cond-sdp), we have

𝗈𝗉𝗍H=𝐋+(H),𝐗Ω(Tlogn)=Ω(log2n)\mathsf{opt}_{H}=\big{\langle}\mathbf{L}_{+}(H),\mathbf{X}^{*}\big{\rangle}\geq\Omega\left({T\over\log n}\right)=\Omega\left({\log^{2}n}\right)

Finally, Lemma 14 implies the required

ΨH𝗈𝗉𝗍H2Ω(log2n)\Psi_{H}^{*}\geq\frac{\mathsf{opt}_{H}}{2}\geq\Omega\left({\log^{2}n}\right)

These guarantees can be shown to hold with constant probability by following the same argument as in Lemma 13. ∎

7.3 Completing the Cut Strategy: Proof of Theorem 12

For the sake of completeness, let us now prove Theorem 12.

See 12

Proof.

Items (1) and (2) of the statement are given by Lemmas 13 and 18, respectively. To bound the running time, notice that by the sketch in Section 7.1.3, the matrix multiplicative weight update can be computed in time bounded by O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)}. Meanwhile, the round-cut and directed-round-cut steps run in time O(nd+nlogn)O(nd+n\log n) as given by Theorem 3 and Lemma 17. Since dO(logn)d\leq O(\log n) for both 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\leftrightarrow} and 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\rightarrow}, the total runtime is dominated by O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)} as required. ∎

8 Constructing Separated Sets: Proof of Theorem 3

This section is devoted to proving Theorem 3 on the construction of separated sets. We will show that the algorithm round-cut in Figure 4 satisfies the requirements of the theorem. We denote by V=[n]V=[n] the index set for the vector embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} in the theorem statement and define the following notions of expected vector and total variance of the embedding:

𝐯𝖺𝗏𝗀\displaystyle\mathbf{v}_{\mathsf{avg}} =def1μ(V)iVμi𝐯i\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{1}{\mu(V)}\sum_{i\in V}\mu_{i}\mathbf{v}_{i}
Var𝝁(𝐯i)\displaystyle\operatorname*{Var}_{\bm{\mu}}(\mathbf{v}_{i}) =def1μ(V)iVμi𝐯i𝐯𝖺𝗏𝗀22=1(μ(V))2{i,j}(V2)μiμj𝐯i𝐯j2.\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{1}{\mu(V)}\sum_{i\in V}\mu_{i}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{\mathsf{avg}}\rVert_{2}^{2}=\frac{1}{(\mu(V))^{2}}\sum_{\{i,j\}\in\binom{V}{2}}\mu_{i}\mu_{j}\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}.

To simplify our proof, we will assume, without loss of generality, that our embedding is both centered, i.e., 𝐯𝖺𝗏𝗀=0,\mathbf{v}_{\mathsf{avg}}=0, and normalized, i.e., Var𝝁(𝐯i)=1.\operatorname*{Var}_{\bm{\mu}}(\mathbf{v}_{i})=1. Under these assumptions, we will make use of the following lemma, which relates the contribution of a set S¯\bar{S} to the total variance to that of its complement SS.
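As a quick numerical sanity check of the pairwise-distance identity for the total variance above, one can compare the two expressions directly (a hypothetical numpy sketch on random data, not tied to any particular hypergraph):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
mu = rng.integers(1, 5, size=n).astype(float)   # integral vertex measure
v = rng.normal(size=(n, d))                     # embedding vectors

mu_V = mu.sum()
v_avg = (mu[:, None] * v).sum(axis=0) / mu_V

# First form: measure-weighted variance around the average vector.
var1 = (mu * ((v - v_avg) ** 2).sum(axis=1)).sum() / mu_V

# Second form: sum over unordered pairs of squared distances.
var2 = 0.0
for i in range(n):
    for j in range(i + 1, n):
        var2 += mu[i] * mu[j] * ((v[i] - v[j]) ** 2).sum()
var2 /= mu_V ** 2

assert np.isclose(var1, var2)
```

The same identity, applied to a subset instead of all of V, underlies the O(nd) balancedness test described below.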

Lemma 19 (Fact 5.19 from [53]).

Let {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} be a centered and normalized vector embedding. We have:

iS¯μi𝐯i22μ(S)(11(μ(V))2i,j(S2)μiμj𝐯i𝐯j22).\sum_{i\in\overline{S}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq\mu(S)\cdot\bigg{(}1-\frac{1}{(\mu(V))^{2}}\cdot\sum_{i,j\in\binom{S}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\bigg{)}.

Our algorithm handles two cases, depending on whether there exists a set of small measure that contributes most of the variance of the embedding. To distinguish these two cases, we consider long vectors, i.e., vectors with large contributions to the variance of the embedding. In particular, we define the set RtVR_{t}\subseteq V of vertices whose embedding vectors are short, so that its complement Rt¯\overline{R_{t}} consists of the long vectors:

Rt=def{iV:𝐯i22t}R_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\Big{\{}i\in V:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq t\Big{\}}

We then say that an embedding is (t,b)(t,b)-balanced if the variance due to vertices in RtR_{t} captures a bb-fraction of the total embedding variance, as per the following definition.

Definition 14 ((t,b)(t,b)-Balanced Embedding).

Given t>0t>0, and b>0b>0, a centered normalized vector embedding {𝐯id}iV\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V} is (t,b)(t,b)-balanced if the set RtR_{t} satisfies:

1(μ(V))2i,j(Rt2)𝝁(i)𝝁(j)𝐯i𝐯j22b.\frac{1}{(\mu(V))^{2}}\cdot\sum_{i,j\in\binom{R_{t}}{2}}\bm{\mu}(i)\bm{\mu}(j)\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq b\,.

We distinguish the two cases based on whether the vector embedding is (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced. We remark that, given tt and bb, one can check whether the embedding is (t,b)(t,b)-balanced in O(nd)O(nd) time by computing distances to the average vector of RtR_{t}. We start by describing the unbalanced case, which is where our main technical contribution lies.
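The O(nd) balancedness test can be sketched as follows (hypothetical numpy code; the helper names `balance_value` and `is_balanced` are ours). It evaluates the pairwise quantity of Definition 14 through the same weighted-variance identity used for the total variance, applied to RtR_{t}:

```python
import numpy as np

def balance_value(v, mu, t):
    """O(nd) evaluation of the (t, b)-balance quantity
       (1 / mu(V)^2) * sum over unordered pairs {i,j} in R_t of
       mu_i * mu_j * ||v_i - v_j||^2,
    using the identity that this pairwise sum equals mu(R_t) times
    the mu-weighted variance of R_t around its own average vector."""
    R = (v ** 2).sum(axis=1) <= t            # R_t: vertices with short vectors
    mu_R = mu[R].sum()
    if mu_R == 0:
        return 0.0
    avg = (mu[R][:, None] * v[R]).sum(axis=0) / mu_R
    pair_sum = mu_R * (mu[R] * ((v[R] - avg) ** 2).sum(axis=1)).sum()
    return pair_sum / mu.sum() ** 2

def is_balanced(v, mu, t, b):
    return balance_value(v, mu, t) >= b
```

A brute-force pairwise sum over RtR_{t} gives the same value, which is how the sketch can be validated on small instances.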

Algorithm 4 round-cut.
Input: A centered normalized vector embedding {𝐯id}iV.\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V}.
1. If {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} is (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced:
(a) Sample 𝐠\mathbf{g} uniformly at random from 𝒮d1\mathcal{S}^{d-1} and compute ri=d𝐠,𝐯ir_{i}=\sqrt{d}\cdot\langle\mathbf{g},\mathbf{v}_{i}\rangle for each iVi\in V.
(b) Sort rir_{i}. Assume without loss of generality that r1rnr_{1}\geq\ldots\geq r_{n} and define sweep cuts Stup={iV:rirt} and Stdown={iV:rirt}S_{t}^{up}=\{i\in V:r_{i}\geq r_{t}\}\text{ and }S_{t}^{down}=\{i\in V:r_{i}\leq r_{t}\}.
(c) Let tstart=min{t:3𝝁(Stup)𝝁(V)}t_{start}=\min\{t:3\cdot\bm{\mu}(S_{t}^{up})\geq\bm{\mu}(V)\} and tend=max{t:3𝝁(Stdown)𝝁(V)}t_{end}=\max\{t:3\cdot\bm{\mu}(S_{t}^{down})\geq\bm{\mu}(V)\}.
(d) Output (Ststartup,Stenddown)\big{(}S_{t_{start}}^{up},S_{t_{end}}^{down}\big{)}.
2. If {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} is not (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced:
(a) Sort the vectors by their length and relabel vertices so that 𝐯1𝐯2𝐯n\|\mathbf{v}_{1}\|\geq\|\mathbf{v}_{2}\|\geq\ldots\geq\|\mathbf{v}_{n}\|.
(b) Let T={iV:𝐯i232}.T=\{i\in V:\|\mathbf{v}_{i}\|^{2}\leq\frac{3}{2}\}. Define sweep cuts St=def{iV:𝐯i𝐯t}.S_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in V:\|\mathbf{v}_{i}\|\geq\|\mathbf{v}_{t}\|\}.
(c) For each sweep cut StS_{t} with 𝐯t23\|\mathbf{v}_{t}\|^{2}\geq 3: if the following condition holds (see Equation 20 and Equation 21), 𝝁(St)𝝁(V)𝐯t283100log(𝝁(V)),\frac{\bm{\mu}(S_{t})}{\bm{\mu}(V)}\cdot\frac{\|\mathbf{v}_{t}\|^{2}}{8}\geq\frac{3}{100\log(\bm{\mu}(V))}, output the 1100log(𝝁(V))\frac{1}{100\log(\bm{\mu}(V))}-separated sets (St,T).(S_{t},T).

Figure 4: The round-cut algorithm.
The unbalanced case

In the unbalanced case, the round-cut algorithm sorts the embedding vectors according to their length and outputs separated sets SS and TT from the set of threshold cuts in this ordering. In particular, the set TT is always chosen to equal R3/2R_{3/2}, while the set SS is chosen by sweeping through all cuts in the ordering to find one that satisfies the required separation guarantee. We show that such a cut always exists. The following lemma, which is implicit in previous works [10, 2], follows from Markov’s Inequality and Lemma 19. We prove it for completeness in Appendix A.

Lemma 20.

Let {𝐯id}iV\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V} be a centered normalized vector embedding that is not (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced. Then:

μ(R3/2)13μ(V) and iR3¯μi𝐯i235μ(V).\mu(R_{3/2})\geq\frac{1}{3}\cdot\mu(V)\,\textrm{ and }\,\sum_{i\in\overline{R_{3}}}\mu_{i}\cdot\|\mathbf{v}_{i}\|^{2}\geq\frac{3}{5}\cdot\mu(V).

Setting T=R3/2T=R_{3/2}, it remains to show that there exists some t3t\geq 3 such that all pairs iRt¯i\in\overline{R_{t}}, jTj\in T satisfy

𝐯i𝐯j22σ3μ(V)μ(Rt¯),\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\sigma\cdot\frac{3\mu(V)}{\mu(\overline{R_{t}})}, (20)

where σ=1/100log(μ(V))=Ω(1/logn).\sigma=\nicefrac{{1}}{{100\log(\mu(V))}}=\Omega(\nicefrac{{1}}{{\log n}}). For the sake of contradiction, assume that no such tt exists. Then, for every t3,t\geq 3, there exists a pair iRt¯,jTi\in\overline{R_{t}},j\in T such that the squared distance between 𝐯i\mathbf{v}_{i} and 𝐯j\mathbf{v}_{j} is less than σ3μ(V)/μ(Rt¯).\sigma\cdot\nicefrac{{3\mu(V)}}{{\mu(\overline{R_{t}})}}. At the same time, we can lower bound this squared distance as follows:

𝐯i𝐯j2(𝐯i𝐯j)2(t3/2)2t8.\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}\geq(\|\mathbf{v}_{i}\|-\|\mathbf{v}_{j}\|)^{2}\geq(\sqrt{t}-\sqrt{\nicefrac{{3}}{{2}}})^{2}\geq\frac{t}{8}\,. (21)

This yields, for all t3t\geq 3:

μ(Rt¯)24σtμ(V).\mu(\overline{R_{t}})\leq\frac{24\sigma}{t}\cdot\mu(V).

Integrating over t[3,M]t\in[3,M] where M=defmaxiV,μi>0𝐯i2M\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max_{i\in V,\mu_{i}>0}\|\mathbf{v}_{i}\|^{2}, we have:

iR3¯μi𝐯i2=3Mμ(Rt¯)𝑑t+ 3μ(R3¯)24σμ(V)3Mdtt+24σμ(V)24(log(M)+1)σμ(V).\sum_{i\in\overline{R_{3}}}\mu_{i}\|\mathbf{v}_{i}\|^{2}=\int_{3}^{M}\mu(\overline{R_{t}})\,dt\,+\,3\cdot\mu(\overline{R_{3}})\leq 24\sigma\cdot\mu(V)\cdot\int_{3}^{M}\frac{dt}{t}\,+24\sigma\cdot\mu(V)\leq 24\cdot(\log(M)+1)\cdot\sigma\mu(V).

At the same time, Lemma 20 lower bounds this sum:

35μ(V)iR3¯μi𝐯i248log(μ(V))σμ(V),\frac{3}{5}\cdot\mu(V)\leq\sum_{i\in\overline{R_{3}}}\mu_{i}\|\mathbf{v}_{i}\|^{2}\leq 48\cdot\log(\mu(V))\cdot\sigma\cdot\mu(V)\,,

which implies σ1/80log(μ(V)),\sigma\geq\nicefrac{{1}}{{80\log(\mu(V))}}, contradicting our choice of σ=1/100log(μ(V)).\sigma=\nicefrac{{1}}{{100\log(\mu(V))}}.

To bound MM, note that by the normalization assumption and the fact that 𝝁0n\bm{\mu}\in\mathbb{Z}^{n}_{\geq 0}, we have MiVμi𝐯i2μ(V)=𝗉𝗈𝗅𝗒(n).M\leq\sum_{i\in V}\mu_{i}\|\mathbf{v}_{i}\|^{2}\leq\mu(V)=\mathsf{poly}(n). This completes the proof that a set Rt¯\overline{R_{t}} with the required separation guarantee exists. To find such a cut, the round-cut algorithm avoids computing the minimum distance between Rt¯\overline{R_{t}} and TT for each threshold cut by noticing that the guarantee still holds if we replace this minimum distance with the quantity t/8\nicefrac{{t}}{{8}}, as we did in Equation 21.
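The resulting unbalanced-case sweep can be sketched as follows (hypothetical Python; `unbalanced_sweep` is our name, and the surrogate condition is the one checked by round-cut in Figure 4):

```python
import numpy as np

def unbalanced_sweep(v, mu):
    """Unbalanced case of round-cut: sort vertices by decreasing vector
    length and return the first sweep cut S_t (squared-length threshold
    t >= 3) whose surrogate separation (mu(S_t)/mu(V)) * t/8 clears
    3 / (100 log mu(V)).  The set T is always the short-vector set R_{3/2}."""
    lengths = (v ** 2).sum(axis=1)
    order = np.argsort(-lengths)             # longest vectors first
    mu_V = mu.sum()
    sigma = 1.0 / (100.0 * np.log(mu_V))
    T = np.where(lengths <= 1.5)[0]          # T = R_{3/2}
    mu_S, S = 0.0, []
    for i in order:
        S.append(i)
        mu_S += mu[i]
        t = lengths[i]                       # threshold = smallest length in S
        if t >= 3.0 and (mu_S / mu_V) * (t / 8.0) >= 3.0 * sigma:
            return np.array(S), T
    return None                              # no certifying cut found
```

On an embedding with a single very long vector, the sweep returns that vector as SS together with the short-vector set TT.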

The balanced case

In the balanced case, the set R3R_{3} of short vectors captures a constant fraction of the vector embedding’s variance and, by Markov’s inequality, a large fraction of the total volume 𝝁(V)\bm{\mu}(V). One can therefore expect that a random projection coupled with a search for a balanced sweep cut will succeed, as the bulk of the vectors are well separated from one another. The analysis of the balanced case is thus very similar to that of SDP-based approximation algorithms for balanced separator [53]. In fact, it is simpler, as one does not need to bound the conductance of the resulting cut, so we leave the proof of the following lemma to Appendix A.6.

Lemma 21.

Given V=[n]V=[n], let {𝐯id}iVd\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V}\subseteq\mathbb{R}^{d} be a centered normalized vector embedding. If {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} is a (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced embedding, then algorithm round-cut outputs a pair (S,T)(S,T) of Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets with high probability.

Running Time

To bound the running time, consider the following. Checking whether the embedding is (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced takes O(nd)O(nd) time. This can be seen by noting that the inequality in Definition 14 can be rewritten as:

μ(Rt)(μ(V))2iRtμ(i)𝐯i1μ(Rt)jRtμ(j)𝐯j22b.{\mu(R_{t})\over(\mu(V))^{2}}\sum_{i\in R_{t}}\mu(i)\Big{\|}\mathbf{v}_{i}-{1\over\mu(R_{t})}\sum_{j\in R_{t}}\mu(j)\mathbf{v}_{j}\Big{\|}_{2}^{2}\geq b.

Computing 22\ell_{2}^{2}-distances and vector projections takes O(d)O(d) time per vector, of which there are nn. Sorting takes O(nlogn)O(n\log n) time. There are at most O(n)O(n) sweep cuts to check in both the balanced and unbalanced case, and finding the appropriate cut in both cases takes O(1)O(1) time per cut. Checking the condition:

μ(St)μ(V)𝐯t283100log(μ(V)),\frac{\mu(S_{t})}{\mu(V)}\cdot\frac{\|\mathbf{v}_{t}\|^{2}}{8}\geq\frac{3}{100\log(\mu(V))},

requires time O(d)O(d) per cut. Hence the overall running time is at most O(nd+nlogn)O(nd+n\log n).

This completes the proof of Theorem 3.

9 Acknowledgements

AC is supported by NSF DGE 2140001. Part of this work was also done while AC was a visiting student at Bocconi University. LO is supported by NSF CAREER 1943510. The authors would like to thank anonymous reviewers for suggesting related work on polymatroidal networks, Thatchaphol Saranurak for suggesting related work on the use of cut-matching games beyond partitioning graphs, and Jeffrey Negrea for his assistance with topics related to online learning. AC would like to thank Luca Trevisan for discussions regarding HDX construction, flow embedding techniques for certifying expansion of random regular graphs, and support while visiting Bocconi.

References

  • [1] Dimitris Achlioptas. Database-friendly random projections. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 274–281, 2001.
  • [2] Amit Agarwal, Moses Charikar, Konstantin Makarychev, and Yury Makarychev. 𝒪(logn)\mathcal{O}(\sqrt{\log n})-approximation algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 573–581, 2005.
  • [3] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In Proceedings of the 23rd international conference on machine learning, pages 17–24. Association for Computing Machinery, 2006.
  • [4] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. Network Flows: Theory, Algorithms and Applications. Prentice-Hall, 1993.
  • [5] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond matrix multiplicative updates. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 237–245, 2015.
  • [6] Konstantinos Ameranis, Antares Chen, Lorenzo Orecchia, and Erasmo Tani. Efficient flow-based approximation algorithms for submodular hypergraph partitioning via a generalized cut-matching game. arXiv preprint arXiv:2301.08920, 2023.
  • [7] Reid Andersen and Kevin J. Lang. An algorithm for improving graph partitions. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, page 651–660, USA, 2008. Society for Industrial and Applied Mathematics.
  • [8] Sanjeev Arora, Elad Hazan, and Satyen Kale. O(logn){{O}}(\sqrt{\log n})-Approximation to SPARSEST CUT in O~(n2)\tilde{O}(n^{2}) Time. SIAM Journal on Computing, 39(5):1748–1771, January 2010.
  • [9] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227–236, 2007.
  • [10] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):1–37, 2009.
  • [11] Francis Bach et al. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.
  • [12] Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
  • [13] Aaron Bernstein, Maximilian Probst Gutenberg, and Thatchaphol Saranurak. Deterministic decremental sssp and approximate min-cost flow in almost-linear time. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1000–1008. IEEE, 2022.
  • [14] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of data science. Cambridge University Press, 2020.
  • [15] Umit V Catalyurek and Cevdet Aykanat. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on parallel and distributed systems, 10(7):673–693, 1999.
  • [16] T-H Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph laplacian and approximation algorithms. Journal of the ACM (JACM), 65(3):1–48, 2018.
  • [17] Yuk Hei Chan and Lap Chi Lau. On linear and semidefinite programming relaxations for hypergraph matching. Mathematical programming, 135(1-2):123–148, 2012.
  • [18] Moses Charikar, Konstantin Makarychev, and Yury Makarychev. Directed metrics and directed graph partitioning problems. In SODA, volume 6, pages 51–60, 2006.
  • [19] Chandra Chekuri, Sreeram Kannan, Adnan Raja, and Pramod Viswanath. Multicommodity flows and cuts in polymatroidal networks. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 399–408, 2012.
  • [20] Li Chen, Rasmus Kyng, Yang P Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 612–623. IEEE, 2022.
  • [21] Julia Chuzhoy. A Distanced Matching Game, Decremental APSP in Expanders, and Faster Deterministic Algorithms for Graph Cut Problems. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, pages 2122–2213. Society for Industrial and Applied Mathematics, January 2023.
  • [22] Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. A deterministic algorithm for balanced cut with applications to dynamic connectivity, flows, and beyond. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1158–1167. IEEE, 2020.
  • [23] Julia Chuzhoy and Sanjeev Khanna. A new algorithm for decremental single-source shortest paths with applications to vertex-capacitated flow and cut problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 389–400, 2019.
  • [24] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
  • [25] Karen D Devine, Erik G Boman, Robert T Heaphy, Rob H Bisseling, and Umit V Catalyurek. Parallel hypergraph partitioning for scientific computing. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pages 10–pp. IEEE, 2006.
  • [26] Alina Ene and Huy Nguyen. Random coordinate descent methods for minimizing decomposable submodular functions. In International Conference on Machine Learning, pages 787–795. PMLR, 2015.
  • [27] Alina Ene, Huy Nguyen, and László A Végh. Decomposable submodular function minimization: discrete and continuous. Advances in neural information processing systems, 30, 2017.
  • [28] Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D Mitchell, Brenda Praggastis, Amie J Eisfeld, Amy C Sims, Larissa B Thackray, et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC bioinformatics, 22(1):1–21, 2021.
  • [29] Elad Hazan, Satyen Kale, and Shai Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In Conference on Learning Theory, pages 38–1. JMLR Workshop and Conference Proceedings, 2012.
  • [30] Shohei Hidaka and Masafumi Oizumi. Fast and exact search for the partition with minimal information loss. PLoS One, 13(9):e0201126, 2018.
  • [31] Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, and Kevin Tian. Positive semidefinite programming: mixed, parallel, and width-independent. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 789–802, 2020.
  • [32] Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, and Zhao Song. A faster interior point method for semidefinite programming. In 2020 IEEE 61st annual symposium on foundations of computer science (FOCS), pages 910–918. IEEE, 2020.
  • [33] William Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Conference in Modern Analysis and Probability, 26:189–206, 1982.
  • [34] Satyen Kale. Efficient algorithms using the multiplicative weights update method. Princeton University, 2007.
  • [35] Rohit Khandekar, Subhash Khot, Lorenzo Orecchia, and Nisheeth K Vishnoi. On a cut-matching game for the sparsest cut problem. Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2007-177, 6(7):12, 2007.
  • [36] Rohit Khandekar, Satish Rao, and Umesh Vazirani. Graph partitioning using single commodity flows. Journal of the ACM (JACM), 56(4):1–15, 2009.
  • [37] Steffen Klamt, Utz-Uwe Haus, and Fabian Theis. Hypergraphs and cellular networks. PLoS computational biology, 5(5):e1000385, 2009.
  • [38] Tsz Chiu Kwok, Lap Chi Lau, and Kam Chuen Tung. Cheeger inequalities for vertex expansion and reweighted eigenvalues. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 366–377. IEEE, 2022.
  • [39] Lap Chi Lau, Kam Chuen Tung, and Robert Wang. Cheeger inequalities for directed graphs and hypergraphs using reweighted eigenvalues. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1834–1847, 2023.
  • [40] Lap Chi Lau, Kam Chuen Tung, and Robert Wang. Fast algorithms for directed graph partitioning using flows and reweighted eigenvalues. arXiv preprint arXiv:2306.09128, 2023.
  • [41] Eugene L Lawler and Charles U Martel. Computing maximal “polymatroidal” network flows. Mathematics of Operations Research, 7(3):334–347, 1982.
  • [42] Tom Leighton and Satish Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM), 46(6):787–832, 1999.
  • [43] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. Advances in neural information processing systems, 30, 2017.
  • [44] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-laplacians, cheeger inequalities and spectral clustering. In International Conference on Machine Learning, pages 3014–3023. PMLR, 2018.
  • [45] Yaowei Long and Thatchaphol Saranurak. Near-optimal deterministic vertex-failure connectivity oracles. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 1002–1010. IEEE, 2022.
  • [46] Anand Louis. Cut-matching games on directed graphs. arXiv preprint arXiv:1010.1047, 2010.
  • [47] Anand Louis. Hypergraph markov operators, eigenvalues and approximation algorithms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 713–722, 2015.
  • [48] Anand Louis and Yury Makarychev. Approximation algorithms for hypergraph small-set expansion and small-set vertex expansion. Theory of Computing, 12(1):1–25, 2016.
  • [49] Danupon Nanongkai and Thatchaphol Saranurak. Dynamic spanning forest with worst-case update time: adaptive, las vegas, and 𝒪(n1/2ε)\mathcal{O}(n^{1/2-\varepsilon})-time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1122–1129, 2017.
  • [50] Mukund Narasimhan, Nebojsa Jojic, and Jeff A Bilmes. Q-clustering. Advances in Neural Information Processing Systems, 18, 2005.
  • [51] Lorenzo Orecchia. Fast Approximation Algorithms for Graph Partitioning using Spectral and Semidefinite-Programming Techniques. PhD thesis, EECS Department, University of California, Berkeley, May 2011.
  • [52] Lorenzo Orecchia, Konstantinos Ameranis, Charalampos Tsourakakis, and Kunal Talwar. Practical almost-linear-time approximation algorithms for hybrid and overlapping graph clustering. In International Conference on Machine Learning, pages 17071–17093. PMLR, 2022.
  • [53] Lorenzo Orecchia, Sushant Sachdeva, and Nisheeth K Vishnoi. Approximating the exponential, the Lanczos method and an O~(m)\tilde{O}(m)-time spectral algorithm for balanced separator. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1141–1160, 2012.
  • [54] Lorenzo Orecchia, Leonard J Schulman, Umesh V Vazirani, and Nisheeth K Vishnoi. On partitioning graphs via single commodity flows. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 461–470, 2008.
  • [55] Jonah Sherman. Breaking the multicommodity flow barrier for 𝒪(logn)\mathcal{O}(\sqrt{\log n})-approximations to sparsest cut. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 363–372. IEEE, 2009.
  • [56] Zoya Svitkina and Lisa Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. SIAM Journal on Computing, 40(6):1715–1737, 2011.
  • [57] Charalampos E Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web, pages 1451–1460, 2017.
  • [58] Nate Veldt. Cut-matching games for generalized hypergraph ratio cuts. In Proceedings of the ACM Web Conference 2023, WWW ’23, page 694–704, New York, NY, USA, 2023. Association for Computing Machinery.
  • [59] Nate Veldt, Austin R Benson, and Jon Kleinberg. Approximate decomposable submodular function minimization for cardinality-based components. Advances in Neural Information Processing Systems, 34:3744–3756, 2021.
  • [60] Nate Veldt, Austin R Benson, and Jon Kleinberg. Hypergraph cuts with general splitting functions. SIAM Review, 64(3):650–685, 2022.
  • [61] Wenyin Yang, Guojun Wang, Md Zakirul Alam Bhuiyan, and Kim-Kwang Raymond Choo. Hypergraph partitioning for social networks based on information entropy modularity. Journal of Network and Computer Applications, 86:59–71, 2017.
  • [62] Yuichi Yoshida. Cheeger inequalities for submodular transformations. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2582–2601. SIAM, 2019.

Appendix A Omitted Proofs

In this section, we prove some technical lemmata used in the body of the paper.

A.1 Our Results

See 1

Proof of Lemma 1.

Observe that the denominator of the ratio-cut objective is itself a submodular cut function applied over the entire vertex set:

δ𝝁(S)=min{𝝁(S),𝝁(S¯)}.\delta_{\bm{\mu}}(S)=\min\{\bm{\mu}(S),\bm{\mu}(\bar{S})\}.

Consequently, for every SVS\subseteq V:

ΨG(S)=hEwhδh(Sh)min{𝝁(S),𝝁(S¯)}=δG(S)δ𝝁(S)=δ¯G(𝟏S)δ¯𝝁(𝟏S),\Psi_{G}(S)=\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\min\{\bm{\mu}(S),\bm{\mu}(\bar{S})\}}=\frac{\delta_{G}(S)}{\delta_{\bm{\mu}}(S)}={\overline{\delta}_{G}(\boldsymbol{1}^{S})\over\overline{\delta}_{\bm{\mu}}(\boldsymbol{1}^{S})},

which gives:

ΨGmin𝐱nδ¯G(𝐱)δ¯𝝁(𝐱).\Psi_{G}^{*}\geq\min_{\mathbf{x}\in\mathbb{R}^{n}}{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}.

We now prove the opposite inequality by showing the second part of the lemma. Notice that the value of the ratio in the right-hand side of (RC-NonCvx) is invariant under shifting all the coordinates by a fixed constant. Furthermore, both the numerator and the denominator of the right-hand side of (RC-NonCvx) are positively homogeneous, and hence their ratio is invariant under scaling 𝐱\mathbf{x} by a positive constant, i.e., for any u>0u>0:

δ¯G(u𝐱)δ¯𝝁(u𝐱)=uδ¯G(𝐱)uδ¯𝝁(𝐱)=δ¯G(𝐱)δ¯𝝁(𝐱).{\overline{\delta}_{G}(u\cdot\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(u\cdot\mathbf{x})}={u\cdot\overline{\delta}_{G}(\mathbf{x})\over u\cdot\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}.

Hence, for any 𝐱\mathbf{x}, by letting 𝐱^\hat{\mathbf{x}} be defined as:

xi^=ximinjxjmaxj,k(xjxk),\hat{x_{i}}={x_{i}-\min_{j}x_{j}\over\max_{j,k}(x_{j}-x_{k})},

we have maxix^i=1\max_{i}\hat{x}_{i}=1, minix^i=0\min_{i}\hat{x}_{i}=0, and:

δ¯G(𝐱)δ¯𝝁(𝐱)=δ¯G(𝐱^)δ¯𝝁(𝐱^).{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\hat{\mathbf{x}})\over\overline{\delta}_{\bm{\mu}}(\hat{\mathbf{x}})}.

We then have, as per Proposition 3.1 in [11]:

δ¯G(𝐱)δ¯𝝁(𝐱)=δ¯G(𝐱^)δ¯𝝁(𝐱^)=𝔼t[0,1][δG(St(𝐱^))]𝔼t[0,1][δ𝝁(St(𝐱^))].{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\hat{\mathbf{x}})\over\overline{\delta}_{\bm{\mu}}(\hat{\mathbf{x}})}={\operatorname*{\mathbb{E}}_{t\sim[0,1]}[\delta_{G}(S_{t}(\hat{\mathbf{x}}))]\over\operatorname*{\mathbb{E}}_{t\sim[0,1]}[\delta_{\bm{\mu}}(S_{t}(\hat{\mathbf{x}}))]}.

Hence, there must exist some t[0,1]t\in[0,1] such that:

δ¯G(𝐱)δ¯𝝁(𝐱)δG(St(𝐱^))δ𝝁(St(𝐱^)).{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}\geq{\delta_{G}(S_{t}(\hat{\mathbf{x}}))\over\delta_{\bm{\mu}}(S_{t}(\hat{\mathbf{x}}))}.

This gives:

min𝐱nδ¯G(𝐱)δ¯𝝁(𝐱)=ΨG,\min_{\mathbf{x}\in\mathbb{R}^{n}}{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}=\Psi_{G}^{*},

as needed for the first part of the lemma. Moreover, computing and checking all of the sets {St(𝐱)}t[0,1]\{S_{t}(\mathbf{x})\}_{t\in[0,1]} only requires time O(nlogn)O(n\log n) for sorting the entries of 𝐱\mathbf{x} and time O(𝗌𝗉(G)+|V|)O(\mathsf{sp}(G)+|V|) for evaluating the numerator and denominator. ∎
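The sweep-cut rounding used in this proof can be sketched with generic evaluation oracles standing in for δG\delta_{G} and δ𝝁\delta_{\bm{\mu}} (a hypothetical Python sketch; `sweep_round`, `cut_fn`, and `denom_fn` are our names):

```python
import numpy as np

def sweep_round(x, cut_fn, denom_fn):
    """Round a fractional vector x to its best threshold set
    S_t(x) = {i : x_i >= t}: sweep the entries of x in decreasing order
    and return the threshold set minimizing cut_fn(S) / denom_fn(S).
    The proof of Lemma 1 guarantees some threshold set achieves a ratio
    at most the fractional ratio of x."""
    order = np.argsort(-x)                   # sweep from largest entries
    best, best_S, S = np.inf, None, []
    for i in order[:-1]:                     # skip the full set S = V
        S.append(i)
        num, den = cut_fn(S), denom_fn(S)
        if den > 0 and num / den < best:
            best, best_S = num / den, list(S)
    return best_S, best
```

For instance, on the path graph 0–1–2–3 with unit measure and the ordinary edge-cut function, sweeping x = (1, 0.9, 0.1, 0) recovers the cut {0, 1} with ratio 1/2.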

A.2 Polymatroidal Cut Functions and Hypergraph Flows

See 2

Proof.

Let S,TVS,T\subseteq V and denote with S¯\overline{S} and T¯\overline{T} their complements in VV. We consider four possible cases, corresponding to four ways of determining δ(S)\delta(S) and δ(T)\delta(T). In the first two cases, we exploit the submodularity of FF and GG.

Case 1:

F(S)G(S¯)F(S)\leq G(\overline{S}) and F(T)G(T¯)F(T)\leq G(\overline{T}), which implies δ(S)=F(S)\delta(S)=F(S) and δ(T)=F(T).\delta(T)=F(T). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
F(ST)+F(ST)\displaystyle\leq F(S\cup T)+F(S\cap T)
F(S)+F(T)=δ(S)+δ(T).\displaystyle\leq F(S)+F(T)=\delta(S)+\delta(T).
Case 2:

G(S¯)F(S)G(\overline{S})\leq F(S) and G(T¯)F(T)G(\overline{T})\leq F(T), which implies δ(S)=G(S¯)\delta(S)=G(\overline{S}) and δ(T)=G(T¯).\delta(T)=G(\overline{T}). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+G(S¯T¯)\displaystyle\leq G(\overline{S}\cap\overline{T})+G(\overline{S}\cup\overline{T})
G(S¯)+G(T¯)=δ(S)+δ(T).\displaystyle\leq G(\overline{S})+G(\overline{T})=\delta(S)+\delta(T).

In the last two cases, we rely on the monotonicity of FF and GG.

Case 3:

F(S)G(S¯)F(S)\leq G(\overline{S}) and G(T¯)F(T)G(\overline{T})\leq F(T), which implies δ(S)=F(S)\delta(S)=F(S) and δ(T)=G(T¯)\delta(T)=G(\overline{T}). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+F(ST)\displaystyle\leq G(\overline{S}\cap\overline{T})+F(S\cap T)
G(T¯)+F(S)=δ(S)+δ(T).\displaystyle\leq G(\overline{T})+F(S)=\delta(S)+\delta(T).
Case 4:

G(S¯)F(S)G(\overline{S})\leq F(S) and F(T)G(T¯)F(T)\leq G(\overline{T}), which implies δ(S)=G(S¯)\delta(S)=G(\overline{S}) and δ(T)=F(T)\delta(T)=F(T). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+F(ST)\displaystyle\leq G(\overline{S}\cap\overline{T})+F(S\cap T)
G(S¯)+F(T)=δ(S)+δ(T).\displaystyle\leq G(\overline{S})+F(T)=\delta(S)+\delta(T).

It follows that δ\delta is submodular, as we have shown that for every S,TVS,T\subseteq V:

δ(ST)+δ(ST)δ(S)+δ(T).\delta(S\cup T)+\delta(S\cap T)\leq\delta(S)+\delta(T).
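The case analysis above can also be checked mechanically on a small ground set. The following Python sketch is purely illustrative (the particular monotone submodular functions F and G below are arbitrary choices of ours, not taken from the paper): it enumerates all pairs of subsets and verifies the submodular inequality for δ(S) = min{F(S), G(S̄)}.

```python
import itertools

# Illustrative ground set and two monotone submodular functions with
# F(empty) = G(empty) = 0; any monotone submodular pair would do.
V = frozenset(range(5))

def F(S):
    # square root of a positive modular weight: monotone and submodular
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def G(S):
    # concave function of cardinality: monotone and submodular
    return float(len(S)) ** 0.7

def delta(S):
    S = frozenset(S)
    return min(F(S), G(V - S))

subsets = [frozenset(c) for r in range(len(V) + 1)
           for c in itertools.combinations(sorted(V), r)]
violations = sum(1 for S in subsets for T in subsets
                 if delta(S | T) + delta(S & T) > delta(S) + delta(T) + 1e-9)
print(violations)
```

No pair of subsets violates the inequality, matching the four cases above.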

See 3

Proof.

Since FhF^{-}_{h} and Fh+F^{+}_{h} are monotone, for any 𝐱n\mathbf{x}\in\mathbb{R}^{n} there exists a value z=z(𝐱)z^{*}=z^{*}(\mathbf{x}) such that:

Fh({iV𝐱(i)z})Fh+({iV𝐱(i)<z}),F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})\leq F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\}),

for all zzz\geq z^{*}, and:

Fh({iV𝐱(i)z})Fh+({iV𝐱(i)<z}),F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})\geq F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\}),

for all z<zz<z^{*}. We then have, by Proposition 3.1(c) in [11]:

δ¯h(𝐱)\displaystyle\overline{\delta}_{h}(\mathbf{x}) =δ({iV𝐱(i)z})𝑑z\displaystyle=\int_{-\infty}^{\infty}\delta(\{i\in V\mid\mathbf{x}(i)\geq z\})dz
=z(𝐱)Fh+({iV𝐱(i)<z})𝑑z+z(𝐱)Fh({iV𝐱(i)z})𝑑z\displaystyle=\int_{-\infty}^{z^{*}(\mathbf{x})}F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\})dz+\int_{z^{*}(\mathbf{x})}^{\infty}F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})dz
=minν0Fh+({iV𝐱(i)ν<z})𝑑z+0Fh({iV𝐱(i)νz})𝑑z\displaystyle=\min_{\nu\in\mathbb{R}}\int_{-\infty}^{0}F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)-\nu<z\})dz+\int_{0}^{\infty}F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)-\nu\geq z\})dz
=minνF¯h((𝐱ν𝟏)+)+F¯h+((𝐱ν𝟏)),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}^{-}_{h}((\mathbf{x}-\nu{\bf 1})_{+})+\bar{F}^{+}_{h}((\mathbf{x}-\nu{\bf 1})_{-}),

as needed. ∎
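The first step of the proof, rewriting the Lovász extension as an integral of level sets (Proposition 3.1(c) of [11]), can be sanity-checked numerically. The sketch below is illustrative only: it uses a symmetric cut function δ(S) = min{F(S), F(S̄)}, so that δ(∅) = δ(V) = 0 and the integral reduces to a finite sum over the breakpoints of x, with an arbitrary monotone submodular F of our choosing.

```python
import random

V = frozenset(range(5))

def F(S):
    # arbitrary monotone submodular function with F(empty) = 0
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def delta(S):
    # symmetric cut function: delta(empty) = delta(V) = 0
    S = frozenset(S)
    return min(F(S), F(V - S))

def lovasz_greedy(f, x):
    # Edmonds' greedy formula: sort coordinates in decreasing order
    # and charge each coordinate its marginal gain.
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, prefix, prev = 0.0, set(), 0.0
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        total += x[i] * (cur - prev)
        prev = cur
    return total

def lovasz_levelsets(f, x):
    # integral of f({i : x(i) >= z}) dz, a finite sum over sorted breakpoints
    vals = sorted(set(x))
    return sum((hi - lo) * f(frozenset(i for i in range(len(x)) if x[i] >= hi))
               for lo, hi in zip(vals, vals[1:]))

random.seed(0)
max_gap = max(
    abs(lovasz_greedy(delta, x) - lovasz_levelsets(delta, x))
    for x in ([random.uniform(-2, 2) for _ in range(5)] for _ in range(25))
)
print(max_gap < 1e-9)
```

The two formulas agree to machine precision on random inputs, as the integral representation predicts.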

See 4

Proof.

Since FhF_{h} is submodular, its Lovász extension F¯h\bar{F}_{h} is convex, and hence, by Fact 3:

δ¯h(𝐱)\displaystyle\bar{\delta}_{h}(\mathbf{x}) =minνF¯h((𝐱ν𝟏h)+)+F¯h((𝐱ν𝟏h))=minν2(12F¯h((𝐱ν𝟏h)+)+12F¯h((𝐱ν𝟏h)))\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})=\min_{\nu\in\mathbb{R}}2\cdot\left({1\over 2}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+{1\over 2}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})\right)
minν2F¯h(12(𝐱ν𝟏h)++12(𝐱ν𝟏h))\displaystyle\geq\min_{\nu\in\mathbb{R}}2\cdot\bar{F}_{h}\left({1\over 2}(\mathbf{x}-\nu{\bf 1}_{h})_{+}+{1\over 2}(\mathbf{x}-\nu{\bf 1}_{h})_{-}\right)
=minν2F¯h(12|𝐱ν𝟏h|)\displaystyle=\min_{\nu\in\mathbb{R}}2\cdot\bar{F}_{h}\left({1\over 2}|\mathbf{x}-\nu{\bf 1}_{h}|\right)
=minνF¯h(|𝐱ν𝟏h|),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}\left(|\mathbf{x}-\nu{\bf 1}_{h}|\right),

where in the above we applied Jensen’s Inequality and the fact that F¯h\bar{F}_{h} is positively homogeneous (Proposition 3.1 (e) from [11]). On the other hand, since FhF_{h} is monotone, we have:

δ¯h(𝐱)\displaystyle\bar{\delta}_{h}(\mathbf{x}) =minνF¯h((𝐱ν𝟏h)+)+F¯h((𝐱ν𝟏h))minνF¯h(|𝐱ν𝟏h|)+F¯h(|𝐱ν𝟏h|)=2minνF¯h(|𝐱ν𝟏h|),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})\leq\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)+\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)=2\cdot\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|),

where the inequality follows from Fact 1. This completes the proof. ∎
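The two-sided bound just proved can be observed numerically. The following sketch is illustrative (the function F below is an arbitrary monotone submodular function, not from the paper); it evaluates both sides of the sandwich min_ν F̄(|x − ν1|) ≤ δ̄(x) ≤ 2·min_ν F̄(|x − ν1|), computing the minimum over ν exactly by enumerating the breakpoints of the convex piecewise-linear objective.

```python
import itertools, random

V = frozenset(range(5))

def F(S):
    # arbitrary monotone submodular function with F(empty) = 0
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def delta(S):
    # symmetric polymatroidal cut function
    S = frozenset(S)
    return min(F(S), F(V - S))

def lovasz(f, x):
    # Edmonds' greedy formula for the Lovász extension (requires f(empty) = 0)
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, prefix, prev = 0.0, set(), 0.0
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        total += x[i] * (cur - prev)
        prev = cur
    return total

random.seed(1)
ok = True
for _ in range(25):
    x = [random.uniform(-2, 2) for _ in range(5)]
    # nu -> F_bar(|x - nu*1|) is convex piecewise-linear; its breakpoints lie
    # among the coordinates of x and their pairwise midpoints.
    cands = set(x) | {(a + b) / 2 for a, b in itertools.combinations(x, 2)}
    m = min(lovasz(F, [abs(xi - nu) for xi in x]) for nu in cands)
    d = lovasz(delta, x)
    ok = ok and (m <= d + 1e-9) and (d <= 2 * m + 1e-9)
print(ok)
```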

See 5

Proof.

Consider the hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E^{H}} associated with the embedding HρGH\preceq_{\rho}G and let 𝐮EG\mathbf{u}\in\mathbb{R}^{E_{G}} be an arbitrary vector over the hyperedges. As each 𝐘e\mathbf{Y}^{e} routes demand weHw^{H}_{e} between the endpoints of e={u,v}EHe=\{u,v\}\in E_{H}, we have

weH(𝐱u𝐱v)+\displaystyle w^{H}_{e}\cdot(\mathbf{x}_{u}-\mathbf{x}_{v})_{+} =(dem(𝐘e),𝐱)+=(hEG𝐲he,𝐱h)+=(hEG𝐲he,𝐱huh𝟏h)+\displaystyle=(\langle\operatorname{dem}(\mathbf{Y}^{e}),\mathbf{x}\rangle)_{+}=\left(\sum_{h\in E_{G}}\langle\mathbf{y}^{e}_{h},\mathbf{x}_{h}\rangle\right)_{+}=\left(\sum_{h\in E_{G}}\langle\mathbf{y}^{e}_{h},\mathbf{x}_{h}-u_{h}{\bf 1}_{h}\rangle\right)_{+}
hEG(𝐲he)+,(𝐱huh𝟏h)++(𝐲he),(𝐱huh𝟏h).\displaystyle\leq\sum_{h\in E_{G}}\langle(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\rangle+\langle(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\rangle.

where the last equality relies on the fact that 𝐲heh𝟏h.\mathbf{y}^{e}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}. Summing over all edges in EHE_{H} yields:

e={i,j}EHwijH(𝐱i𝐱j)+\displaystyle\sum_{e=\{i,j\}\in E_{H}}w^{H}_{ij}\cdot(\mathbf{x}_{i}-\mathbf{x}_{j})_{+}\leq eEHhEG(𝐲he)+,(𝐱huh𝟏h)++(𝐲he),(𝐱huh𝟏h),\displaystyle\sum_{e\in E_{H}}\sum_{h\in E_{G}}\langle(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\rangle+\langle(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\rangle,
=\displaystyle= hEGeEH(𝐲he)+,(𝐱huh𝟏h)++eEH(𝐲he),(𝐱huh𝟏h)\displaystyle\sum_{h\in E_{G}}\left\langle\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\right\rangle+\left\langle\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\right\rangle
\displaystyle\leq ρhEGwhG(F¯h((𝐱huh𝟏h)+)+F¯h+((𝐱huh𝟏h)))\displaystyle\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\left(\bar{F}_{h}^{-}((\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+})+\bar{F}_{h}^{+}((\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-})\right)

The second inequality follows from Definition 9 and the characterization of Lovász extensions by the positive submodular polytope (see Proposition 3.4 in [11]). Finally, the first statement in the theorem follows by setting the arbitrary 𝐮\mathbf{u} to be the required minimizer. The specialization to cuts follows by plugging in 𝐱=𝟏S\mathbf{x}={\bf 1}^{S}.

In the symmetric case, the capacity constraints in the definition become simply, for all hEGh\in E_{G}:

eEh(𝐲he)+,eEh(𝐲he)ρwh𝒫+(Fh).\sum_{e\in E_{h}}(\mathbf{y}^{e}_{h})_{+}\,,\sum_{e\in E_{h}}(\mathbf{y}^{e}_{h})_{-}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}).

As a result, the orientation of each arc e=(i,j)EHe=(i,j)\in E_{H} can be discarded in this case, as the flow 𝐘e\mathbf{Y}^{e} can be oriented arbitrarily while satisfying the capacity constraint. Therefore, we can repeat the same analysis for the flow embedding given by {𝐘e}eEH\{-\mathbf{Y}^{e}\}_{e\in E_{H}}. Summing the positive and negative parts, we obtain the required statement. ∎

See 6

Proof of Theorem 6.

Consider the hypergraph flow as a flow over the factor graph G^.\hat{G}. By a standard application of dynamic trees (see for example Lemma 3.8 in [55]) to the computation of flow-path decompositions [4] over G^\hat{G}, we can compute, in time O~(|E^|)=O~(𝗌𝗉(G))\tilde{O}(|\hat{E}|)=\tilde{O}(\mathsf{sp}(G)), a graph H=(V,EH,𝐰H)H=(V,E_{H},\mathbf{w}^{H}) such that |EH|=O~(𝗌𝗉(G))|E_{H}|=\tilde{O}(\mathsf{sp}(G)) and there exist hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} in GG with the following properties:

  • eEH𝐘e=𝐘,\sum_{e\in E_{H}}\mathbf{Y}^{e}=\mathbf{Y},

  • For any eEHe\in E_{H}, for any factor graph edge {i,h}\{i,h\}, the flow (𝐘he)i(\mathbf{Y}^{e}_{h})_{i} has the same sign as (𝐘h)i(\mathbf{Y}_{h})_{i}. In particular, for all hEGh\in E_{G}:

    eEH(𝐲he)+=(𝐲h)+ and eEH(𝐲he)=(𝐲h)\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{+}=(\mathbf{y}_{h})_{+}\;\textrm{ and }\;\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{-}=(\mathbf{y}_{h})_{-} (22)
  • For all e=(i,j)EHe=(i,j)\in E_{H}, we have demi(𝐘)0\operatorname{dem}_{i}(\mathbf{Y})\geq 0 and demj(𝐘)<0.\operatorname{dem}_{j}(\mathbf{Y})<0. The flow 𝐘e\mathbf{Y}^{e} routes weHw^{H}_{e} units of flow between vertex ii and vertex jj, i.e.,

    dem(𝐘e)=weH(𝟏i𝟏j).\operatorname{dem}(\mathbf{Y}^{e})=w^{H}_{e}({\bf 1}_{i}-{\bf 1}_{j}).

The last point proves the required statements about the bipartiteness of HH and its degrees. To complete the proof, we verify that HcongG(Y)GH\preceq_{\operatorname{cong}_{G}(Y)}G. By the definition of congestion in Equation 9, we have that, for all hEG:h\in E_{G}:

𝐲hcongG(𝐘)wh(δh).\mathbf{y}_{h}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{B}(\delta_{h}).

By the characterization of the base polytope in Equation 4.2, this implies that:

(𝐲h)+congG(𝐘)wh𝒫(Fh) and (𝐲h)congG(𝐘)wh𝒫(Fh+).(\mathbf{y}_{h})_{+}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{P}(F^{-}_{h})\;\textrm{ and }\;(\mathbf{y}_{h})_{-}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{P}(F^{+}_{h}).

Together with Equation 22 and Definition 9, this completes the proof. ∎
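The decomposition underlying this proof can be illustrated with a much simpler (and slower) routine than the dynamic-tree implementation cited from [55]: repeatedly peel off a source-to-sink path carrying the bottleneck weight. The helper below is a hypothetical stand-in of our own, quadratic-time rather than near-linear, shown only to make the peeling argument concrete on an ordinary directed flow.

```python
def path_decomposition(flow, eps=1e-12):
    """Greedily peel source-to-sink paths off a nonnegative flow on a DAG.

    A naive O(|V|*|E|) stand-in for the dynamic-tree decomposition cited
    in the proof; it still yields at most |E| weighted paths, since each
    peel zeroes out at least one bottleneck edge.
    """
    flow = dict(flow)
    paths = []
    while True:
        net = {}
        for (a, b), wt in flow.items():
            net[a] = net.get(a, 0.0) + wt
            net[b] = net.get(b, 0.0) - wt
        sources = [v for v, d in net.items() if d > eps]
        if not sources:
            break
        v, path = sources[0], []
        while True:
            out = [(a, b) for (a, b), wt in flow.items() if a == v and wt > eps]
            if not out:
                break
            path.append(out[0])
            v = out[0][1]
        if not path:
            break
        bottleneck = min(flow[e] for e in path)
        for e in path:
            flow[e] -= bottleneck
        paths.append((bottleneck, path))
    return paths

# demand 3 routed from s to t, split across two paths
f = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
paths = path_decomposition(f)
print(len(paths), sum(w for w, _ in paths))
```

On the toy flow above the routine recovers two paths whose weights sum to the routed demand.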

A.3 An O(logn)O(\sqrt{\log n})-Approximation via 22\ell_{2}^{2}-Metric Embeddings

See 2

Proof.

Given any cut (S,S¯)(S,\overline{S}) with μ(S)μ(S¯)\mu(S)\leq\mu(\overline{S}), for each iVi\in V, set the vector embedding of the vertices to

𝐯i={(μ(V)μ(S)μ(S¯)00)if iS𝟎otherwise.\mathbf{v}_{i}=\begin{cases}\begin{pmatrix}\sqrt{\frac{\mu(V)}{\mu(S)\mu(\bar{S})}}&0&\ldots&0\end{pmatrix}^{\top}&\mbox{if }i\in S\\ {\bf 0}&\mbox{otherwise.}\end{cases}

And for each hyperedge hEh\in E:

𝐯h={𝟎if δh(S)=Fh(S),(μ(V)μ(S)μ(S¯),0,,0)if δh(S)=Fh(S¯).\mathbf{v}_{h}=\begin{cases}\bm{0}&\text{if }\delta_{h}(S)=F_{h}(S),\\ \left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }\delta_{h}(S)=F_{h}(\overline{S}).\end{cases}

We then have:

𝐝ih=𝐯i𝐯h2.\mathbf{d}_{i}^{h}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}.

It is easy to see that this vector embedding satisfies the triangle inequality constraints. For the variance constraint, we have:

{i,j}Vμiμjμ(V)𝐯i𝐯j2=iSjS¯μiμjμ(V)μ(V)μ(S)μ(S¯)=1.\sum_{\{i,j\}\subseteq V}{\mu_{i}\mu_{j}\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}=\sum_{\begin{subarray}{c}i\in S\\ j\in\overline{S}\end{subarray}}{\mu_{i}\mu_{j}\over\mu(V)}\cdot{\mu(V)\over\mu(S)\cdot\mu(\overline{S})}=1.

Furthermore, this solution has objective value:

hEwhF¯h(𝐝h)=μ(V)μ(S)μ(S¯)hEwhδh(Sh)2hEwhδh(Sh)min{μ(S),μ(S¯)}=2ΨG(S).\sum_{h\in E}w_{h}\bar{F}_{h}(\mathbf{d}^{h})=\frac{\mu(V)}{\mu(S)\mu(\bar{S})}\cdot\sum_{h\in E}w_{h}\delta_{h}(S\cap h)\leq 2\cdot\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\min\{\mu(S),\mu(\bar{S})\}}=2\Psi_{G}(S).

This shows that the combinatorial solution (S,S¯)(S,\overline{S}) can be mapped to a feasible solution to the RC-Metric while preserving the objective to within a factor of two. ∎
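As a quick numerical illustration of the feasibility argument, the sketch below (with an arbitrary toy measure μ and cut S of our choosing) builds the one-dimensional embedding from the proof and confirms that the variance constraint evaluates to exactly 1.

```python
# Toy measure mu and cut (S, S-bar); build the one-coordinate embedding
# from the proof and evaluate the variance constraint.
mu = {0: 1.0, 1: 2.0, 2: 1.5, 3: 0.5}
S = {0, 1}
mu_V = sum(mu.values())
mu_S = sum(mu[i] for i in S)
mu_Sbar = mu_V - mu_S
scale = (mu_V / (mu_S * mu_Sbar)) ** 0.5
v = {i: (scale if i in S else 0.0) for i in mu}  # first coordinate only

variance = sum(mu[i] * mu[j] / mu_V * (v[i] - v[j]) ** 2
               for i in mu for j in mu if i < j)
print(round(variance, 9))
```

Only pairs with one endpoint on each side of the cut contribute, and the scaling cancels their total measure exactly.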

See 4

Proof.

Let (S,S¯)(S,\overline{S}) be a candidate solution to the minimum hypergraph ratio-cut problem satisfying μ(S)μ(S¯)\mu(S)\leq\mu(\overline{S}), with objective value κ\kappa, i.e.:

ΨG(S)=hEwhδh(Sh)μ(S)=κ.\Psi_{G}(S)=\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\mu(S)}=\kappa.

We can construct a solution to the RC-Directed-Semimetric program with objective value at most 4κ4\kappa as follows. Partition EE into E+EE^{+}\cup E^{-} where:

E+=def{hEδh(S)=Fh+(S¯)}andE=def{hEδh(S)=Fh(S)}.E^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{h\in E\mid\delta_{h}(S)=F_{h}^{+}(\overline{S})\}\hskip 14.22636pt\text{and}\hskip 14.22636ptE^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{h\in E\mid\delta_{h}(S)=F_{h}^{-}(S)\}.

We can consider:

𝐯i={(μ(V)μ(S)μ(S¯),0,,0)if iS,𝟎if iS¯,and𝐯h={𝟎if hE,(μ(V)μ(S)μ(S¯),0,,0)if hE+.\mathbf{v}_{i}=\begin{cases}\left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }i\in S,\\ \bm{0}&\text{if }i\in\overline{S},\end{cases}\hskip 28.45274pt\text{and}\hskip 28.45274pt\mathbf{v}_{h}=\begin{cases}\bm{0}&\text{if }h\in E^{-},\\ \left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }h\in E^{+}.\end{cases}

and for any hEh\in E let 𝐝h+\mathbf{d}_{h}^{+} and 𝐝h\mathbf{d}_{h}^{-} be defined in terms of {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} and {𝐯h}hE\{\mathbf{v}_{h}\}_{h\in E} so as to satisfy the equality constraints in the RC-Directed-Semimetric program. One can check that:

𝐝h+(i)={2μ(V)μ(S)μ(S¯)if iS¯ and hE+,0otherwise, and 𝐝h(i)={2μ(V)μ(S)μ(S¯)if iS and hE,0otherwise.\mathbf{d}_{h}^{+}(i)=\begin{cases}2\cdot{\mu(V)\over\mu(S)\mu(\overline{S})}&\text{if }i\in\overline{S}\text{ and }h\in E^{+},\\ 0&\text{otherwise,}\end{cases}\hskip 14.22636pt\text{ and }\hskip 14.22636pt\mathbf{d}_{h}^{-}(i)=\begin{cases}2\cdot{\mu(V)\over\mu(S)\mu(\overline{S})}&\text{if }i\in S\text{ and }h\in E^{-},\\ 0&\text{otherwise.}\end{cases}

It is easy to see that all the triangle inequality constraints are satisfied. We can also verify that the variance constraint is satisfied:

{i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j22\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2} =iSjS¯μ(i)μ(j)μ(V)μ(V)μ(S)μ(S¯)=1.\displaystyle=\sum_{\begin{subarray}{c}i\in S\\ j\in\overline{S}\end{subarray}}{\mu(i)\mu(j)\over\mu(V)}\cdot{\mu(V)\over\mu(S)\cdot\mu(\overline{S})}=1.

Hence, the vectors defined above form a feasible solution to the RC-Directed-Semimetric program. The objective value of this solution is:

hEwhFh¯(𝐝h)+hE+whFh+¯(𝐝h+)\displaystyle\sum_{h\in E^{-}}w_{h}\overline{F_{h}^{-}}(\mathbf{d}_{h}^{-})+\sum_{h\in E^{+}}w_{h}\overline{F_{h}^{+}}(\mathbf{d}_{h}^{+}) =2(hEwhμ(V)μ(S)μ(S¯)Fh¯(𝟏S)+hE+whμ(V)μ(S)μ(S¯)Fh+¯(𝟏S¯))\displaystyle=2\cdot\left(\sum_{h\in E^{-}}w_{h}{\mu(V)\over{\mu(S)\mu(\overline{S})}}\overline{F_{h}^{-}}({\bf 1}_{S})+\sum_{h\in E^{+}}w_{h}{\mu(V)\over{\mu(S)\mu(\overline{S})}}\overline{F_{h}^{+}}({\bf 1}_{\overline{S}})\right)
=2μ(V)μ(S)μ(S¯)(hEwhFh(Sh)+hE+whFh+(S¯h))\displaystyle={2\mu(V)\over{\mu(S)\mu(\overline{S})}}\cdot\left(\sum_{h\in E^{-}}w_{h}{F_{h}^{-}}(S\cap h)+\sum_{h\in E^{+}}w_{h}{F_{h}^{+}}(\overline{{S}}\cap h)\right)
=2μ(V)μ(S)μ(S¯)(hEwhδh(Sh))\displaystyle={2\mu(V)\over{\mu(S)\mu(\overline{S})}}\cdot\left(\sum_{h\in E}w_{h}{\delta_{h}}(S\cap h)\right)
4hEwhδh(Sh)μ(S)\displaystyle\leq 4\cdot{\sum_{h\in E}w_{h}{\delta_{h}}(S\cap h)\over{\mu(S)}}
=4κ.\displaystyle=4\kappa.

completing the proof. ∎

See 8

Proof.

Since the relaxation is invariant to translations, we will assume, without loss of generality, that:

iVμ(i)𝐯i=0,\sum_{i\in V}\mu(i)\mathbf{v}_{i}=0, (23)

and that:

i,jVμ(i)μ(j)μ(V)𝐯i𝐯j22=1.\sum_{i,j\in V}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}=1. (24)

We divide the proof into two cases, corresponding to whether the embedding is balanced or unbalanced. In Case 1, the embedding {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} satisfies:

{i,j}(R32)μ(i)μ(j)μ(V)𝐯i𝐯j21/10,\sum_{\{i,j\}\in\binom{R_{3}}{2}}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}\geq 1/10,

where:

Rt=def{iV𝐯i2tμ(V)}.R_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\left\{i\in V\mid\lVert\mathbf{v}_{i}\rVert^{2}\leq{t\over\mu(V)}\right\}.

In this case, it is implicit in the results of Arora, Rao, and Vazirani [10] that there exist polynomial-time-computable sets S,TVS,T\subseteq V satisfying μ(S),μ(T)=Ω(μ(V))\mu(S),\mu(T)=\Omega(\mu(V)), and:

miniSjT𝐯i𝐯j22Ω(1μ(V)logn).\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\Omega\left({1\over\mu(V)\sqrt{\log n}}\right). (25)

We construct the map ϕ\phi explicitly as follows. Let SS and TT be the sets satisfying (25). Let rr\in\mathbb{R} be a value such that:

μ({iS𝐯i2r})=μ({iS𝐯i2r}),\mu(\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\})=\mu(\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\}),

and define:

S=def{iS𝐯i2r},S+=def{iS𝐯i2r},S^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\},\hskip 28.45274ptS^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\},

and:

T=def{iT𝐯i2r},T+=def{iT𝐯i2r}.T^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in T\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\},\hskip 28.45274ptT^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in T\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\}.

Suppose, without loss of generality, that μ(T+)μ(T)\mu(T^{+})\geq\mu(T^{-})444If this condition does not hold, the rest of the argument still holds by exchanging the roles of T+T^{+} with TT^{-} and S+S^{+} with SS^{-}.. Note, in particular, that this implies that μ(S)μ(S)/2\mu(S^{-})\geq\mu(S)/2 and μ(T+)μ(T)/2\mu(T^{+})\geq\mu(T)/2. Let:

ϕ(𝐯i)=minjT+𝐯i𝐯j22+𝐯i22𝐯j22.\phi(\mathbf{v}_{i})=\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}.

As a simple consequence of the triangle inequality constraints, we have, for any i,kVEi,k\in V\cup E:

ϕ(𝐯i)ϕ(𝐯k)𝐯i𝐯k22+𝐯i22𝐯k22.\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{k})\leq\lVert\mathbf{v}_{i}-\mathbf{v}_{k}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{k}\rVert_{2}^{2}.

Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} be the vector given by 𝐱(i)=ϕ(𝐯i)\mathbf{x}(i)=\phi(\mathbf{v}_{i}). Then:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|\displaystyle\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)| {i,j}Vμ(i)μ(j)(𝐱(i)𝐱(j))+iSjT+μ(i)μ(j)(𝐱(i)𝐱(j))+\displaystyle\geq\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)(\mathbf{x}(i)-\mathbf{x}(j))_{+}\geq\sum_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\mu(i)\mu(j)(\mathbf{x}(i)-\mathbf{x}(j))_{+}
=iSjT+μ(i)μ(j)(𝐱(i))+=μ(T+)iSμ(i)(𝐱(i))+\displaystyle=\sum_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\mu(i)\mu(j)(\mathbf{x}(i))_{+}=\mu(T^{+})\sum_{i\in S^{-}}\mu(i)(\mathbf{x}(i))_{+}
=μ(T+)iSμ(i)(minjT+𝐯i𝐯j22+𝐯i22𝐯j22)+\displaystyle=\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
μ(T+)iSμ(i)(minjT+𝐯i𝐯j22+r2r2)+\displaystyle\geq\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+r^{2}-r^{2}\right)_{+}
μ(T+)iSμ(i)(minjT+𝐯i𝐯j22)+\displaystyle\geq\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
μ(T+)μ(S)miniSjT+(𝐯i𝐯j22)+\displaystyle\geq\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\left(\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
=μ(T+)μ(S)miniSjT+𝐯i𝐯j22\displaystyle=\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
μ(T+)μ(S)miniSjT𝐯i𝐯j22\displaystyle\geq\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
14μ(T)μ(S)miniSjT𝐯i𝐯j22\displaystyle\geq{1\over 4}\mu(T)\mu(S)\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
Ω(μ(V)2)Ω(1μ(V)logn)\displaystyle\geq\Omega(\mu(V)^{2})\cdot\Omega\left({1\over\mu(V)\sqrt{\log n}}\right)
Ω(μ(V)logn).\displaystyle\geq\Omega\left(\mu(V)\over\sqrt{\log n}\right).

It follows that:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|{i,j}Vμ(i)μ(j)𝐯i𝐯j22=1μ(V){i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|Ω(1logn).{\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\over\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}}={1\over\mu(V)}\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\geq\Omega\left({1\over\sqrt{\log n}}\right).

Hence, the map ϕ\phi satisfies conditions 1 and 2 in the statement of the lemma. Moreover, it is easy to see that this map can be computed in polynomial time. This completes the proof for Case 1.
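The Lipschitz-type property of ϕ used above depends on the triangle-inequality constraints of the negative-type relaxation; it can fail for arbitrary Euclidean point sets. The sketch below illustrates the property on hypercube vertices, whose squared Euclidean distances do form a metric (they equal Hamming distances); the set `Tplus` is an arbitrary stand-in of ours for the set that ϕ minimizes over.

```python
import random

random.seed(4)
d = 8
# 0/1 vectors: squared Euclidean distance = Hamming distance, a true metric,
# so these points satisfy the negative-type triangle inequalities.
pts = [[random.randint(0, 1) for _ in range(d)] for _ in range(12)]
Tplus = pts[:4]  # hypothetical stand-in for the set minimized over

def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def norm_sq(a):
    return sum(ai * ai for ai in a)

def phi(vpt):
    return min(sqdist(vpt, u) + norm_sq(vpt) - norm_sq(u) for u in Tplus)

# check: phi(a) - phi(b) <= ||a - b||^2 + ||a||^2 - ||b||^2 for all pairs
ok = all(phi(a) - phi(b) <= sqdist(a, b) + norm_sq(a) - norm_sq(b) + 1e-9
         for a in pts for b in pts)
print(ok)
```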

For Case 2, we consider what happens when the condition of Case 1 does not hold, i.e.:

{i,j}(R32)μ(i)μ(j)μ(V)𝐯i𝐯j2<1/10.\sum_{\{i,j\}\in\binom{R_{3}}{2}}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}<1/10.

Here, we have:

μ(R3/2)μ(V)13andiVR3μi𝐯i235.{\mu(R_{3/2})\over\mu(V)}\geq{1\over 3}\hskip 28.45274pt\text{and}\hskip 28.45274pt\sum_{i\in V\setminus R_{3}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert^{2}\geq{3\over 5}.

The first inequality follows from (23), (24), and Markov’s inequality. Let T=R3/2T=R_{3/2} and S=VR3S=V\setminus R_{3}. Consider the embedding:

ϕ(𝐯)=min𝐮T𝐯𝐮22+𝐯22𝐮22.\phi(\mathbf{v})=\min_{\mathbf{u}\in T}\lVert\mathbf{v}-\mathbf{u}\rVert^{2}_{2}+\lVert\mathbf{v}\rVert_{2}^{2}-\lVert\mathbf{u}\rVert_{2}^{2}.

Again, it is easy to see that the map ϕ\phi satisfies condition 1 of the lemma. Moreover, note that, for any iSi\in S:

|ϕ(𝐯i)||𝐯i22𝐮22|,|\phi(\mathbf{v}_{i})|\geq|\lVert\mathbf{v}_{i}\rVert^{2}_{2}-\lVert\mathbf{u}\rVert_{2}^{2}|,

for some 𝐮T\mathbf{u}\in T and hence:

|ϕ(𝐯i)||𝐯i22𝐮22|12𝐯i22.|\phi(\mathbf{v}_{i})|\geq|\lVert\mathbf{v}_{i}\rVert^{2}_{2}-\lVert\mathbf{u}\rVert_{2}^{2}|\geq{1\over 2}\lVert\mathbf{v}_{i}\rVert^{2}_{2}.

Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} be the vector given by 𝐱(i)=ϕ(𝐯i)\mathbf{x}(i)=\phi(\mathbf{v}_{i}). We then have:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|{i,j}Vμ(i)μ(j)𝐯i𝐯j22\displaystyle{\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\over\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}} =1μ(V){i,j}(V2)μ(i)μ(j)|𝐱(i)𝐱(j)|μ(T)μ(V)iSμ(i)|𝐱(i)|\displaystyle={1\over\mu(V)}\sum_{\{i,j\}\in\binom{V}{2}}{\mu(i)\mu(j)}|\mathbf{x}(i)-\mathbf{x}(j)|\geq{\mu(T)\over\mu(V)}\sum_{i\in S}\mu(i)|\mathbf{x}(i)|
μ(T)2μ(V)iSμi𝐯i22\displaystyle\geq{\mu(T)\over 2\mu(V)}\sum_{i\in S}\mu_{i}\lVert\mathbf{v}_{i}\rVert_{2}^{2}
121335=Ω(1).\displaystyle\geq{1\over 2}\cdot{1\over 3}\cdot{3\over 5}=\Omega(1).

Note that the bound obtained in this case is tighter, i.e. Ω(1/logn)\Omega(\nicefrac{{1}}{{\sqrt{\log n}}}) has been replaced by a constant. ∎

A.4 Localized Convex Formulations and Ratio-Cut Improvement

See 8

Proof of Lemma 8.

Let us proceed with taking the conjugate dual. Starting with (CI-Primal):

min𝐱V𝐬,𝐱μ=1hEwhδ¯h(𝐱)\displaystyle\min_{\begin{subarray}{c}\mathbf{x}\in\mathbb{R}^{V}\\ \langle\mathbf{s},\mathbf{x}\rangle_{\mu}=1\end{subarray}}\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x}) =min𝐱VhEwhδ¯h(𝐱)+maxαα(1𝐬,𝐱μ)\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})+\max_{\alpha\in\mathbb{R}}\alpha\Big{(}1-\langle\mathbf{s},\mathbf{x}\rangle_{\mu}\Big{)}
=min𝐱Vmaxα{α+hEwhδ¯h(𝐱)α(μ𝐬),𝐱}.\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}\,.

Recall that the Lovász extension is given by

δ¯h(𝐱)=max𝐟(δh)𝐟,𝐱,\bar{\delta}_{h}(\mathbf{x})=\max_{\mathbf{f}\in\mathcal{B}(\delta_{h})}\langle\mathbf{f},\mathbf{x}\rangle\,,

and hence:

min𝐱Vmaxα{α+hEwhδ¯h(𝐱)α(μ𝐬),𝐱}\displaystyle\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}} =min𝐱Vmaxα{α+hEmax𝐟hwh(δh)𝐟h,𝐱α(μ𝐬),𝐱}\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}\max_{\mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})}\langle\mathbf{f}_{h},\mathbf{x}\rangle-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}
=min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟h,𝐱α(μ𝐬),𝐱}\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\sum_{h\in E}\langle\mathbf{f}_{h},\mathbf{x}\rangle-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}
=min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟hα(μ𝐬),𝐱}.\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle}\bigg{\}}\,.

Strong duality always holds when the feasible region is a polyhedron. Hence, we have:

min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟hα(μ𝐬),𝐱}=maxα,𝐟hV𝐟hwh(δh)α+min𝐱VhE𝐟hα(μ𝐬),𝐱,\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle}\bigg{\}}=\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\alpha+\min_{\mathbf{x}\in\mathbb{R}^{V}}\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle},

which is equivalent to the required linear program by the definition of congestion (Equation 9) and demand (Equation 10) for hypergraph flows. ∎

See 9

Proof.

Let D=1αH.D=\frac{1}{\alpha}\cdot H. The sparsity of DD follows directly from Theorem 6. By the same theorem, we have H1GH\preceq_{1}G, as congG(𝐘)1\operatorname{cong}_{G}(\mathbf{Y})\leq 1 by Property 1 of approximate dual solutions. Hence, the scaled graph DD embeds into GG with congestion 1α,\frac{1}{\alpha}, as required. Taking into account the same scaling by 1α\frac{1}{\alpha}, the statements on the bipartiteness and degree bounds of DD follow directly from Properties 2 and 3 of approximate dual solutions via Theorem 6. Similarly, the largeness of DD follows from Property 2. ∎

A.5 An O(logn)O(\log n)-Approximation via Generalized Cut-Matching Games

See 1

See 2

Proof.

By Definition 12, each dual graph certificate DtD_{t} played by the matching player has O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)) edges. Hence, HtH_{t} has at most O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)) edges for all iterations tg(n).t\leq g(n). It follows that the running time of the cut strategy at every round is bounded by O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)). On the matching side, the running time of the approximate primal-dual oracle is given by Theorems 9 and 10. Both dominate, or are asymptotically equivalent to, O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)). ∎

See 15

Proof.

Fix any 𝐱V\mathbf{x}\in\mathbb{R}^{V} such that 𝐱,𝐌𝐱=1\big{\langle}\mathbf{x},\mathbf{M}\mathbf{x}\big{\rangle}=1. We can write the quadratic form 𝐱,𝐋+(D)𝐱\big{\langle}\mathbf{x},\mathbf{L}_{+}(D)\mathbf{x}\big{\rangle} as

𝐱,𝐋+(D)𝐱=(i,j)EDwijD((xixj)2+xi2xj2)\big{\langle}\mathbf{x},\mathbf{L}_{+}(D)\mathbf{x}\big{\rangle}=\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}

For the lower bound, note that

(i,j)EDwijD((xixj)2+xi2xj2)(i,j)EDwijDxj2=jBxj2i(j)wijD=jBxj2degD(j)\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}\geq-\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot x_{j}^{2}=-\sum_{j\in B}x_{j}^{2}\sum_{i\in\partial^{-}(j)}w_{ij}^{D}=-\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)

Using the fact that the degree of each iVi\in V in DD is bounded above by μi\mu_{i}, we have

jBxj2degD(j)jBxj2μj𝐱,𝐌𝐱=1-\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)\geq-\sum_{j\in B}x_{j}^{2}\cdot\mu_{j}\geq-\big{\langle}\mathbf{x},\mathbf{M}\mathbf{x}\big{\rangle}=-1

For the upper bound, note that

(i,j)EDwijD((xixj)2+xi2xj2)(i,j)EDwijD((xixj)2+xi2)(i,j)EDwijD(2(xi2+xj2)+xi2)\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}\leq\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}\big{)}\leq\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}2\cdot(x_{i}^{2}+x_{j}^{2})+x_{i}^{2}\big{)}

Let us then extract the dependence on the degree of each vertex.

(i,j)EDwijD(2(xi2+xj2)+xi2)\displaystyle\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}2\cdot(x_{i}^{2}+x_{j}^{2})+x_{i}^{2}\big{)} =2(i,j)EDwijD(xi2+xj2)+(i,j)EDwijDxi2\displaystyle=2\cdot\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot(x_{i}^{2}+x_{j}^{2})+\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot x_{i}^{2}
=2(iAxi2j+(i)wijD+jBxj2i(j)wijD)+iAxi2j+(i)wijD\displaystyle=2\cdot\bigg{(}\sum_{i\in A}x_{i}^{2}\sum_{j\in\partial^{+}(i)}w_{ij}^{D}+\sum_{j\in B}x_{j}^{2}\sum_{i\in\partial^{-}(j)}w_{ij}^{D}\bigg{)}+\sum_{i\in A}x_{i}^{2}\sum_{j\in\partial^{+}(i)}w_{ij}^{D}
=2(iAxi2degD(i)+jBxj2degD(j))+iAxi2degD(i)\displaystyle=2\cdot\bigg{(}\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)\bigg{)}+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)
=2iVxi2degD(i)+iAxi2degD(i)\displaystyle=2\sum_{i\in V}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)

Finally, using the fact that the degrees are bounded by 𝝁\bm{\mu}, we have

2iVxi2degD(i)+iAxi2degD(i)2iVxi2μi+iAxi2μi3iVxi2μi=32\sum_{i\in V}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)\leq 2\sum_{i\in V}x_{i}^{2}\cdot\mu_{i}+\sum_{i\in A}x_{i}^{2}\cdot\mu_{i}\leq 3\sum_{i\in V}x_{i}^{2}\cdot\mu_{i}=3

thus establishing the required bound. ∎
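The two bounds can be checked numerically on random instances. The sketch below (our own illustration) samples a bipartite directed graph whose weighted degrees are capped by μ, normalizes x so that ⟨x, Mx⟩ = 1, and verifies that −1 ≤ ⟨x, L₊(D)x⟩ ≤ 3.

```python
import random

random.seed(2)
n = 8
A, B = list(range(4)), list(range(4, 8))
mu = [random.uniform(0.5, 2.0) for _ in range(n)]

# random A -> B edge weights, then rescale so every weighted degree is <= mu_i
w = {(i, j): random.uniform(0.0, 0.3) for i in A for j in B}
for v in range(n):
    deg = sum(x for e, x in w.items() if v in e)
    if deg > mu[v]:
        for e in list(w):
            if v in e:
                w[e] *= mu[v] / deg  # only ever decreases other degrees

ok = True
for _ in range(100):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = sum(mu[i] * x[i] ** 2 for i in range(n)) ** 0.5
    x = [xi / norm for xi in x]  # now <x, Mx> = 1
    q = sum(wij * ((x[i] - x[j]) ** 2 + x[i] ** 2 - x[j] ** 2)
            for (i, j), wij in w.items())
    ok = ok and (-1.0 - 1e-9 <= q <= 3.0 + 1e-9)
print(ok)
```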

See 20

Proof.

Because the embedding is normalized, the expected squared length of a vector under the probability measure μ/μ(V)\mu/\mu(V) is 1/μ(V)1/\mu(V). Markov’s inequality then yields:

μ(R3/2)(113/2)μ(V)=13μ(V) and μ(R3)(113)μ(V)=23μ(V).\mu(R_{3/2})\geq\bigg{(}1-\frac{1}{3/2}\bigg{)}\cdot\mu(V)=\frac{1}{3}\cdot\mu(V)\,\textrm{ and }\,\mu(R_{3})\geq\bigg{(}1-\frac{1}{3}\bigg{)}\cdot\mu(V)=\frac{2}{3}\cdot\mu(V)\,.

For the second part, by Lemma 19 and the definition of balanced embedding, we have that:

iR3¯μi𝐯i22𝝁(R3)(11(𝝁(V))2{i,j}(R32)μiμj𝐯i𝐯j22)23μ(V)910=35μ(V).\sum_{i\in\overline{R_{3}}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq\bm{\mu}(R_{3})\cdot\bigg{(}1-\frac{1}{(\bm{\mu}(V))^{2}}\cdot\sum_{\{i,j\}\in\binom{R_{3}}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\bigg{)}\geq\frac{2}{3}\cdot\mu(V)\cdot\frac{9}{10}=\frac{3}{5}\cdot\mu(V).\;∎

A.6 Constructing Separated Sets: Proof of Theorem 3

We now complete the argument that round-cut outputs separated sets when the given vector embedding is balanced. We begin by recalling a fact about uniformly sampled vectors on the sphere.

Lemma 22.

Given 𝐯d\mathbf{v}\in\mathbb{R}^{d}, and 𝐠\mathbf{g} a uniformly sampled random vector in 𝒮d1\mathcal{S}^{d-1}, the following hold.

  1. 1.

    𝔼𝐠𝐠,𝐯2=𝐯2d\operatorname*{\mathbb{E}}_{\mathbf{g}}\langle\mathbf{g},\mathbf{v}\rangle^{2}=\frac{\lVert\mathbf{v}\rVert^{2}}{d}

  2. 2.

    For any δ>d16\delta>\frac{d}{16}, we have Pr𝐠(𝐠,𝐯2δ𝐯2d)eδ/4\Pr_{\mathbf{g}}\Big{(}\langle\mathbf{g},\mathbf{v}\rangle^{2}\geq\delta\cdot\frac{\lVert\mathbf{v}\rVert^{2}}{d}\Big{)}\leq e^{-\delta/4}.
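Item 1 can be verified empirically by Monte Carlo sampling; the sketch below (illustrative only, with an arbitrary test vector of our choosing) estimates E⟨g, v⟩² for g uniform on the sphere and compares it against ‖v‖²/d.

```python
import math, random

random.seed(3)
d = 6
v = [1.0, -2.0, 0.5, 0.0, 3.0, -1.0]  # arbitrary test vector
norm_sq = sum(c * c for c in v)

def unit_sphere(dim):
    # normalizing a Gaussian vector gives a uniform point on the sphere
    g = [random.gauss(0.0, 1.0) for _ in range(dim)]
    s = math.sqrt(sum(c * c for c in g))
    return [c / s for c in g]

N = 50000
est = sum(sum(gi * vi for gi, vi in zip(unit_sphere(d), v)) ** 2
          for _ in range(N)) / N
# the estimate should be close to ||v||^2 / d
print(abs(est - norm_sq / d) / (norm_sq / d) < 0.05)
```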

We also require a lemma proved in [51], which states that, with constant probability over a random direction, one can extract two balanced sets whose projections onto that direction are well separated.

Lemma 23 (Lemma 5.5.6 item (3) of [51]).

Given a set VV such that |V|=n\lvert V\rvert=n, and a (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d}, let ri=d𝐠,𝐯ir_{i}=\sqrt{d}\cdot\langle\mathbf{g},\mathbf{v}_{i}\rangle for 𝐠𝒮d1\mathbf{g}\in\mathcal{S}^{d-1} and each iVi\in V. Assume, without loss of generality, that r1rnr_{1}\geq\ldots\geq r_{n}. Then, there exists an absolute constant r>0r>0 such that, with Ω(1)\Omega(1) probability over a uniformly random choice of 𝐠𝒮d1\mathbf{g}\in\mathcal{S}^{d-1}, there exist 1a<bn1\leq a<b\leq n satisfying the following:

  1. 1.

    μ({1,,a})13μ(V)\mu(\{1,\ldots,a\})\geq\frac{1}{3}\cdot\mu(V),

  2. 2.

    μ({b,,n})13μ(V)\mu(\{b,\ldots,n\})\geq\frac{1}{3}\cdot\mu(V),

  3. 3.

    (rarb)2rVarμ(𝐯i)\big{(}r_{a}-r_{b}\big{)}^{2}\geq r\cdot\operatorname*{Var}_{\mu}(\mathbf{v}_{i}).

Using the above two lemmata, we can now prove that round-cut produces a robustly separated set in the balanced case.

See 21

Proof.

Define 1\mathcal{E}_{1} to be the event that there exist 1a<bn1\leq a<b\leq n such that the following hold simultaneously: (1) μ({1,,a})13μ(V)\mu(\{1,\ldots,a\})\geq\frac{1}{3}\cdot\mu(V), (2) μ({b,,n})13μ(V)\mu(\{b,\ldots,n\})\geq\frac{1}{3}\cdot\mu(V), and (3) (rarb)2rμ(V)(r_{a}-r_{b})^{2}\geq\frac{r}{\mu(V)}. Next, for a constant c1>0c_{1}>0 to be chosen subsequently, and any i,jVi,j\in V, define the event 2(i,j)\mathcal{E}_{2}(i,j) as

2(i,j)=def{(rirj)2<(4c1logn)𝐯i𝐯j22}and2=defi,jV2(i,j).\mathcal{E}_{2}(i,j)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\Big{\{}(r_{i}-r_{j})^{2}<\big{(}4c_{1}\cdot\log n\big{)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\Big{\}}\qquad\textup{and}\qquad\mathcal{E}_{2}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\bigcap_{i,j\in V}\mathcal{E}_{2}(i,j).

To begin, we observe that if 1\mathcal{E}_{1} and 2\mathcal{E}_{2} occur jointly, then the cuts S={1,,a}S=\{1,\ldots,a\} and T={b,,n}T=\{b,\ldots,n\} are Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated and, consequently, round-cut outputs the required separated sets in the balanced case. Indeed, event 2\mathcal{E}_{2} implies that

𝐯i𝐯j22>(rirj)24c1logn\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}>\frac{(r_{i}-r_{j})^{2}}{4c_{1}\cdot\log n}

holds for all iSi\in S and jTj\in T. Since rirar_{i}\geq r_{a} and rjrbr_{j}\leq r_{b}, item (3) defining event 1\mathcal{E}_{1} then gives:

(rirj)24c1lognrVarμ(𝐯i)14c1logn=r4c1logn.\frac{(r_{i}-r_{j})^{2}}{4c_{1}\cdot\log n}\geq r\cdot\operatorname*{Var}_{\mu}(\mathbf{v}_{i})\cdot\frac{1}{4c_{1}\cdot\log n}=\frac{r}{4c_{1}\cdot\log n}\,.

Finally, item (1) in event 1\mathcal{E}_{1} implies that μ(S),μ(T)μ(V)3\mu(S),\mu(T)\geq\frac{\mu(V)}{3} and hence, for all iS,jTi\in S,j\in T:

μ(S)μ(T)(μ(V))2𝐯i𝐯j22r36c1logn=Ω(1logn),\frac{\mu(S)\mu(T)}{(\mu(V))^{2}}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{r}{36\cdot c_{1}\cdot\log n}=\Omega\bigg{(}\frac{1}{\log n}\bigg{)}\,,

which establishes the separation guarantee.

Let us next bound the probability that 1\mathcal{E}_{1} and 2\mathcal{E}_{2} occur jointly. By Lemma 23, there exists a constant p>0p>0 such that Pr𝐠(1)p\Pr_{\mathbf{g}}(\mathcal{E}_{1})\geq p and hence Pr𝐠(1¯)1p\Pr_{\mathbf{g}}\big{(}\overline{\mathcal{E}_{1}}\big{)}\leq 1-p. By Lemma 22, we have

Pr𝐠(2(i,j)¯)\displaystyle\Pr_{\mathbf{g}}\big{(}\overline{\mathcal{E}_{2}(i,j)}\big{)} =Pr𝐠((rirj)2(4c1logn)𝐯i𝐯j22)\displaystyle=\Pr_{\mathbf{g}}\Big{(}(r_{i}-r_{j})^{2}\geq\big{(}4c_{1}\cdot\log n\big{)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}
=Pr𝐠(𝐠,𝐯i𝐯j2(4c1logn)𝐯i𝐯j22d)1nc1\displaystyle=\Pr_{\mathbf{g}}\bigg{(}\langle\mathbf{g},\mathbf{v}_{i}-\mathbf{v}_{j}\rangle^{2}\geq\big{(}4c_{1}\cdot\log n\big{)}\cdot\frac{\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}}{d}\bigg{)}\leq\frac{1}{n^{c_{1}}}

Performing a union bound over 1¯\overline{\mathcal{E}_{1}} and 2(i,j)¯\overline{\mathcal{E}_{2}(i,j)} for each i,jVi,j\in V yields

Pr𝐠(12)p(n2)1nc1p1nc12=Ω(1)\Pr_{\mathbf{g}}\big{(}\mathcal{E}_{1}\cap\mathcal{E}_{2}\big{)}\geq p-\binom{n}{2}\frac{1}{n^{c_{1}}}\geq p-\frac{1}{n^{c_{1}-2}}=\Omega(1)

for c1c_{1} large enough. We conclude that with constant probability, a single execution of step (1) in round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets.

Note that upon computing candidate sets, it is possible to check whether these are in fact Ω(1/logn)\Omega(1/\log n)-separated. One can therefore boost the probability of success and obtain a high probability guarantee by repeating this procedure up to O(logn)O(\log n) many times. Note that this clearly affects the runtime of this procedure by at most a logarithmic factor. ∎
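The boosting step above follows the standard repeat-and-verify pattern: a checkable randomized procedure that succeeds with constant probability is repeated until its output passes verification. A minimal sketch, with a seeded stand-in trial and verifier in place of round-cut and the separation check:

```python
import random

# Hypothetical sketch: amplify a constant success probability p by repeating
# the procedure up to `max_rounds` times and keeping the first verified output.
def amplify(trial, verify, max_rounds):
    for _ in range(max_rounds):
        out = trial()
        if verify(out):
            return out
    return None  # all rounds failed; probability (1 - p)^max_rounds

rng = random.Random(42)
p = 0.3                                   # constant per-trial success probability
trial = lambda: rng.random() < p          # stand-in for one run of round-cut
verify = lambda ok: ok                    # stand-in for the separation check
result = amplify(trial, verify, max_rounds=64)
assert result is True
```

With `max_rounds` set to `O(log n)`, the failure probability drops polynomially in `n`, at a logarithmic cost in runtime, as stated above.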

Appendix B Comparison to the Cut-Improvement Algorithm of Andersen and Lang

In this section, we briefly show that our formulation of the ratio-cut improvement problem in Definition 10 generalizes the definition of the modified quotient cut score, introduced by Andersen and Lang [7], to potentially non-integral seed vectors (rather than cut seeds) and to arbitrary hypergraphs (rather than graphs). Consider the following seed vector. Let AVA\subseteq V be a cut, and define the vector 𝐬A,A¯V\mathbf{s}^{A,\bar{A}}\in\mathbb{R}^{V} to be

𝐬A,A¯=def𝟏Aμ(A)/μ(A¯)𝟏A¯.\mathbf{s}^{A,\bar{A}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\boldsymbol{1}^{A}-\nicefrac{{\mu(A)}}{{\mu(\bar{A})}}\cdot\boldsymbol{1}^{\bar{A}}\,. (26)

Assuming μ(A)μ(A¯)\mu(A)\leq\mu(\bar{A}), note that 𝐬A,A¯,𝟏μ=0\langle\mathbf{s}^{A,\bar{A}},\boldsymbol{1}\rangle_{\mu}=0 and 𝐬A,A¯1\lVert\mathbf{s}^{A,\bar{A}}\rVert_{\infty}\leq 1, so that 𝐬A,A¯\mathbf{s}^{A,\bar{A}} is a valid seed vector. With this choice of seed vector, the ratio-cut improvement objective on a graph GG gives:

min{ΨG,𝐬A,A¯(S),ΨG,𝐬A,A¯(S¯)}=hEwhδh(S)|μ(AS)μ(A)μ(A¯)μ(A¯S)|,\min\left\{\Psi_{G,\mathbf{s}^{A,\bar{A}}}(S),\Psi_{G,\mathbf{s}^{A,\bar{A}}}(\bar{S})\right\}=\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S)}{\lvert\mu(A\cap S)-\frac{\mu(A)}{\mu(\bar{A})}\cdot\mu(\bar{A}\cap S)\rvert},

which is exactly equal to the modified quotient cut score of cut (S,S¯)(S,\bar{S}) as considered by Andersen and Lang.
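The two properties that make 𝐬A,A¯\mathbf{s}^{A,\bar{A}} a valid seed vector can be checked numerically; the measure μ\mu and cut AA below are illustrative choices.

```python
import numpy as np

# Check the seed-vector properties from Equation (26):
# <s^{A,Abar}, 1>_mu = 0 and ||s^{A,Abar}||_inf <= 1, assuming mu(A) <= mu(Abar).
# The vertex measure and the cut A are illustrative.
mu = np.array([1.0, 2.0, 1.0, 3.0, 2.0])
A = np.array([True, True, False, False, False])   # mu(A) = 3 <= mu(Abar) = 6

# s = 1_A - (mu(A)/mu(Abar)) * 1_Abar
s = np.where(A, 1.0, -mu[A].sum() / mu[~A].sum())

assert abs(np.dot(mu, s)) < 1e-12     # <s, 1>_mu = sum_i mu_i * s_i = 0
assert np.max(np.abs(s)) <= 1.0       # ||s||_inf <= 1 since mu(A) <= mu(Abar)
```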

Appendix C Proofs of Theorems 9 and 10

Suppose we have an algorithm 𝒜\mathcal{A} that, on input α,\alpha, either outputs a cut SVS\subseteq V with ΨG,𝟏A,B(S)2α\Psi_{G,{\bf 1}^{A,B}}(S)\leq 2\alpha or an approximate dual solution of value α.\alpha. Then, 𝒜\mathcal{A} can be used to construct an approximate primal-dual oracle by performing binary search on α\alpha and converting the final approximate dual solution to an approximate dual graph certificate via the flow decomposition of Theorem 6. Since the range of possible values for α\alpha is [1/μ(V),hEwh][1/\mu(V),\sum_{h\in E}w_{h}], which we assume is polynomial in |V|,|V|, it suffices to run 𝒜\mathcal{A} for O(log|V|)O(\log|V|) iterations. Hence, in the rest of the section, we focus on implementing 𝒜\mathcal{A} for both cases in the theorems.
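The binary-search driver around 𝒜\mathcal{A} can be sketched as follows. The `oracle` here is a hypothetical monotone stand-in for 𝒜\mathcal{A} (reporting only whether it found a cheap cut), not an implementation of it.

```python
# Sketch of the binary-search driver around the primitive A described above.
# `oracle(alpha)` returns True when it finds a cut of ratio <= 2*alpha, and
# False when it instead produces a dual certificate of value alpha.
def binary_search_alpha(oracle, lo, hi, rounds):
    # Invariant: a dual certificate exists at `lo`, a cheap cut exists at `hi`.
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if oracle(mid):          # found a cut with ratio <= 2*mid
            hi = mid
        else:                    # obtained a dual certificate of value mid
            lo = mid
    return lo, hi

# Toy monotone oracle with (hypothetical) true threshold 0.37.
lo, hi = binary_search_alpha(lambda a: a >= 0.37, 0.0, 1.0, rounds=30)
assert lo <= 0.37 <= hi and hi - lo < 1e-8
```

After `rounds = O(log |V|)` iterations the interval pins down the optimal value up to the desired precision, matching the iteration count claimed above.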

General Polymatroidal Cut Functions

For a fixed α0,\alpha\geq 0, we consider the integral version of the feasibility problem for (CI-Primal):

SV:hEwhδh(S)α𝟏S,𝟏A,B𝝁0.\exists S\subseteq V:\sum_{h\in E}w_{h}\delta_{h}(S)-\alpha\cdot\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\bm{\mu}}\leq 0.

The left-hand side of this equation can be minimized exactly by the proximal algorithms for decomposable submodular minimization (DSM) of Ene et al. [26, 27] when run with ϵ=1n\epsilon=\frac{1}{n} and rounded to integral solutions. If the resulting minimum is less than 0, the minimizer SS must have ΨG,𝟏A,B(S)α,\Psi_{G,{\bf 1}^{A,B}}(S)\leq\alpha, as required. Otherwise, the DSM algorithm produces a matching dual solution, which takes exactly the form of a feasible solution to (CI-Dual) with value α\alpha. Applying the best result of Ene et al., achieved by the accelerated coordinate descent method under the assumption that a quadratic optimization oracle is given for each cut function δh,\delta_{h}, the running time to implement an approximate primal-dual oracle becomes O~(|V|hEΘh),\tilde{O}\big{(}|V|\cdot\sum_{h\in E}\Theta_{h}\big{)}, where Θh\Theta_{h} is the running time of a quadratic optimization oracle for δh\delta_{h}.
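For intuition, the feasibility test can be carried out by brute force on a toy instance; this enumeration is exponential in |V||V| and purely illustrative (the DSM algorithms of Ene et al. are what make the step efficient). The hypergraph, weights, seed entries, and α\alpha below are hypothetical.

```python
from itertools import combinations

# Brute-force sketch of the feasibility test: minimize
#   sum_h w_h * delta_h(S) - alpha * <1_S, 1^{A,B}>_mu
# over all S, on a tiny illustrative instance.
V = {0, 1, 2, 3}
hyperedges = [(frozenset({0, 1, 2}), 1.0), (frozenset({2, 3}), 2.0)]  # (h, w_h)
seed = {0: 1.0, 1: 1.0, 2: -1.0, 3: -1.0}   # entries of the seed vector 1^{A,B}
mu = {i: 1.0 for i in V}
alpha = 1.0

def delta(h, S):
    # Standard hypergraph cut function: 1 if hyperedge h is cut by S, else 0.
    return int(0 < len(h & S) < len(h))

def objective(S):
    return (sum(w * delta(h, S) for h, w in hyperedges)
            - alpha * sum(mu[i] * seed[i] for i in S))

subsets = [set(c) for r in range(len(V) + 1) for c in combinations(sorted(V), r)]
best_S = min(subsets, key=objective)
# A strictly negative minimum certifies a cut S with ratio at most alpha.
assert objective(best_S) < 0
```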

Graph-Reducible Polymatroidal Cut Functions

As we have shown in Section 4, for directed and standard hypergraph cut functions, the congestion constraint in the hyperflow problem CI-Dual corresponds to vertex capacities on the hyperedge nodes of the factor graph G~.\tilde{G}. To implement the primitive 𝒜\mathcal{A} described above for a fixed α\alpha, we construct a network flow problem from this vertex-capacitated factor graph by adding an auxiliary source vertex ss and an auxiliary sink vertex tt. The vertex ss is connected to every vertex iAi\in A with an arc of capacity αμi\alpha\mu_{i}, while every vertex jBj\in B is connected to tt with an arc of capacity αμjμ(A)/μ(B).\alpha\cdot\mu_{j}\cdot\nicefrac{{\mu(A)}}{{\mu(B)}}. It is easy to see that, for every SVS\subseteq V, the capacity of the cut {s}S\{s\}\cup S in this new network is:

cap({s}S)=defhEwhδh(S)α𝟏S,𝟏A,Bμ+αμ(A)\operatorname{cap}(\{s\}\cup S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{h\in E}w_{h}\delta_{h}(S)-\alpha\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\mu}+\alpha\cdot\mu(A) (27)

Now, apply a 12\frac{1}{2}-approximate maximum flow solver to this network flow problem to obtain a flow of value FF and an ss-tt cut CC of capacity at most 2F.2F. We have two cases:

  1.

    If Fα2μ(A),F\geq\frac{\alpha}{2}\cdot\mu(A), then the flow routed is an approximate dual solution of value α\alpha, as required.

  2.

    If F<α2μ(A),F<\frac{\alpha}{2}\cdot\mu(A), the ss-tt cut C={s}S,C=\{s\}\cup S, for some SV,S\subseteq V, has capacity strictly less than αμ(A).\alpha\cdot\mu(A). As a result, we have by Equation 27:

    hEwhδh(S)<α𝟏S,𝟏A,Bμ.\sum_{h\in E}w_{h}\delta_{h}(S)<\alpha\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\mu}.

    This implies that ΨG,𝟏A,B(S)<α2α,\Psi_{G,{\bf 1}^{A,B}}(S)<\alpha\leq 2\alpha, as required for 𝒜.\mathcal{A}.

The almost-linear time result follows by applying the almost-linear time algorithm of Chen et al. [20] as our approximate maximum flow solver. For the standard hypergraph cut function, as the edges in the factor graph are undirected, it suffices to use the almost-linear time solver of Bernstein et al. [13] for undirected vertex-capacitated graphs.
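Equation 27 can be sanity-checked numerically on a toy *graph* instance, where each hyperedge is a plain edge, so the factor-graph construction reduces to the source and sink arcs described above. All weights, measures, and the sets AA, BB below are illustrative.

```python
from itertools import combinations

# Numerical check of Equation (27) on a toy graph: for every S, the capacity
# of ({s} union S) equals cut(S) - alpha * <1_S, 1^{A,B}>_mu + alpha * mu(A).
V = [0, 1, 2, 3]
edges = [((0, 1), 1.0), ((1, 2), 2.0), ((2, 3), 1.0)]   # ((u, v), weight)
mu = {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0}
A, B = {0, 1}, {3}
alpha = 0.7
muA = sum(mu[i] for i in A)
muB = sum(mu[j] for j in B)

def cut_weight(S):
    return sum(w for (u, v), w in edges if (u in S) != (v in S))

# Seed vector 1^{A,B} = 1_A - (mu(A)/mu(B)) * 1_B, zero elsewhere.
seed = {i: (1.0 if i in A else (-muA / muB if i in B else 0.0)) for i in V}

for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        # LHS: capacity of ({s} union S): graph edges crossing S, plus the
        # s -> A arcs cut (i in A \ S), plus the B -> t arcs cut (j in B & S).
        lhs = (cut_weight(S)
               + alpha * sum(mu[i] for i in A - S)
               + alpha * (muA / muB) * sum(mu[j] for j in B & S))
        # RHS: right-hand side of Equation (27).
        rhs = cut_weight(S) - alpha * sum(mu[i] * seed[i] for i in S) + alpha * muA
        assert abs(lhs - rhs) < 1e-9
```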

Appendix D Proof of Theorem 11: the cut-matching game reduction

See 11

Proof of Theorem 11.

We first check that the dual graph certificates DtD_{t} output by 𝒜ci\mathcal{A}_{ci} conform to the definition of an approximate matching response. The bipartiteness of DtD_{t} across (At,Bt)(A_{t},B_{t}) follows from requirement 3 in Definition 12, while the degree and largeness properties follow from requirements 4 and 5 in the same definition. In the symmetric case, we may assume each DtD_{t} is undirected.

In the following, let αt\alpha_{t} be the value of α\alpha achieved by the tt-th invocation of 𝒜ci\mathcal{A}_{ci} and

α=min{αt}t=1g(n).\alpha^{*}=\min\{\alpha_{t}\}_{t=1}^{g(n)}.

By Lemma 6 and the definition of approximate primal-dual oracle, we have that

min{ΨG(Ct)}t=1g(n)min{ΨG,𝟏At,Bt(Ct)}t=1g(n)3min{αt}t=1g(n)=3α.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq\min\bigg{\{}\Psi_{G,{\bf 1}^{A_{t},B_{t}}}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq 3\cdot\min\bigg{\{}\alpha_{t}\bigg{\}}_{t=1}^{g(n)}=3\cdot\alpha^{*}.

By property 1 in Definition 12, we have that αtDt\alpha_{t}D_{t} embeds in GG with congestion 11 for all t.t. Averaging this guarantee over all tt, we have:

αg(n)Hg(n)11g(n)t=1g(n)αtDt1G\frac{\alpha^{*}}{g(n)}\cdot H_{g(n)}\preceq_{1}\frac{1}{g(n)}\cdot\sum_{t=1}^{g(n)}\alpha_{t}D_{t}\preceq_{1}G

Applying Theorem 5 and the definition of good strategy, we obtain the following lower bound on cuts of GG, for all SV:S\subseteq V:

αf(n)g(n)αg(n)ΨHg(n)(S)2ΨG(S)\alpha^{*}\cdot\frac{f(n)}{g(n)}\leq\frac{\alpha^{*}}{g(n)}\cdot\Psi_{H_{g(n)}}(S)\leq 2\cdot\Psi_{G}(S)

In particular, this applies to the minimum-ratio cut in GG, so that we have:

min{ΨG(Ct)}t=1g(n)3α6g(n)f(n)ΨG.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq 3\cdot\alpha^{*}\leq 6\cdot\frac{g(n)}{f(n)}\cdot\Psi_{G}^{*}.

This completes the proof. ∎