
Submodular Hypergraph Partitioning: Metric Relaxations and Fast Algorithms via an Improved Cut-Matching Game

Antares Chen
[email protected]
University of Chicago
   Lorenzo Orecchia
[email protected]
University of Chicago
   Erasmo Tani
[email protected]
University of Chicago
Abstract

Despite there being significant work on developing spectral [16, 39, 38] and metric embedding [48] based approximation algorithms for hypergraph generalizations of conductance, little is known regarding the approximability of hypergraph partitioning objectives beyond this.

This work proposes algorithms for a general model of hypergraph partitioning that unifies both undirected and directed versions of many well-studied partitioning objectives. The first contribution of this paper introduces polymatroidal cut functions, a large class of cut functions amenable to approximation algorithms via metric embeddings and routing multicommodity flows. We demonstrate an $O(\sqrt{\log n})$-approximation, where $n$ is the number of vertices in the hypergraph, for these problems by rounding relaxations to metrics of negative type.

The second contribution of this paper generalizes the cut-matching game framework of Khandekar et al. [35] to tackle polymatroidal cut functions. This yields the first almost-linear time $O(\log n)$-approximation algorithm for standard versions of undirected and directed hypergraph partitioning [38]. A technical consequence of our construction is that a cut-matching game which greatly relaxes the set of allowed actions for both players can be used to partition hypergraphs with negligible impact on the approximation ratio. We believe this to be of independent interest.

1 Introduction

In the past 20 years, increasing complexity in real-world data has necessitated generalizing graph models to represent higher order relations [3, 15]. Hypergraphs have served as the quintessential object to model relations involving multiple objects, and, as a consequence, the study of hypergraph algorithms has featured prominently in recent practical [12, 57] and theoretical developments [16, 17, 47, 48].

Hypergraph partitioning is a fundamental algorithmic problem in this broader landscape. From an unsupervised machine learning perspective, hypergraph partitioning enables detecting significant clusters in complex higher-order networks, such as social [57, 61] or biological networks [28, 37]. From a theoretical perspective, partitioning algorithms are a fundamental primitive to be exploited in the development of divide-and-conquer methods for other hypergraph problems. In contrast to graphs, where there is only one way for a partition to intersect an edge, hypergraphs admit multiple definitions for the cost of cutting a hyperedge. This yields a broad spectrum of partitioning objectives, and allows the end-user to design partitioning problems that better fit specific applications. While there has been significant work on developing spectral approximation algorithms for hypergraph generalizations of conductance [16, 39, 38], with few exceptions [48], little is known about the approximability of large classes of hypergraph partitioning objectives beyond this. Even less is known regarding fast algorithms for these tasks.

This paper proposes a general model for partitioning hypergraphs. The first half of this paper identifies a large class of hypergraph partitioning objectives, which we call polymatroidal cut functions. This class is a subset of the submodular cut functions previously considered by [43, 44] and captures both undirected and directed versions of almost all partitioning measures found in prior theoretical work. We show that these objectives possess properties that make them particularly well-suited for approximation methods based on metric embeddings and routing multi-commodity flows. This leads us to polynomial-time $O(\sqrt{\log n})$-approximations for these objectives based on rounding $\ell_{2}^{2}$-metric relaxations. The second half of this paper constructs fast algorithms that achieve similar $O(\log n)$-approximations by generalizing the cut-matching framework of Khandekar et al. [35]. This produces the first almost-linear time polylogarithmic approximation for both undirected and directed hypergraph conductance.

Let us now define concepts relevant to hypergraph partitioning. A hypergraph is a tuple $G=(V,E)$, where $E\subseteq 2^{V}$ is a set of hyperedges. The rank of a hyperedge $h\in E$ is its cardinality $|h|$. If every $h\in E$ satisfies $|h|=k$, the hypergraph $G$ is called $k$-uniform. Hence, an undirected graph is a $2$-uniform hypergraph. A key step in formalizing hypergraph cut problems is to define a suitable cut function $\delta_{h}:2^{h}\to[0,1]$ on each hyperedge $h\in E$. This expresses the cost incurred when $h$ is cut by $(A,\bar{A})$ into $A\cap h$ and $\bar{A}\cap h$. Central to our work are submodular cut functions, defined by [43, 44]:

Definition 1 (Submodular Cut Function).

A function $\delta_{h}:2^{h}\to[0,1]$ is a submodular cut function if it is submodular and satisfies $\delta_{h}(\varnothing)=\delta_{h}(h)=0$.

The space of cut functions for an edge $\{i,j\}$ is spanned by the undirected edge cut function $\delta_{\{i,j\}}(S)=|\mathbf{1}^{S}_{i}-\mathbf{1}^{S}_{j}|$ and the directed edge cut function $\delta_{(i,j)}(S)=(\mathbf{1}^{S}_{i}-\mathbf{1}^{S}_{j})_{+}$, giving a complete classification of cut functions for rank-$2$ hyperedges. Higher-rank hyperedges support broader choices of cut functions. The most common example is the standard hypergraph cut function [16, 47]: $\delta^{\mathsf{cut}}_{h}(S)\stackrel{\mathrm{def}}{=}\min\{1,|S|,|h\setminus S|\}$, which takes value $1$ if $h$ is cut by $S$ and $0$ otherwise. If the hypergraph $G=(V,E)$ is associated with positive hyperedge weights $\mathbf{w}\in\mathbb{Z}^{E}_{>0}$, we can succinctly write the cut function for the entire graph as $\delta_{G}(S)\stackrel{\mathrm{def}}{=}\sum_{h\in E}w_{h}\cdot\delta_{h}(S)$.
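The cut functions above can be evaluated directly on explicit vertex subsets. A minimal Python sketch (the function names are ours, chosen for illustration):

```python
def undirected_edge_cut(i, j, S):
    # delta_{i,j}(S) = |1_i^S - 1_j^S|: 1 iff the edge {i, j} is cut by S.
    return abs((i in S) - (j in S))

def directed_edge_cut(i, j, S):
    # delta_(i,j)(S) = (1_i^S - 1_j^S)_+: 1 iff i is in S and j is not.
    return max((i in S) - (j in S), 0)

def standard_hyperedge_cut(h, S):
    # delta_h^cut(S) = min{1, |S ∩ h|, |h \ S|}: 1 iff h is cut by S.
    inside = len(h & S)
    return min(1, inside, len(h) - inside)
```

Note how the directed edge cut is asymmetric (it charges only crossings from $i$ to $j$), while the other two functions are invariant under complementing $S$.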

With these definitions, we are interested in minimizing the submodular hypergraph ratio-cut objective.

Definition 2 (Submodular Hypergraph Ratio-Cut).

Given a hypergraph $G=(V,E)$ with positive hyperedge weights $\mathbf{w}\in\mathbb{Z}^{E}_{>0}$, non-negative vertex weights $\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}$, and a collection of submodular cut functions $\{\delta_{h}\}_{h\in E}$, the ratio-cut objective on $G$ is defined for all $S\subseteq V$ as:

$$\Psi_{G}(S)\stackrel{\mathrm{def}}{=}\frac{\delta_{G}(S)}{\min\{\mu(S),\mu(\bar{S})\}}=\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S\cap h)}{\min\{\mu(S),\mu(\bar{S})\}}\,.$$

The ratio-cut of $G$ is $\Psi^{*}_{G}\stackrel{\mathrm{def}}{=}\min_{S\subset V}\Psi_{G}(S)$.
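Definition 2 can be made concrete by brute force on a toy instance. The sketch below assumes the standard hypergraph cut function; it is exponential in $|V|$ and intended only to illustrate the objective, not as an algorithm:

```python
from itertools import combinations

def ratio_cut(V, edges, w, mu, S):
    # Psi_G(S): standard-cut numerator over the smaller side's mu-mass.
    num = sum(w[h] * min(1, len(set(h) & S), len(set(h) - S)) for h in edges)
    den = min(sum(mu[i] for i in S), sum(mu[i] for i in set(V) - S))
    return num / den

def min_ratio_cut(V, edges, w, mu):
    # Psi*_G: minimum over all proper non-empty subsets S of V.
    return min(
        ratio_cut(V, edges, w, mu, set(S))
        for r in range(1, len(V))
        for S in combinations(sorted(V), r)
    )
```

For example, on $V=\{1,2,3,4\}$ with hyperedges $\{1,2,3\}$ and $\{3,4\}$ and unit weights, the cut $S=\{1,2\}$ severs only the first hyperedge and attains the optimum $\Psi_{G}(S)=1/2$.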

When specialized to graphs,¹ ratio-cut objectives include both graph expansion ($\bm{\mu}=\mathbf{1}$) and graph conductance ($\mu_{i}=\sum_{h\in E:i\in h}w_{h}$ for all $i\in V$). They also capture the directed counterparts of these objectives, obtained by replacing the undirected cut function $\delta_{\{i,j\}}$ with its directed analogue $\delta_{(i,j)}$.

¹In the rest of the paper, for any undirected (resp. directed) graph $G$, unless otherwise specified, we assume that $\delta_{G}$ and $\Psi_{G}$ are formed using the undirected (resp. directed) cut functions.

Previous Work on Submodular Hypergraph Partitioning

The standard hypergraph partitioning problem was first studied in the contexts of parallel numerical algorithms [15] and scientific computing [25].

Since then, many high-quality approximation algorithms have been developed for approximating hypergraph expansion and conductance. Louis and Makarychev [48] gave the first randomized polynomial-time $O\big(\sqrt{\log n}\big)$-approximation algorithm for hypergraph expansion. This matches the best known approximation for graph expansion [10], and uses the same relaxation to metrics of negative type. Louis [47] and Chan et al. [16] developed a spectral approach towards approximating hypergraph conductance, achieving Cheeger-like guarantees. Their algorithms require solving semidefinite programs (SDPs) to approximate the hypergraph spectral gap, returning a partition with conductance at most $O\big(\sqrt{\Psi^{*}_{G}\log\max_{h\in E}|h|}\big)$. Their results are known to be tight under the small-set expansion conjecture. In more recent work, Lau et al. used a similar approach to obtain an $O\big(\sqrt{\Psi^{*}_{G}\log(1/\Psi^{*}_{G})}\big)$ guarantee for directed graph conductance [39].

Much less is known about the approximability of hypergraph ratio-cut objectives for general submodular cut functions. Following the work of Yoshida [62], Li and Milenkovic [43, 44] proposed submodular cut functions. For submodular hypergraph conductance, rounding an SDP relaxation yields algorithms with conductance guarantees $O\big(\sqrt{\Psi^{*}_{G}\cdot\log|V|}\big)$ [62] and $O\big(\sqrt{\Psi^{*}_{G}\cdot\max_{h\in E}|h|}\big)$ [44]. Li and Milenkovic [44] further conjecture that improving the dependence on hypergraph rank is $\mathsf{NP}$-hard. The work of Yoshida [62] and the survey by Veldt et al. [60] provide excellent discussions of submodular cut functions and their applications.

Previous Work on Cut-Matching Games

Algorithms for the above settings require solving SDP relaxations over $|V|\times|V|$ symmetric matrices, with at least $|E|$ constraints. No faster algorithms are currently known beyond generic SDP solvers [32], which run in $\Omega\big(|V|^{2}|E|\big)$ time. Even for the simplest form of standard hypergraph partitioning, the resulting SDP relaxation is a mixed packing-and-covering program for which no almost-linear time algorithm is currently known [31].

In order to develop fast approximation algorithms, one then considers primal-dual methods that solve the dual to metric relaxations for ratio-cut problems. This poses an immediate issue: the duals to these relaxations are multi-commodity flow problems whose demands are encoded by a $\bm{\mu}$-weighted complete graph. Such problems may require quadratic time to solve [8] due to the large number of demands. The cut-matching game [36] was originally a method for approximately reducing this multi-commodity flow problem to a polylogarithmic number of exact single-commodity flow computations. Each single-commodity flow problem routes a perfect matching in the graph, and the process continues until the union of the matchings approximates the desired demand graph.

Through the cut-matching game, Khandekar et al. [36] obtained an $O(\log^{2}|V|)$-approximation to undirected graph expansion by exactly solving $O(\log^{3}|V|)$ maximum flow problems. Later, Orecchia et al. [54] reduced both of these measures by a factor of $O(\log n)$. Subsequent works have vastly generalized the applicability of the cut-matching game: Louis [46] extends [36] to approximate directed graph expansion, Long and Saranurak extend [35] to approximate vertex and hypergraph expansion, and [49] demonstrate how to run the cut-matching game using approximate single-commodity flow solves. The cut-matching game has since become ubiquitous in designing fast deterministic algorithms for various static and dynamic graph problems [13, 21, 22, 23].

Our Contributions

In this paper, we achieve the following.

  • We identify the class of polymatroidal cut functions, a subset of submodular cut functions amenable to metric and flow techniques. For this class, we develop notions of hypergraph flows and flow-embeddings of graphs into hypergraphs which certify lower bounds on the ratio-cut objective.

  • Using metric embedding techniques, we give polynomial-time $O(\sqrt{\log n})$-approximation algorithms for the minimum ratio-cut problem over submodular hypergraphs with polymatroidal cut functions. Our algorithms generalize the seminal result of Arora, Rao and Vazirani [8] for sparsest cut to the largest class of problems to date.

  • We extend the cut-matching game framework to approximate minimum ratio-cuts on submodular hypergraphs with polymatroidal cut functions. Our approach is based on approximating a continuous non-convex formulation of the minimum ratio-cut problem via a family of “local” convex submodular minimization problems. We then use the cut-matching framework to “boost” solutions to problems in this family, producing an $O(\log n)$-approximation algorithm for minimum ratio-cut that only requires a polylogarithmic number of approximate submodular minimization solves. This amounts to the first almost-linear time polylogarithmic approximation for both undirected and directed hypergraph conductance.

A technical byproduct of our cut-matching game is that it allows the cut player to play arbitrary disjoint vertex sets, and requires only a single $O(1)$-approximate submodular minimization solve per round. This may impact the deployment of cut-matching games in practice by lowering the required precision for the current running-time bottleneck of algorithms based on this framework [52, 58], as previous applications required either a single $o_{n}(1)$-approximate solve or a polylogarithmic number of $O(1)$-approximate solves.

Our cut-matching game additionally avoids combinatorial restrictions found in previous applications of the framework. For example, the submodular minimization problem arising when approximating hypergraph conductance is equivalent to a single-commodity flow whose demand graph is neither a perfect matching in the undirected case, nor Eulerian in the directed case. This may simplify the analyses of downstream applications of the cut-matching game such as those found in [13, 21, 22, 23, 45, 49].

Concurrent Work

Subsequent to initial versions of this paper [6], Veldt [58] provided a generalization of the cut-matching framework to approximate minimum ratio-cut problems specified by cardinality-based symmetric submodular cut functions, a subset of polymatroidal cut functions. Their approach utilizes exact maximum flow solves to compute hypergraph cut preservers, a gadget which lower bounds the ratio-cut objective only when the cut function is cardinality-based [60, 59]. Unlike this work, [58] additionally provides compelling empirical evaluations.

Independent of this work, Lau et al. [40] give an almost-linear time $O(\sqrt{\log n})$-approximation for standard undirected and directed hypergraph cut functions. Their algorithm solves the dual to a relaxation of hypergraph conductance given by adding $\ell_{2}^{2}$-metric constraints to a minimum reweighted eigenvalue problem [39]. This yields a multi-commodity flow problem, which they demonstrate to be approximable using a sequence of single-commodity flows via an interesting generalization of Sherman's algorithmic chaining [55]. Their algorithm is not based on the cut-matching game, and requires combinatorial restrictions similar to those found in previous works when routing the single-commodity flows.

Paper Organization

The paper is organized as follows:

  • Section 2 provides a technical overview of the results and proof techniques found in this paper.

  • Section 3 outlines preliminary tools and notation.

  • Section 4.1 introduces polymatroidal cut functions and provides key examples of functions in this class.

  • Sections 4.2 and 4.3 introduce hypergraph flows and hypergraph flow embeddings. These serve as fundamental objects in constructing certificates of approximate optimality in the cut-matching game.

  • Section 5 details our $O(\sqrt{\log n})$-approximation for minimum ratio-cut given polymatroidal cut functions, describing the $\ell_{2}^{2}$-metric relaxation and the embedding results required to round to an integral solution.

  • Section 6 describes our non-convex approach to the minimum ratio-cut problem and its connection to the cut-matching game.

  • Section 7 describes our cut-matching game framework and uses it to construct an $O(\log n)$-approximation algorithm for the minimum ratio-cut problem with polymatroidal cut functions.

The proofs of several technical statements encountered throughout the paper are postponed to the appendix.

2 Our Results

2.1 The Metric Approach to Submodular Hypergraph Ratio-Cut

The best polynomial-time approximation algorithms for graph ratio-cut problems are obtained by combining metric relaxations, either to general metrics [42] or to $\ell_{2}^{2}$-metrics [10], with metric embedding results for rounding. They achieve, respectively, an $O(\log n)$- and an $O(\sqrt{\log n})$-approximation ratio. We study the applicability of this approach to the submodular hypergraph ratio-cut problem. At the outset, we remark that we should not hope to obtain polylogarithmic approximations for general submodular cut functions without further assumptions, as Svitkina and Fleischer (Section 3 in [56]) exhibit a submodular cut function for which any $o(\sqrt{n})$-approximation requires exponentially many queries to a value oracle.

Given this negative result, it is necessary to restrict the problem to a smaller class of cut functions. At a high level, for a metric relaxation to work, the cut function must exhibit some kind of monotonicity, so that its value is preserved under low-distortion metric embeddings. However, a cut function cannot be monotone without being equal to 0, so the monotone structure must appear in some other way. To solve this issue, we introduce the class of polymatroidal cut functions, which are the infimal symmetrization of monotone non-decreasing submodular functions (see below). We believe that this new class captures most cut functions that are amenable to approximation via metric relaxation. As partial evidence to this statement, we show in Section 4 that the fractional dual objects associated with polymatroidal cut functions are closely related to polymatroidal flows, a well-studied generalization of network flows with many of the same favorable flow-cut gap properties [19].

Polymatroidal Cut Functions and Metric Embeddings

Our first contribution is to define polymatroidal cut functions, a subclass of submodular cut functions given by the infimal convolutions of non-decreasing submodular functions.

Definition 3 (Polymatroidal Cut Function).

A cut function $\delta_{h}:2^{h}\to\mathbb{R}_{\geq 0}$ is polymatroidal if, for all $S\subseteq h$, it can be expressed as

$$\delta_{h}(S)=\min\big\{F^{-}_{h}(S),\;F^{+}_{h}(h\setminus S)\big\},$$

where the associated functions $F^{-}_{h},F^{+}_{h}:2^{h}\rightarrow\mathbb{R}$ are non-decreasing submodular functions such that $F^{-}_{h}(\varnothing)=F^{+}_{h}(\varnothing)=0$. When the associated functions $F^{-}_{h}$ and $F^{+}_{h}$ are identical, we refer to the cut function $\delta_{h}$ as symmetric polymatroidal.

Note that $\delta_{h}$ penalizes a cut $(A,\overline{A})$ intersecting $h$ in a directed manner, with $F^{-}_{h}$ penalizing $A\cap h$ and $F^{+}_{h}$ penalizing $\overline{A}\cap h$. For a symmetric polymatroidal cut function $\delta_{h}$, the penalty is the same on both sides, so that $\delta_{h}(A\cap h)=\delta_{h}(\overline{A}\cap h)$, generalizing the cut function of undirected graphs and hypergraphs. We show in Section 4 that Definition 3 captures most of the cut functions proposed in previous work, including all directed and undirected standard hypergraph cut functions. Moreover, our setup also captures arbitrary combinations of all these types of cut functions within a single framework. We highlight two additional examples that are relevant in applications:

  • $F_{h}$ is a cardinality-based non-decreasing submodular function [59], i.e., a non-decreasing concave function of the cardinality. This is useful when we wish to modify the star-expansion cut function by diminishing the penalty on balanced partitions of $h$, e.g., by taking $\delta_{h}(S)=\min\big\{|S|^{p},|h\setminus S|^{p}\big\}$ for $p\in(0,1)$.

  • $F_{h}$ is the entropy of a subset of random variables associated with the vertices of $h$. When the hypergraph $G$ is the factor graph of an undirected graphical model, this yields a variant of the minimum information partition [30, 50].
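The first bullet's cardinality-based example can be sketched in a few lines. The sketch below assumes $F^{-}_{h}=F^{+}_{h}=|\cdot|^{p}$, a concave non-decreasing function of the cardinality (hence non-decreasing submodular with $F_{h}(\varnothing)=0$), giving a symmetric polymatroidal cut function per Definition 3:

```python
def polymatroidal_cut(h, S, p=0.5):
    # Symmetric polymatroidal: delta_h(S) = min{F_h(S), F_h(h \ S)}
    # with F_h(T) = |T|^p for a fixed p in (0, 1).
    return min(len(S & h) ** p, len(h - S) ** p)
```

A quick sanity check is the symmetry property $\delta_{h}(A\cap h)=\delta_{h}(\overline{A}\cap h)$ noted above: swapping a subset of $h$ for its complement within $h$ leaves the value unchanged.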

In Section 5, we investigate the polynomial-time approximability of ratio-cut problems over submodular hypergraphs with polymatroidal cut functions. In particular, we prove the following result.

Theorem 1.

There exists a randomized polynomial-time $O\big(\sqrt{\log|V|}\big)$-approximation algorithm for solving the minimum ratio-cut problem on weighted submodular hypergraphs equipped with polymatroidal cut functions.

This result extends the $O\big(\sqrt{\log n}\big)$-approximation of Louis and Makarychev [48] for standard hypergraph sparsest cut to all ratio-cut problems for the more general class of polymatroidal cut functions. Our algorithm for Theorem 1 solves a novel $\ell_{2}^{2}$-metric relaxation of the ratio-cut problem for weighted submodular hypergraphs (see the program RC-Directed-Semimetric in Section 5). This relaxation crucially exploits the form of the Lovász extension of polymatroidal cut functions. To the best of our knowledge, our result constitutes the broadest known generalization of the seminal result of Arora, Rao and Vazirani for sparsest cut, capturing a wealth of partitioning objectives, including directed and undirected graph ratio-cut, vertex-based ratio-cut and hypergraph ratio-cut, within a single simple analysis. It also improves on the best known approximation ratio for many other objectives in this class, such as those given by cardinality-based cut functions [59].

Polymatroidal Cut Functions and Hypergraph Flows

In the second part of the paper, we explore the efficient solution of the minimum submodular hypergraph ratio-cut problem via flow methods based on the cut-matching game. We start by developing the notions of flow-cut duality and flow embeddings over hypergraphs. For this purpose, in Section 4.2, we define a novel notion of hypergraph flows for weighted submodular hypergraphs equipped with polymatroidal cut functions. Specifically, for a weighted submodular hypergraph $G=(V,E,\mathbf{w},\bm{\mu})$, we show that the subgradients of polymatroidal cut functions behave like network flows over the factor graph, a graph analogue of the input submodular hypergraph that includes an auxiliary vertex for each hyperedge $h\in E$ and an edge $\{i,h\}$ for each $i\in h$. For each hyperedge $h\in E$, the monotone functions $F_{h}^{-}$ and $F_{h}^{+}$ define capacity constraints on the flows entering and leaving the corresponding auxiliary node. In this process, we highlight new connections between submodular hypergraphs and the polymatroidal flows described by Lawler and Martel [41] and extended to the multi-commodity flow setting by Chekuri et al. [19]. The main algorithmic result in this part of the paper is a flow decomposition result for hypergraph flows (Theorem 6) which allows us to construct flow embeddings of directed and undirected graphs into hypergraphs while lower bounding the size of the hypergraph cut by the size of the corresponding graph cut. This will become a fundamental building block for our cut-matching game construction.

2.2 A Non-Convex Optimization Approach To Minimum Ratio-Cut

To generalize the cut-matching game framework to the setting of submodular hypergraphs with polymatroidal cut functions, we devise a novel optimization approach based on a continuous non-convex formulation (RC-NonCvx) of the submodular hypergraph ratio-cut problem. Given a weighted submodular hypergraph $G=(V,E,\mathbf{w},\bm{\mu})$ and cut functions $\{\delta_{h}\}_{h\in E}$ with Lovász extensions $\{\bar{\delta}_{h}\}_{h\in E}$, we define (RC-NonCvx) as the following non-convex optimization problem over $\mathbb{R}^{V}$:

$$\bar{\Psi}_{G}(\mathbf{x})\stackrel{\mathrm{def}}{=}\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\min_{u\in\mathbb{R}}\|\mathbf{x}-u\mathbf{1}\|_{\bm{\mu},1}}\,,\qquad \bar{\Psi}_{G}^{*}\stackrel{\mathrm{def}}{=}\min_{\mathbf{x}\in\mathbb{R}^{V}}\bar{\Psi}_{G}(\mathbf{x})\,. \tag{RC-NonCvx}$$

The fact that solving (RC-NonCvx) is equivalent to solving the minimum ratio-cut problem (expressed in the following lemma) is a direct consequence of the submodularity of the cut functions. It is proved in full in Appendix A.

Lemma 1.

Given a weighted hypergraph $G=(V,E,\bm{\mu},\mathbf{w})$, we have $\Psi_{G}^{*}=\bar{\Psi}_{G}^{*}$. Furthermore, there exists an algorithm that, given any $\mathbf{x}\in\mathbb{R}^{V}$, recovers a cut $S\subseteq V$ satisfying

$$\Psi_{G}(S)\leq\bar{\Psi}_{G}(\mathbf{x})$$

in time $O\big(|V|\log|V|+\mathsf{sp}(G)\big)$, where the sparsity of $G$ is defined as $\mathsf{sp}(G)\stackrel{\mathrm{def}}{=}\sum_{h\in E}|h|$.
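The recovery algorithm behind Lemma 1 is deferred to the appendix; a standard way to round a vector $\mathbf{x}$ to a cut is a sweep over its level sets. The sketch below is our own illustration, assuming the standard hypergraph cut function; it naively recomputes each prefix cut from scratch, so it runs slower than the lemma's stated bound:

```python
def sweep_cut(V, edges, w, mu, x):
    # Scan the level sets S_k = {k vertices with largest x_i} and return
    # the prefix minimizing the ratio-cut objective (standard cut function).
    order = sorted(V, key=lambda i: -x[i])
    best_S, best_val = None, float("inf")
    for k in range(1, len(order)):
        S = set(order[:k])
        num = sum(w[h] * min(1, len(set(h) & S), len(set(h) - S)) for h in edges)
        den = min(sum(mu[i] for i in S), sum(mu[i] for i in set(V) - S))
        val = num / den
        if val < best_val:
            best_S, best_val = S, val
    return best_S, best_val
```

Since only $|V|-1$ candidate cuts are examined, the submodularity argument of Lemma 1 guarantees that the best of them already satisfies $\Psi_{G}(S)\leq\bar{\Psi}_{G}(\mathbf{x})$.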

The main idea in our approach is to address the non-convexity of the $\bar{\Psi}_{G}$ objective by replacing the non-concave denominator with a linear lower bound. In particular, we can exploit the dual characterization of the $\ell_{1}$-norm in the lower bound to write:

$$\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\mathbf{1}\rVert_{\bm{\mu},1}=\min_{u\in\mathbb{R}}\max_{\lVert\mathbf{y}\rVert_{\infty}\leq 1}\langle\mathbf{y},\mathbf{x}-u\mathbf{1}\rangle_{\bm{\mu}}=\max_{\substack{\lVert\mathbf{y}\rVert_{\infty}\leq 1\\ \langle\mathbf{y},\mathbf{1}\rangle_{\bm{\mu}}=0}}\langle\mathbf{y},\mathbf{x}\rangle_{\bm{\mu}} \tag{2}$$
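As an aside, the inner minimization over $u$ on the left-hand side is a one-dimensional convex problem whose minimizer is a $\bm{\mu}$-weighted median of $\mathbf{x}$. A minimal numeric sketch (the function name is ours):

```python
def centered_l1(x, mu):
    # min_u sum_i mu_i * |x_i - u|, attained at a mu-weighted median of x:
    # the first breakpoint where cumulative weight reaches half the total.
    pts = sorted(zip(x, mu))
    half, acc = sum(mu) / 2, 0.0
    for u, m in pts:
        acc += m
        if acc >= half:
            break
    return sum(m * abs(xi - u) for xi, m in zip(x, mu))
```

Because the objective is piecewise linear and convex in $u$, its minimum is attained at one of the breakpoints $x_{i}$, which is why scanning the sorted entries suffices.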

This calculation directly leads us to define a novel localized, convex formulation of the (RC-NonCvx) problem for each vector $\mathbf{s}\in\mathbb{R}^{V}$ with $\|\mathbf{s}\|_{\infty}\leq 1$ and $\langle\mathbf{s},\mathbf{1}\rangle_{\bm{\mu}}=0$, which we call a seed:

$$\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\stackrel{\mathrm{def}}{=}\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\max\{0,\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}\}}\,,\qquad \bar{\Psi}_{G,\mathbf{s}}^{*}\stackrel{\mathrm{def}}{=}\min_{\mathbf{x}\in\mathbb{R}^{V}}\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\,. \tag{Local-RC}$$

Intuitively, the program (Local-RC) seeks a distribution over cuts $\mathbf{x}$ with small expected cut size and high correlation with the seed $\mathbf{s}$. In Section 6, we investigate the properties of this formulation, including its equivalence with an integral cut problem, the ratio-cut improvement problem (Definition 10 in Section 6), which generalizes the cut improvement problem of Andersen and Lang [7] to submodular hypergraphs. For polymatroidal cut functions, we show that the natural notion of a dual solution for the convex problem (Local-RC) is exactly a hypergraph flow with demand vector $\operatorname{diag}(\bm{\mu})\mathbf{s}$. By applying our flow decomposition result, we can turn such a hypergraph flow into a dual graph certificate, a graph $D$ such that $\Psi^{*}_{G,\mathbf{s}}\cdot\delta_{D}(S)\leq\delta_{G}(S)$ for all $S\subseteq V$. To construct these objects, we prove the following algorithmic result:

Theorem 2 (Informal. See Theorems 9 and 10).

An approximate primal-dual solution to (Local-RC) can be computed by solving $O(\log|V|)$-many decomposable submodular minimization problems for general polymatroidal cut functions, or $O(\log|V|)$-many $\nicefrac{1}{2}$-approximate maximum flow problems over a graph of size $O(\mathsf{sp}(G))$.

Finally, and crucially for our purposes, for all $\mathbf{x}\in\mathbb{R}^{V}$, we have $\bar{\Psi}_{G}(\mathbf{x})\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$ by Equation 2. Hence, any (not necessarily optimal) solution $\mathbf{x}$ to (Local-RC) for a seed $\mathbf{s}$ yields a solution to (RC-NonCvx) of value at most $\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$ and, via Lemma 1, a cut $S$ with $\Psi_{G}(S)\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})$. In other words, if we can somehow find a seed $\mathbf{s}$ such that $\bar{\Psi}^{*}_{G,\mathbf{s}}\leq\alpha\Psi^{*}_{G}$, then solving (Local-RC) will yield an $\alpha$-approximation algorithm for $\Psi^{*}_{G}$.

This reduction may not seem very useful, as finding the required seed may be as hard as solving the original problem. However, we can now exploit the dual feedback obtained when solving (Local-RC) for a sequence of seeds $\mathbf{S}=(\mathbf{s}_{1},\mathbf{s}_{2},\ldots,\mathbf{s}_{T})$ to effectively guide our search for a good seed and boost our approximation ratio. In particular, suppose we can adaptively construct a sequence of $T$ seeds $\mathbf{S}$ such that the sum $H=\sum_{t=1}^{T}D_{t}$ of the corresponding dual demand graphs $D_{t}$ has large ratio-cut, i.e., $\Psi_{H}^{*}\geq\beta$. Then, it is easy to show (see Theorem 11 and its proof) that we must have $\min_{t=1}^{T}\Psi^{*}_{G,\mathbf{s}_{t}}\leq\nicefrac{T}{\beta}\cdot\Psi^{*}_{G}$. Hence, solving (Local-RC) for one of the seeds in the sequence $\mathbf{S}$ must yield a $\nicefrac{T}{\beta}$-approximation to the minimum ratio-cut problem. It turns out that the problem of constructing such a sequence $\mathbf{S}$ is precisely the task addressed by the cut-matching framework of Khandekar et al. [36], which we generalize and strengthen in our next contribution.

2.3 An Improved Cut-Matching Game

We now describe our new definition of the cut-matching game. A key distinction between our setup and previous ones lies in what constitutes admissible actions played by the cut and matching players.

Definition 4.

Let $V$ be a set of vertices with vertex weights $\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}$. A cut action is a pair $(A,B)$ of non-empty disjoint subsets $A,B\subseteq V$ such that $\bm{\mu}(A)\leq\bm{\mu}(B)$.

The cut player is no longer restricted to returning a partition of the vertex set, let alone a bisection. The set of valid actions for a matching player is similarly relaxed, as the edges of the response do not need to form a perfect matching. It is only required that enough of the available degree is routed across the cut action.

Definition 5.

An approximate matching response to a cut action $(A,B)$ is a weighted bipartite graph $D=\big(A\cup B,E_{D},\mathbf{w}^{D},\bm{\mu}\big)$ satisfying the following conditions.

  1. Bounded degree: for all $i\in V$,

$$\deg_{D}(i)\leq\begin{cases}\mu_{i}&i\in A\\ \frac{\bm{\mu}(A)}{\bm{\mu}(B)}\cdot\mu_{i}&i\in B\end{cases}\,.$$

  2. Largeness: $\mathbf{w}^{D}\big(E(A,B)\big)\geq\frac{\bm{\mu}(A)}{2}$.

We emphasize that $D$ is a directed approximate matching response if the edges of $D$ are directed from $A$ to $B$.
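Definition 5 can be checked mechanically for a candidate response. The validator below is our own illustration, assuming the response is given as a weight dictionary over pairs $(i,j)$ with $i\in A$, $j\in B$, and $\deg_{D}(v)$ taken as the sum of incident edge weights:

```python
def is_approx_matching_response(A, B, mu, edges):
    # Verify the two conditions of Definition 5 for D = (A ∪ B, edges).
    muA, muB = sum(mu[i] for i in A), sum(mu[j] for j in B)
    deg = {v: 0.0 for v in A | B}
    for (i, j), wij in edges.items():
        if i not in A or j not in B:
            return False          # edge does not cross from A to B
        deg[i] += wij
        deg[j] += wij
    eps = 1e-12
    # Condition 1 (bounded degree): mu_i on A, (mu(A)/mu(B)) * mu_i on B.
    if any(deg[i] > mu[i] + eps for i in A):
        return False
    if any(deg[j] > (muA / muB) * mu[j] + eps for j in B):
        return False
    # Condition 2 (largeness): at least mu(A)/2 weight routed across.
    return sum(edges.values()) >= muA / 2 - eps
```

Note that the degree bound on the $B$-side is scaled down by $\bm{\mu}(A)/\bm{\mu}(B)\leq 1$, so the larger side absorbs the routed weight proportionally to its $\bm{\mu}$-mass.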

This definition is motivated by the fact that an approximate matching response naturally arises from the dual graph certificate obtained by approximately solving the (Local-RC) program above. The matching response will be undirected when we wish to approximate the minimum ratio-cut of a hypergraph with symmetric cut functions and directed in the general case of asymmetric cut functions. Let us now define the cut-matching game.

Definition 6.

Let n>0n>0, and 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0} be a weighting over a set of vertices VV where |V|=n\lvert V\rvert=n. An (n,𝝁)(n,\bm{\mu})-generalized cut-matching game is a multi-round, two-player game between a cut player 𝒞\mathcal{C} and a matching player \mathcal{M} that proceeds as follows. In the tt-th round of the game, following definitions 4 and 5:

  1. 𝒞\mathcal{C} plays a cut action (At,Bt)(A_{t},B_{t}),

  2. \mathcal{M} responds with an approximate matching response DtD_{t} to (At,Bt)(A_{t},B_{t}).

The state graph after tt rounds is Ht=s=1tDsH_{t}=\sum_{s=1}^{t}D_{s}, the edge-wise sum of the matching responses seen thus far.

For comparison, the original cut-matching game of [36] is captured by the above definition when one specifically considers graph expansion 𝝁=𝟏\bm{\mu}=\boldsymbol{1}, restricts cut actions to be exact bisections (A,A¯)(A,\bar{A}) where |A|=|A¯|\lvert A\rvert=\lvert\bar{A}\rvert, and requires exact matching responses: 𝐰D(E(A,A¯))=𝝁(A)\mathbf{w}^{D}\big{(}E(A,\bar{A})\big{)}=\bm{\mu}(A).

To approximate minimum ratio-cut using our cut-matching games, we seek a cut player 𝒞\mathcal{C} that outputs a sequence of cuts which, in as few cuts as possible, forces the matching player \mathcal{M} to place edges that make the minimum ratio-cut objective for the state graph HTH_{T} large. We call a cut player good if it can achieve this.
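The round structure of Definition 6 can be summarized in a short schematic driver. In the sketch below (our own illustrative code), `cut_strategy` and `matching_player` stand in for arbitrary implementations of the two players, and matching responses are represented simply as dictionaries from edges to weights:

```python
# Schematic driver for the generalized cut-matching game (Definition 6).
# `cut_strategy` and `matching_player` are placeholders for the two players.
def play_cut_matching_game(cut_strategy, matching_player, rounds):
    responses = []                       # matching responses D_1, ..., D_t
    for t in range(rounds):
        A, B = cut_strategy(responses)   # cut action (A_t, B_t)
        D = matching_player(A, B)        # approximate matching response D_t
        responses.append(D)
    # State graph H_t = edge-wise sum of the responses seen thus far.
    H = {}
    for D in responses:
        for e, w in D.items():
            H[e] = H.get(e, 0.0) + w
    return H
```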

Definition 7.

A cut strategy is a randomized algorithm 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} that takes as input the current state graph HH, and outputs a cut action (A,B)(A,B). A cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (f(n),g(n))\big{(}f(n),g(n)\big{)}-good if a cut player 𝒞\mathcal{C} using 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} at every iteration achieves:

ΨHg(n)f(n)\Psi_{H_{g(n)}}\geq f(n)\,\,

with constant probability, for any (potentially adaptive) sequence of approximate matching responses D1,,Dg(n)D_{1},\ldots,D_{g(n)}.

Note that, when dealing with symmetric cut functions, HH will be undirected, so that ΨH\Psi_{H} refers to the undirected ratio-cut objective. If the cut functions are asymmetric, then ΨH\Psi_{H} is naturally replaced with the directed ratio-cut objective.

These definitions reveal the mechanism by which the cut-matching framework adaptively constructs a sequence of seeds to (Local-RC): the seed we choose at every iteration of the cut-matching game is just a 𝝁\bm{\mu}-weighted indicator vector for the cut action of that round. A good strategy guarantees that, no matter what dual graph certificates we receive for each seed, after T=g(n)T=g(n) iterations the state graph HTH_{T} has ratio-cut objective at least β=f(n)\beta=f(n). This shows that a good strategy yields an approximation ratio of O(g(n)/f(n))O(\nicefrac{{g(n)}}{{f(n)}}) for minimum ratio cut. In Section 7, we formalize this result as Theorem 11. Our main technical theorem (Theorem 12) regarding the new cut-matching game shows the existence of good strategies for the cut-player that yield O(logn)O(\log n)-approximation algorithms for the minimum ratio-cut problem over submodular hypergraphs, for both the cases of symmetric and general polymatroidal cut functions. The proof of this theorem is based on a new geometric result about separated sets in vector embeddings, which we discuss below. Together with Theorem 2, our results on good cut-player strategies directly imply the following results for the solution of the minimum ratio-cut problem. Their short proofs appear in Appendix A.5.

Corollary 1.

There exists an O(logn)O(\log n)-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs with polymatroidal cut functions whose running time is dominated by the solution of a polylogarithmic number of decomposable submodular minimization problems over the hypergraph cut function.

Corollary 2.

There exists an O(logn)O(\log n)-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs equipped with the directed or standard hypergraph cut function whose running time is almost linear in the hypergraph sparsity.

Separated Sets

In our new cut-matching game, the cut player must take considerably more care in choosing a cut action (A,B)(A,B) than in the original version, as the approximate nature of the matching response means that the matching player may easily keep some small subset of vertices disconnected from the rest of the graph. Suppose for instance that the cut player restricted itself to playing μ\mu-bisections. Then, the matching player could select a small set TT, with μ(T)1/10μ(V)\mu(T)\leq\nicefrac{{1}}{{10}}\cdot\mu(V), and never place any edge across the boundary of T.T. The only way the cut player can force progress on TT is to depart from playing bisections and play a cut action (A,B)(A,B) where AA is close to or equal to T.T.

Let us see how this intuition is formalized in our design of cut-player strategies. As in the analysis of Orecchia et al.  [54], our cut strategy maintains at each iteration tt a spectral embedding for the current state graph HtH_{t}. We make progress if we can force the matching player to connect vertices that are far away in this embedding. We show that we can achieve this, even against an approximate matching player, by choosing a cut action coming from the following notion of separated sets:

Definition 8 (σ\sigma-separated sets).

Given an embedding {𝐯i}i[n]\{\mathbf{v}_{i}\}_{i\in[n]} and a measure 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0}, we say that two sets S,T[n]S,T\subseteq[n] are σ\sigma-separated if:

𝝁(S)𝝁(T)miniSjT𝐯i𝐯j2σi,j(V2)μiμj𝐯i𝐯j2.\bm{\mu}(S)\cdot\bm{\mu}(T)\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}\geq\sigma\cdot\sum_{i,j\in\binom{V}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}.

This definition generalizes the idea of well-separated sets [10], which was applied only to large balanced sets, to possibly unbalanced sets, as long as SS and TT contribute a σ\sigma-fraction to the total variance of the embedding. In Section 8, we show how to efficiently find separated sets and prove the following result.
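Definition 8 can be evaluated directly. The following sketch (our own illustrative code, not from the paper) returns the largest σ\sigma for which a given pair of sets is σ\sigma-separated under a given embedding and measure:

```python
import numpy as np

# Largest sigma for which S and T are sigma-separated (Definition 8).
# V_embed is an (n x d) array of embedding vectors, mu a list of vertex weights.
def sigma_separation(V_embed, mu, S, T):
    n = len(V_embed)
    sq = lambda i, j: float(np.sum((V_embed[i] - V_embed[j]) ** 2))
    # Total variance of the embedding: sum over unordered pairs.
    total_var = sum(mu[i] * mu[j] * sq(i, j)
                    for i in range(n) for j in range(i + 1, n))
    muS = sum(mu[i] for i in S)
    muT = sum(mu[i] for i in T)
    min_dist = min(sq(i, j) for i in S for j in T)
    return muS * muT * min_dist / total_var
```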

Theorem 3.

Given an embedding {𝐯i}i[n]d\{\mathbf{v}_{i}\}_{i\in[n]}\subseteq\mathbb{R}^{d} and a measure 𝛍0n\bm{\mu}\in\mathbb{Z}^{n}_{\geq 0}, with μ(V)=𝗉𝗈𝗅𝗒(n)\mu(V)=\mathsf{poly}(n), there exists a randomized algorithm running in time O~(nd)\tilde{O}(nd) that outputs σ\sigma-separated sets SS and TT for σ=Ω(1/logn)\sigma=\Omega(\nicefrac{{1}}{{\log n}}) with high probability.

3 Preliminaries

Hypergraphs

The sparsity 𝗌𝗉(G)\mathsf{sp}(G) of a hypergraph is defined as 𝗌𝗉(G)=defhE|h|.\mathsf{sp}(G)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{h\in E}|h|. Throughout the paper, we assume that the edge weights 𝐰\mathbf{w} and vertex weights 𝝁\bm{\mu} of our instance submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}) are polynomially bounded in |V|.|V|. In particular, we have 𝝁(V)𝗉𝗈𝗅𝗒(|V|).{\bm{\mu}}(V)\in\mathsf{poly}(|V|).

Linear Algebra Notation

Throughout this paper, we adopt the convention of representing scalars xx in unbolded lowercase, vectors 𝐱\mathbf{x} in boldface lowercase, and matrices 𝐗\mathbf{X} as boldface uppercase. Given vectors 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}, we will use 𝐱𝐲\mathbf{x}\circ\mathbf{y} to denote the Hadamard (entry-wise) product between 𝐱\mathbf{x} and 𝐲\mathbf{y}. We will use |𝐱|\lvert\mathbf{x}\rvert to denote the vector whose entries are the absolute value of the entries in 𝐱\mathbf{x}. We let 𝐱,𝐲\langle\mathbf{x},\mathbf{y}\rangle denote the standard Euclidean inner product, \lVert\cdot\rVert the Euclidean norm, and p\lVert\cdot\rVert_{p} the p\ell_{p}-norm.

We define an inner product between 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n} weighted by a non-negative vector 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0} as 𝐱,𝐲𝝁=defi=1nμixiyi.\langle\mathbf{x},\mathbf{y}\rangle_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{i=1}^{n}\mu_{i}\cdot x_{i}\cdot y_{i}\,. The quantity 𝐱𝝁=def𝐱,𝐱𝝁\lVert\mathbf{x}\rVert_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sqrt{\langle\mathbf{x},\mathbf{x}\rangle_{\bm{\mu}}} will denote the norm induced by the inner product ,𝝁\langle\cdot,\cdot\rangle_{\bm{\mu}}, while for any p>0p>0, we use 𝐱𝝁,p\lVert\mathbf{x}\rVert_{\bm{\mu},p} to denote the p\ell_{p}-norm weighted by 𝝁\bm{\mu} 𝐱𝝁,p=def(i=1nμi|xi|p)1/p.\lVert\mathbf{x}\rVert_{\bm{\mu},p}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\left(\sum_{i=1}^{n}\mu_{i}\cdot\lvert x_{i}\rvert^{p}\right)^{1/p}\,. Note consequently that 𝐱𝝁=𝐱𝝁,2\lVert\mathbf{x}\rVert_{\bm{\mu}}=\lVert\mathbf{x}\rVert_{\bm{\mu},2}. For 𝐱,𝐲n\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}, we write 𝐱𝐲\mathbf{x}\perp\mathbf{y} to indicate the relation: 𝐱,𝐲=0\langle\mathbf{x},\mathbf{y}\rangle=0.
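These weighted inner products and norms transcribe directly into code; the following is a minimal sketch of the definitions above (our own function names):

```python
import numpy as np

# Direct transcription of the mu-weighted inner product and norms defined above.
def inner_mu(x, y, mu):
    return float(np.sum(mu * x * y))

def norm_mu(x, mu):
    return inner_mu(x, x, mu) ** 0.5

def norm_mu_p(x, mu, p):
    return float(np.sum(mu * np.abs(x) ** p)) ** (1.0 / p)
```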

Given any vector 𝐱n\mathbf{x}\in\mathbb{R}^{n}, we define the positive and negative parts of 𝐱\mathbf{x} as the vectors 𝐱+,𝐱n\mathbf{x}_{+},\mathbf{x}_{-}\in\mathbb{R}^{n} with 𝐱+(i)=defmax{0,𝐱(i)}\mathbf{x}_{+}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max\{0,\mathbf{x}(i)\} and 𝐱(i)=defmax{0,𝐱(i)}.\mathbf{x}_{-}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max\{0,-\mathbf{x}(i)\}.

If 𝐗,𝐘n×n\mathbf{X},\mathbf{Y}\in\mathbb{R}^{n\times n}, then 𝐗,𝐘=i,j=1nXijYij\langle\mathbf{X},\mathbf{Y}\rangle=\sum_{i,j=1}^{n}X_{ij}Y_{ij} denotes the Frobenius inner product. We let 𝒮n×n\mathcal{S}^{n\times n} be the set of nn by nn symmetric matrices. Given 𝐗,𝐘𝒮n×n\mathbf{X},\mathbf{Y}\in\mathcal{S}^{n\times n} we denote by 𝐗0\mathbf{X}\succeq 0 the statement that 𝐗\mathbf{X} is positive semidefinite, and by 𝐗𝐘\mathbf{X}\preceq\mathbf{Y} the relation 0𝐘𝐗0\preceq\mathbf{Y}-\mathbf{X}. We use diag(𝐱)n×n\operatorname{diag}(\mathbf{x})\in\mathbb{R}^{n\times n} to denote the real matrix with the coordinates of 𝐱\mathbf{x} placed along its diagonal. We will reserve specifically the matrix 𝐌=defdiag(𝝁)\mathbf{M}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\operatorname{diag}(\bm{\mu}) for the non-negative vector 𝝁0n\bm{\mu}\in\mathbb{R}^{n}_{\geq 0}.

Given a finite set VV, a vector 𝐱V\mathbf{x}\in\mathbb{R}^{V}, and a subset SVS\subseteq V, we let 𝐱(S)=vSxv\mathbf{x}(S)=\sum_{v\in S}x_{v}. We also denote by 𝐱SV\mathbf{x}_{S}\in\mathbb{R}^{V}, the restriction of 𝐱\mathbf{x} to coordinates in SS. In this way, the vector 𝟏S\boldsymbol{1}_{S} denotes the {0,1}\{0,1\}-indicator of set SS.

Preliminaries on Graphs

Given a graph H=(V,E,𝐰)H=(V,E,\mathbf{w}), denote its combinatorial Laplacian as 𝐋(H)\mathbf{L}(H):

𝐋(H)=def𝐃(H)𝐀(H),\mathbf{L}(H)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\mathbf{D}(H)-\mathbf{A}(H),

where 𝐃(H)\mathbf{D}(H) is the diagonal matrix of weighted degrees and 𝐀(H)\mathbf{A}(H) is the weighted adjacency matrix of HH.

If HH is directed, then we will associate with it a directed Laplacian 𝐋+\mathbf{L}_{+} given by

𝐋+(H)=(i,j)Ewij(𝐋ij+𝐄ii𝐄jj)\mathbf{L}_{+}(H)=\sum_{(i,j)\in E}w_{ij}\cdot\big{(}\mathbf{L}_{ij}+\mathbf{E}_{ii}-\mathbf{E}_{jj}\big{)}

where 𝐋ij\mathbf{L}_{ij} is the Laplacian of an undirected graph (V,{{i,j}})\big{(}V,\big{\{}\{i,j\}\big{\}}\big{)}, and 𝐄ii\mathbf{E}_{ii} is 11 on the iiii-th entry and 0 everywhere else. We will denote by H~\tilde{H} the symmetrization of HH, i.e. the weighted undirected graph with adjacency matrix 𝐀(H)+𝐀(H){\mathbf{A}(H)+\mathbf{A}(H)^{\top}}, the edge-wise sum of HH and its reverse graph.
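The directed Laplacian 𝐋+\mathbf{L}_{+} can be assembled term by term from its definition; the following sketch (our own illustrative code) does so for a dictionary of weighted arcs:

```python
import numpy as np

# Build L_+(H) = sum_{(i,j)} w_ij * (L_ij + E_ii - E_jj) as defined above.
def directed_laplacian(n, edges):
    """edges: dict {(i, j): w_ij} of directed arcs on vertices 0..n-1."""
    L = np.zeros((n, n))
    for (i, j), w in edges.items():
        # L_ij: Laplacian of the single undirected edge {i, j}.
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
        # E_ii - E_jj correction accounting for the arc direction.
        L[i, i] += w; L[j, j] -= w
    return L
```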

Factor Graphs

A hypergraph G=(V,EG,𝐰,𝝁)G=(V,E_{G},\mathbf{w},\bm{\mu}) is naturally associated with its factor graph, an unweighted graph G^=(VEG,E^G)\hat{G}=(V\cup E_{G},\hat{E}_{G}) that is bipartite between the variable vertices representing vertices iVi\in V, and factor vertices representing hyperedges hEGh\in E_{G}. For iVi\in V and hEGh\in E_{G}, the edge {i,h}\{i,h\} belongs to E^G\hat{E}_{G} if and only if ihi\in h, i.e.:

E^G={{i,h}V×EG:ih}.\hat{E}_{G}=\{\{i,h\}\in V\times E_{G}:i\in h\}.
Figure 1: A hypergraph (left) and its factor graph (right). In the factor graph image, the variable vertices are shown in solid black, while the factor vertices are shown as empty circles.
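Constructing the factor graph is a direct transcription of the definition of E^G\hat{E}_{G}; the sketch below (representation choices are our own) indexes factor vertices by hyperedge position:

```python
# Factor graph of a hypergraph: bipartite between variable vertices and factor
# vertices, with an edge (i, h) whenever vertex i belongs to hyperedge h.
def factor_graph(vertices, hyperedges):
    """hyperedges: list of frozensets of vertices; returns the bipartite edge
    list, with factor vertices identified by their index in `hyperedges`."""
    return [(i, h_idx)
            for h_idx, h in enumerate(hyperedges)
            for i in sorted(h)]
```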
Submodular Functions

A set function F:2VF:2^{V}\rightarrow\mathbb{R} is submodular if for all A,BVA,B\subseteq V, we have:

F(AB)F(A)+F(B)F(AB).F(A\cup B)\leq F(A)+F(B)-F(A\cap B).

From the above, one can easily see that the set of submodular functions is closed under taking conic combinations. Following standard conventions, we will always assume that F()=0F(\varnothing)=0 unless specified otherwise.
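The defining inequality can be verified exhaustively on small ground sets; the following brute-force sketch (our own code, exponential in the ground set size) is useful only as a sanity check:

```python
from itertools import combinations

# Exhaustive check of the submodular inequality F(A|B) <= F(A)+F(B)-F(A&B);
# exponential in |V|, so intended only for tiny ground sets.
def is_submodular(ground, F):
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(F(A | B) <= F(A) + F(B) - F(A & B) + 1e-9
               for A in subsets for B in subsets)
```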

Any submodular set function FF can be associated with a convex body (F)\mathcal{B}(F), known as its base polytope:

(F)=def{𝐱V:𝐱,𝟏=F(V)SV,𝐱,𝟏SF(S)}.\mathcal{B}(F)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{x}\in\mathbb{R}^{V}\,:\,\langle\mathbf{x},\boldsymbol{1}\rangle=F(V)\,\wedge\,\forall S\subseteq V,\,\langle\mathbf{x},\boldsymbol{1}_{S}\rangle\leq F(S)\big{\}}. (4)

When FF is monotone non-decreasing and F()=0,F(\emptyset)=0, it is convenient to employ the positive submodular polyhedron:

𝒫+(F)=def{𝐱0V:SV,𝐱,𝟏SF(S)}.\mathcal{P}_{+}(F)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{x}\in\mathbb{R}^{V}_{\geq 0}\,:\,\forall S\subseteq V,\,\langle\mathbf{x},\boldsymbol{1}_{S}\rangle\leq F(S)\big{\}}.

Crucially, a submodular set function FF can be extended to its convex closure, known as its Lovász extension and denoted by F¯:V.\bar{F}:\mathbb{R}^{V}\rightarrow\mathbb{R}. There are a number of equivalent characterizations of the Lovász extension and its connection to the base polytope \mathcal{B} and the positive polyhedron 𝒫+\mathcal{P}_{+}. Rather than listing them here, we will refer directly to the results in Chapters 3 and 4 of the excellent reference by Bach [11]. We make an exception for the following fact, which plays a crucial role in the analysis of our approximation algorithms based on metric embeddings. See Proposition 4.8 in [11] for a proof.
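One standard characterization from Bach [11] is the greedy formula: sort the coordinates of 𝐱\mathbf{x} in decreasing order and take a telescoping sum of FF along the resulting prefix chain. A minimal sketch (our own code):

```python
# Greedy/sorting formula for the Lovász extension (see Bach [11], Ch. 3):
# sort coordinates in decreasing order and telescope F over the prefix sets.
def lovasz_extension(F, x):
    """x: dict {element: value}; F: set function on frozensets, F(empty) = 0."""
    order = sorted(x, key=lambda i: -x[i])
    val, prefix = 0.0, set()
    prev = F(frozenset())
    for i in order:
        prefix.add(i)
        cur = F(frozenset(prefix))
        val += x[i] * (cur - prev)
        prev = cur
    return val
```

For instance, with F(S)=min{1,|S|} and a non-negative input, the formula returns the maximum coordinate, matching the known extension of that function.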

Fact 1.

Let F:2V0F:2^{V}\to\mathbb{R}_{\geq 0} be a monotone non-decreasing submodular function with F()=0F(\emptyset)=0. Its Lovász extension F¯\bar{F} is monotone non-decreasing over V.\mathbb{R}^{V}.

Online Linear Optimization and Regret Minimization

Fix a convex set of actions 𝒳𝒮n×n\mathcal{X}\subseteq\mathcal{S}^{n\times n}. The setting of online linear optimization over symmetric matrices considers the following multi-round game: in each round tt, a player plays an action 𝐗t𝒳\mathbf{X}_{t}\in\mathcal{X}, receives a feedback matrix 𝐅t𝒮n×n\mathbf{F}_{t}\in\mathcal{S}^{n\times n}, and suffers a loss of 𝐅t,𝐗t\langle\mathbf{F}_{t},\mathbf{X}_{t}\rangle. The goal of the player is to play actions which minimize total loss incurred. In general, one cannot hope for a strategy that is competitive against an arbitrary, adaptive adversary. Thus, one instead seeks to minimize regret: the difference, up to a time horizon T>0T>0, between total loss and the loss suffered by the best fixed action in hindsight.

We will require a regret minimization strategy to produce a sequence of 𝐗t\mathbf{X}_{t} for this setting. Fortunately, many prior results [9, 5, 29, 34, 51, 53] have studied this, and a Matrix Multiplicative Weight Update algorithm works off-the-shelf.

Theorem 4 (Theorem 3.3.3 in [51]).

Fix T>0T>0, and a weight vector 𝛍0n\bm{\mu}\in\mathbb{Z}_{\geq 0}^{n}. Let 𝐅1,,𝐅T\mathbf{F}_{1},\ldots,\mathbf{F}_{T} be a sequence of feedback matrices such that there exists ρ>0\rho>0 where 𝐅t1ρ𝐌\mathbf{F}_{t}\succeq-\frac{1}{\rho}\cdot\mathbf{M} for every t>0t>0. For any step size 0<η<ρ0<\eta<\rho, define 𝐗t\mathbf{X}_{t} to be given by the matrix multiplicative weight update

𝐖t+1\displaystyle\mathbf{W}_{t+1} =MMWUη,𝝁(𝐅1,,𝐅t)\displaystyle=\textup{{MMWU}}_{\eta,\bm{\mu}}(\mathbf{F}_{1},\ldots,\mathbf{F}_{t}) (matrix-mwu)
=def𝐌1/2exp(ηk=1t𝐌1/2𝐅k𝐌1/2)𝐌1/2\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\mathbf{M}^{-1/2}\exp\bigg{(}-\eta\cdot\sum_{k=1}^{t}\mathbf{M}^{-1/2}\mathbf{F}_{k}\mathbf{M}^{-1/2}\bigg{)}\mathbf{M}^{-1/2}
𝐗t+1\displaystyle\mathbf{X}_{t+1} =𝐖t+1𝐌,𝐖t+1\displaystyle=\frac{\mathbf{W}_{t+1}}{\langle\mathbf{M},\mathbf{W}_{t+1}\rangle}

Then, the following inequalities hold for any 𝐔Δ𝛍n×n\mathbf{U}\in\Delta^{n\times n}_{\bm{\mu}}. We have:

1Tt=1T𝐅t,𝐔1Tt=1T𝐅t,𝐗tηTt=1T𝐌1/2𝐅t𝐌1/2𝗌𝗉𝖾𝖼2lognηT.\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{F}_{t},\mathbf{U}\rangle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{F}_{t},\mathbf{X}_{t}\rangle-\frac{\eta}{T}\sum_{t=1}^{T}\,\big{\lVert}\mathbf{M}^{-1/2}\mathbf{F}_{t}\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}^{2}-\frac{\log n}{\eta T}\,. (6)

Furthermore, if 𝐅t1ρ𝐌\mathbf{F}_{t}\succeq-\frac{1}{\rho}\cdot\mathbf{M} for every ρ>0\rho>0 (i.e. 𝐅t𝟎\mathbf{F}_{t}\succeq\mathbf{0}), and there exist ϵ>0\epsilon>0 such that 𝐅tϵ𝐌\mathbf{F}_{t}\preceq\epsilon\cdot\mathbf{M}, then:

1Tt=1T𝐅t,𝐔1ϵηTt=1T𝐅t,𝐗tlognηT.\frac{1}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{F}_{t},\mathbf{U}\big{\rangle}\geq\frac{1-\epsilon\eta}{T}\cdot\sum_{t=1}^{T}\big{\langle}\mathbf{F}_{t},\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}\,. (7)

We remark that equation (6) is the standard regret bound for the Matrix Multiplicative Weight Update algorithm, while (7) is the regret bound proven using local norm convergence.
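The update (matrix-mwu) transcribes directly; the sketch below (our own unoptimized code, using a dense eigendecomposition for the matrix exponential of a symmetric matrix) computes 𝐗t+1\mathbf{X}_{t+1} from the feedback matrices:

```python
import numpy as np

def _sym_expm(A):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

# Direct transcription of (matrix-mwu); a sketch, not an optimized solver.
def mmwu(feedbacks, mu, eta):
    """feedbacks: list of symmetric matrices F_1..F_t; returns X_{t+1}."""
    Mi = np.diag(1.0 / np.sqrt(mu))              # M^{-1/2} for M = diag(mu)
    S = sum(Mi @ F @ Mi for F in feedbacks)
    W = Mi @ _sym_expm(-eta * S) @ Mi
    return W / np.trace(np.diag(mu) @ W)         # normalize by <M, W>
```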

4 Polymatroidal Cut Functions and Hypergraph Flows

In this section, we establish the main properties of the polymatroidal cut functions in Definition 3. In the second part of the section, we introduce the dual notion of hypergraph flows as natural lower bounds to polymatroidal cut functions and highlight their connection with previously studied polymatroidal flows [19]. Finally, we define the notion of flow embedding of a graph into a hypergraph.

4.1 Properties and Examples

We start by verifying that any polymatroidal cut function is submodular. A proof of the following fact can be found in Appendix A:

Fact 2.

Let VV be an arbitrary finite set and F,G:2VF,G:2^{V}\to\mathbb{R} be monotone non-decreasing submodular functions. If δ:2V\delta:2^{V}\to\mathbb{R} is the function defined by:

δ(S)=defmin{F(S),G(VS)},\delta(S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\min\{F(S),G(V\setminus S)\},

for all SVS\subseteq V, then δ\delta is submodular.

Examples of Polymatroidal Cut Functions

We now show that the class of polymatroidal cut functions encompasses many popular cut objectives. Our first example is the directed edge cut function in graphs. This function is defined on a directed edge (i,j)(i,j) as:

δ(i,j)(S)={1if iS and jS¯,0otherwise.\delta_{(i,j)}(S)=\begin{cases}1&\text{if }i\in S\text{ and }j\in\overline{S},\\ 0&\text{otherwise}.\end{cases}

It is easy to check that this function is polymatroidal, since it can be rewritten as:

δ(i,j)(S)=min{|S{i}|,|(hS){j}|}\delta_{(i,j)}(S)=\min\{|S\cap\{i\}|,|(h\setminus S)\cap\{j\}|\}

for all ShS\subseteq h, where h={i,j}h=\{i,j\}. Next, we consider the standard hyperedge cut function. Given a hyperedge hEh\in E, this function is defined as:

δh𝖼𝗎𝗍(S)=def{0if Sh{,h},1otherwise.\delta^{\mathsf{cut}}_{h}(S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\begin{cases}0&\text{if }S\cap h\in\{\emptyset,h\},\\ 1&\text{otherwise}.\end{cases}

Note that the undirected cut function in graphs is a special case of δ𝖼𝗎𝗍\delta^{\mathsf{cut}}. This function is symmetric polymatroidal, as we can let Fh(S)=min{1,|hS|}=Gh(S)F_{h}(S)=\min\{1,|h\cap S|\}=G_{h}(S) for any hEh\in E to obtain:

δh𝖼𝗎𝗍(S)=min{1,|hS|,|hS¯|},\delta_{h}^{\mathsf{cut}}(S)=\min\{1,|h\cap S|,|h\cap\overline{S}|\},

which is 11 if and only if SS splits hh non-trivially (i.e., ShS\cap h is neither empty nor all of hh), and zero otherwise.

Moreover, the class of directed hypergraph cut functions recently introduced by Lau et al. (Section 6.2 of [40]) to define directed hypergraph expansion is also a sub-class of polymatroidal cut functions, as we now show. Given a hyperedge hEh\in E and a partition of hh into a tail set ThT_{h} and a head set HhH_{h}, we can capture the directed cut function of Lau et al. by letting Fh(S)=min{1,|ThS|}F_{h}(S)=\min\{1,|T_{h}\cap S|\} and Gh(S¯)=min{1,|HhS¯|}G_{h}(\overline{S})=\min\{1,|H_{h}\cap\overline{S}|\}.

Finally, Veldt et al. [59] consider a class of cut functions which are cardinality-based and submodular. In Appendix B.2 of their paper, they argue that every such cut function δh\delta_{h} can be written as:

δh(S)=gh(min{|Sh|,|hS¯|}),\delta_{h}(S)=g_{h}(\min\{|S\cap h|,|h\cap\overline{S}|\}),

for some concave and monotonically increasing function ghg_{h}. Since ghg_{h} is monotonically increasing, the above gives:

δh(S)=min{gh(|Sh|),gh(|hS¯|)},\delta_{h}(S)=\min\{g_{h}(|S\cap h|),g_{h}(|h\cap\overline{S}|)\},

and hence all functions in this class are also examples of polymatroidal cut functions (note that the composition of a monotone, concave function with the cardinality function is always monotone and submodular).
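The equivalence of the two forms above rests only on ghg_{h} being increasing, and can be checked directly; in the sketch below (our own code) we take gh()=g_{h}(\cdot)=\sqrt{\cdot} as an example:

```python
import math

# Cardinality-based submodular cut function of Veldt et al. [59] in the two
# equivalent forms above, for a concave increasing g (here g = sqrt).
def delta_composed(h, S, g):
    return g(min(len(S & h), len(h - S)))

def delta_min_form(h, S, g):
    # Valid because g increasing implies min(g(a), g(b)) = g(min(a, b)).
    return min(g(len(S & h)), g(len(h - S)))
```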

Lovász Extensions of Polymatroidal Cut Functions

We next establish the form of the Lovász extension of a polymatroidal cut function. We will make heavy use of this form in building our metric relaxations and in rounding them via low-distortion metric embeddings in Section 5. The proof of the following fact is found in Appendix A:

Fact 3.

Given a polymatroidal cut function δh\delta_{h}, we have, for all 𝐱h\mathbf{x}\in\mathbb{R}^{h}:

δ¯h(𝐱)=minνF¯h((𝐱ν𝟏h)+)+F¯h+((𝐱ν𝟏h)).\overline{\delta}_{h}(\mathbf{x})=\min_{\nu\in\mathbb{R}}\bar{F}^{-}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-}).

It is instructive to consider how the previous results are instantiated for the two canonical examples of the directed edge cut function and the standard hypergraph cut function. For the former on an arc (i,j)(i,j), we have F¯(𝐱)=𝐱i\bar{F}^{-}(\mathbf{x})=\mathbf{x}_{i} and F¯+(𝐱)=𝐱j,\bar{F}^{+}(\mathbf{x})=\mathbf{x}_{j}, so that:

𝐱{i,j},δ¯(i,j)(𝐱)=minν(𝐱(i)ν)++(𝐱(j)ν)=(𝐱(i)𝐱(j))+.\forall\mathbf{x}\in\mathbb{R}^{\{i,j\}},\;\overline{\delta}_{(i,j)}(\mathbf{x})=\min_{\nu\in\mathbb{R}}(\mathbf{x}(i)-\nu)_{+}+(\mathbf{x}(j)-\nu)_{-}=(\mathbf{x}(i)-\mathbf{x}(j))_{+}.

For the latter, on a hyperedge hh:

𝐱h,δ¯h𝖼𝗎𝗍(𝐱)=minν(𝐱ν𝟏)++(𝐱ν𝟏)=maxih𝐱(i)minjh𝐱(j).\forall\mathbf{x}\in\mathbb{R}^{h},\;\overline{\delta}^{\mathsf{cut}}_{h}(\mathbf{x})=\min_{\nu\in\mathbb{R}}\|(\mathbf{x}-\nu{\bf 1})_{+}\|_{\infty}+\|(\mathbf{x}-\nu{\bf 1})_{-}\|_{\infty}=\max_{i\in h}\mathbf{x}(i)-\min_{j\in h}\mathbf{x}(j).
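Both closed forms can be confirmed numerically by minimizing over ν\nu on a fine grid; in the sketch below (our own code) the directed-edge extension evaluates to (𝐱(i)𝐱(j))+(\mathbf{x}(i)-\mathbf{x}(j))_{+} and the standard-cut extension to the gap between the largest and smallest coordinates:

```python
import numpy as np

# Numerical check of the two Lovász-extension closed forms via a grid over nu.
def directed_edge_ext(xi, xj):
    nus = np.linspace(-10, 10, 20001)
    # (x_i - nu)_+ + (x_j - nu)_-  minimized over nu equals (x_i - x_j)_+.
    return float(np.min(np.maximum(xi - nus, 0) + np.maximum(nus - xj, 0)))

def standard_cut_ext(x):
    x = np.asarray(x, dtype=float)
    nus = np.linspace(-10, 10, 20001)
    vals = [np.max(np.maximum(x - nu, 0)) + np.max(np.maximum(nu - x, 0))
            for nu in nus]
    return float(min(vals))   # equals max(x) - min(x)
```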

In the case of a symmetric polymatroidal cut function, the characterization of Fact 3 can take a more compact form, as shown in Appendix A:

Fact 4.

Given a symmetric polymatroidal cut function δh\delta_{h} with Fh=defFh+=FhF_{h}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}F^{+}_{h}=F^{-}_{h}, we have, for all 𝐱h\mathbf{x}\in\mathbb{R}^{h}:

minνF¯h(|𝐱ν𝟏h|)δ¯h(𝐱)2minνF¯h(|𝐱ν𝟏h|).\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)\leq\overline{\delta}_{h}(\mathbf{x})\leq 2\cdot\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|).

4.2 Base Polytope, Hypergraph Flows and Polymatroidal Networks

It is natural to attempt to lower bound the cost of cuts in a submodular hypergraph G=(V,EG,𝐰,𝝁)G=(V,E_{G},\mathbf{w},\bm{\mu}) by considering collections of vectors 𝐘=(𝐲h(δh))hEG\mathbf{Y}=(\mathbf{y}_{h}\in\mathcal{B}(\delta_{h}))_{h\in E_{G}}, where each vector 𝐲h\mathbf{y}_{h} belongs to the base polytope of the polymatroidal cut function δh\delta_{h} under consideration. In this section, we interpret these vectors as hypergraph flows over GG and show that they take the form of polymatroidal flows [41, 19] over the factor graph G^\hat{G}.

Base Polytope of Polymatroidal Cut Function

We start by characterizing the base polytope (δh)\mathcal{B}(\delta_{h}) for a polymatroidal cut function δh.\delta_{h}. By the definition of the base polytope in Equation (4), we have

(δh)={𝐲h:𝐲,𝟏h=0Sh,𝐲,𝟏Smin{Fh(S),Fh+(hS)}}.\mathcal{B}(\delta_{h})=\{\mathbf{y}\in\mathbb{R}^{h}:\langle\mathbf{y},{\bf 1}_{h}\rangle=0\,\wedge\,\forall S\subset h,\,\langle\mathbf{y},{\bf 1}_{S}\rangle\leq\min\{F_{h}^{-}(S),F_{h}^{+}(h\setminus S)\}\}.

The first constraint implies that for all Sh,S\subset h, we must have 𝐲,𝟏S=𝐲,𝟏hS.\langle\mathbf{y},{\bf 1}_{S}\rangle=-\langle\mathbf{y},{\bf 1}_{h\setminus S}\rangle. Hence, we can rewrite:

(δh)\displaystyle\mathcal{B}(\delta_{h}) ={𝐲h𝟏h:Sh,𝐲,𝟏SFh(S)𝐲,𝟏hSFh+(hS)}\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:\forall S\subset h,\;\langle\mathbf{y},{\bf 1}_{S}\rangle\leq F_{h}^{-}(S)\,\wedge\,-\langle\mathbf{y},{\bf 1}_{h\setminus S}\rangle\leq F_{h}^{+}(h\setminus S)\}
={𝐲h𝟏h:Sh,Fh+(S)𝐲,𝟏SFh(S)}\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:\forall S\subset h,\;-F_{h}^{+}(S)\leq\langle\mathbf{y},{\bf 1}_{S}\rangle\leq F_{h}^{-}(S)\} (8)
={𝐲h𝟏h:(𝐲)+𝒫+(Fh)(𝐲)𝒫+(Fh+)}.\displaystyle=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}:(\mathbf{y})_{+}\in\mathcal{P}_{+}(F^{-}_{h})\,\wedge\,(\mathbf{y})_{-}\in\mathcal{P}_{+}(F^{+}_{h})\}.

We will interpret an element 𝐲\mathbf{y} of (δh)\mathcal{B}(\delta_{h}) as a flow over the neighborhood of factor vertex hh in the factor graph G^\hat{G}, with 𝐲i>0\mathbf{y}_{i}>0 indicating flow going from ii to hh and 𝐲i<0\mathbf{y}_{i}<0 flow going from hh to ii. The constraint 𝐲𝟏h\mathbf{y}\perp{\bf 1}_{h} simply represents flow conservation at h.h. With this setup, for each set Sh,S\subset h, the function FhF_{h}^{-} (resp. Fh+F_{h}^{+}) constrains the amount of flow that can be routed from the set SS to hh (resp. to the set SS from hh).

Example: Base Polytope for Standard Hypergraph Cut Function

Consider the case of the standard hypergraph cut function δh𝖼𝗎𝗍(S)\delta^{\mathsf{cut}}_{h}(S) defined in the previous section. Then, it is easy to see that:

(δh𝖼𝗎𝗍)={𝐲h𝟏h𝐲12}.\mathcal{B}(\delta^{\mathsf{cut}}_{h})=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\mid\lVert\mathbf{y}\rVert_{1}\leq{2}\}.

As discussed above, we can think of each entry 𝐲(i)\mathbf{y}(i) of 𝐲\mathbf{y} for ihi\in h, as the amount of flow flowing into hh from vertex ii. Since 𝐲𝟏h\mathbf{y}\perp{\bf 1}_{h}, exactly half of the mass 𝐲1\lVert\mathbf{y}\rVert_{1} enters hh and half exits it, so the constraint 𝐲12\lVert\mathbf{y}\rVert_{1}\leq 2 enforces that the total amount of flow through the hyperedge hh is no more than 11. This is equivalent to a vertex capacity constraint on the factor vertex corresponding to the hyperedge hh in the factor graph G^\hat{G}.

Example: Base Polytope for Directed Hypergraph Cut Functions

Consider the class of directed hypergraph cut functions defined by Lau et al. in Section 6.2 of [40]. Each of these cut functions δh\delta_{h} has an associated head set HhhH_{h}\subseteq h and an associated tail set ThhT_{h}\subseteq h, and δh(S)\delta_{h}(S) has value one if and only if the tail set intersects SS, i.e., ThST_{h}\cap S\neq\emptyset, and the head set intersects S¯\overline{S}, i.e., HhS¯H_{h}\cap\overline{S}\neq\emptyset. Their base polytope is then given by:

(δh)={𝐲h𝟏h𝐲(Th)1iTh:𝐲(i)0iHh:𝐲(i)0}.\mathcal{B}(\delta_{h})=\{\mathbf{y}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\mid\mathbf{y}(T_{h})\leq 1\,\land\,\forall i\in T_{h}:\mathbf{y}(i)\geq 0\,\land\,\forall i\in H_{h}:\mathbf{y}(i)\leq 0\}.

The notion of hypergraph flow, which we introduce next, formalizes the network flow interpretation for all hyperedges hEG.h\in E_{G}.

Hypergraph Flows

Given a weighted submodular hypergraph G=(V,E,𝐰>0E,𝝁0V)G=(V,E,\mathbf{w}\in\mathbb{Z}^{E}_{>0},\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}), a hyperedge flow over a hyperedge hEh\in E is a vector 𝐲hh𝟏h.\mathbf{y}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}. A hypergraph flow 𝐘hEh𝟏h\mathbf{Y}\in\bigoplus_{h\in E}\mathbb{R}^{h}\perp{\bf 1}_{h} is a direct sum of hyperedge flows 𝐘=(𝐲h)hE.\mathbf{Y}=(\mathbf{y}_{h})_{h\in E}. Analogous to the case for graph flows, we define a notion of congestion based on the maximum violation of the polymatroidal constraints defining each (δh)\mathcal{B}(\delta_{h}):

congG(𝐘)=min{ρ0:hEG,𝐲hρ𝐰h(δh)}.\operatorname{cong}_{G}(\mathbf{Y})=\min\{\rho\geq 0:\forall h\in E_{G},\,\mathbf{y}_{h}\in\rho\cdot\mathbf{w}_{h}\cdot\mathcal{B}(\delta_{h})\}. (9)

The demand vector dem(𝐘)V𝟏\operatorname{dem}(\mathbf{Y})\in\mathbb{R}^{V}\perp{\bf 1} of a hypergraph flow 𝐘=(𝐲h)hE\mathbf{Y}=(\mathbf{y}_{h})_{h\in E} is

demi(𝐘)=hE:hi𝐲h(i).\operatorname{dem}_{i}(\mathbf{Y})=\sum_{h\in E:h\ni i}\mathbf{y}_{h}(i). (10)

We will make use of the vector space structure inherited from 𝐘h𝟏h\mathbf{Y}\in\bigoplus\mathbb{R}^{h}\perp{\bf 1}_{h} to define linear combinations of hypergraph flows c𝐘1+𝐘2c\mathbf{Y}_{1}+\mathbf{Y}_{2} for any c.c\in\mathbb{R}. The demand vector also behaves linearly as dem(c𝐘1+𝐘2)=cdem(𝐘1)+dem(𝐘2).\operatorname{dem}(c\mathbf{Y}_{1}+\mathbf{Y}_{2})=c\cdot\operatorname{dem}(\mathbf{Y}_{1})+\operatorname{dem}(\mathbf{Y}_{2}).
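The demand vector (10) and, for the standard hypergraph cut function, the congestion (9) reduce to short computations: since (δh𝖼𝗎𝗍)\mathcal{B}(\delta^{\mathsf{cut}}_{h}) is an 1\ell_{1}-ball of radius 22 intersected with 𝟏h{\bf 1}_{h}^{\perp}, the congestion is maxh𝐲h1/(2wh)\max_{h}\lVert\mathbf{y}_{h}\rVert_{1}/(2w_{h}). In the sketch below (our own data layout), a hypergraph flow is a dictionary mapping each hyperedge, given as a tuple of vertices, to its hyperedge flow vector:

```python
import numpy as np

# Demand vector (10) of a hypergraph flow, and congestion (9) specialized to
# the standard hypergraph cut function.
def demand(n, flow):
    """flow: dict {hyperedge (tuple of vertices): y_h (array summing to 0)}."""
    d = np.zeros(n)
    for h, y in flow.items():
        assert abs(float(np.sum(y))) < 1e-9   # y_h is orthogonal to 1_h
        for pos, i in enumerate(h):
            d[i] += y[pos]
    return d

def congestion_standard_cut(flow, w):
    return max(float(np.abs(y).sum()) / (2 * w[h]) for h, y in flow.items())
```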

Finally, notice that hypergraph flows over hyperedges of rank 22 are equivalent to graph flows as, for an edge {i,j}E\{i,j\}\in E, 𝐲ij(i)\mathbf{y}_{ij}(i) represents the flow from ii to jj, while 𝐲ij(j)=𝐲ij(i)\mathbf{y}_{ij}(j)=-\mathbf{y}_{ij}(i) is the flow from jj to ii.

Connection to Polymatroidal Flows

In the polymatroidal network flow model [19, 41], one has a directed (2-uniform) graph HH representing a flow network where the edges are not individually capacitated, but rather for every vertex vV(H)v\in V(H), there are monotone submodular functions ρ\rho^{-} and ρ+\rho^{+} constraining the amount of flow entering and exiting a vertex respectively.

Given a hypergraph flow 𝐘={𝐲h}hE\mathbf{Y}=\{\mathbf{y}_{h}\}_{h\in E} over GG, one can interpret it as a graph flow over the factor graph G^\hat{G}, where the flow from vertex ii to factor vertex hh over the edge {i,h}\{i,h\} is given by 𝐲h(i)\mathbf{y}_{h}(i). When the congestion (as defined in (9)) is required to be at most one, the constraints imposed on this flow on G^\hat{G} can be cast as constraints defining a flow in a polymatroidal network supported on G^\hat{G}. Here, each constraint on the hyperedge flow for hyperedge hE(G)h\in E(G) defined by its base polytope can be encoded as polymatroidal constraints on the graph flow into and out of the vertex hh in G^\hat{G}.

Despite this connection with polymatroidal networks, we remark that, to the best of our knowledge, the hypergraph partitioning problems studied in this paper cannot be expressed in the setup proposed by Chekuri [19] for defining sparsest cut problems over polymatroidal networks.

4.3 Hypergraph Flow Embeddings

To construct efficient flow-based algorithms for the minimum ratio-cut problem in the second part of this paper, we will need a way to generalize the notion of flow embedding of a demand graph into a graph to the setting of hypergraphs. In this section, we leverage the definition of hypergraph flows to provide this generalization. In particular, we define the flow embedding of a (directed) graph into a weighted submodular hypergraph with polymatroidal cut functions. We also show that these embeddings yield lower bounds on the hypergraph cut functions in terms of the graph cuts of the embedded graph. Analogously to graphs, we also show that any hypergraph flow can be converted into a flow embedding via flow-path decompositions.

Definition 9 (Hypergraph Flow Embedding).

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}} and let H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} be a weighted directed graph on the same vertex set. We say that the graph HH embeds into GG as a flow with congestion ρ0\rho\geq 0, denoted HρGH\preceq_{\rho}G if there exist hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} such that:

  1. For all e=(i,j)EHe=(i,j)\in E_{H}, the flow 𝐘e\mathbf{Y}^{e} routes weHw^{H}_{e} units of flow from vertex ii to vertex jj, i.e.,

    dem(𝐘e)=weH(𝟏i𝟏j).\operatorname{dem}(\mathbf{Y}^{e})=w^{H}_{e}\cdot({\bf 1}_{i}-{\bf 1}_{j})\,.

  2. The flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} respect the polymatroidal capacity constraints, i.e., for all hEGh\in E_{G}:

    eEH(𝐲he)+ρwh𝒫+(Fh),\sum_{e\in E_{H}}(\mathbf{y}_{h}^{e})_{+}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}^{-}),

    eEH(𝐲he)ρwh𝒫+(Fh+).\sum_{e\in E_{H}}(\mathbf{y}_{h}^{e})_{-}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}^{+}).

The capacity constraints can be understood by analogy with the capacity constraints for multi-commodity flows over a graph. As in that case, these constraints ensure that flows from different commodities along the same arc do not cancel out.
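To make the bookkeeping concrete, the net demand dem(𝐘) of a hypergraph flow is obtained by summing the per-hyperedge vectors 𝐲h. The following is a minimal Python sketch; the dictionary-based representation and the function name are our own illustration, not the paper's notation:

```python
def demand(flows):
    # flows: hyperedge id -> {vertex: signed flow value}; each y_h is
    # supported on the vertices of h and is orthogonal to 1_h, so the
    # entries of dem(Y) always sum to zero.
    dem = {}
    for y_h in flows.values():
        for v, val in y_h.items():
            dem[v] = dem.get(v, 0.0) + val
    return dem
```

For instance, routing two units from i to j through an intermediate vertex k via two hyperedges leaves zero net demand at k, matching dem(𝐘e)=weH(𝟏i−𝟏j) with weH=2.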

We will use this notion of embeddability to certify lower bounds on the ratio-cut objective for general polymatroidal cut functions. We will crucially make use of the following theorem in our development of the cut-matching game for solving ratio-cut problems over weighted submodular hypergraphs. Its proof is in Appendix A.

Theorem 5.

Let G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}}, and H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} be a weighted directed graph. If HρGH\preceq_{\rho}G, then

𝐱V,(i,j)EHwijH(𝐱i𝐱j)+ρhEGwhGδ¯h(𝐱) and SV,δH(S)ρδG(S).\forall\mathbf{x}\in\mathbb{R}^{V}\,,\sum_{(i,j)\in E_{H}}w^{H}_{ij}\cdot(\mathbf{x}_{i}-\mathbf{x}_{j})_{+}\leq\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\overline{\delta}_{h}(\mathbf{x})\;\textrm{ and }\;\forall S\subseteq V\,,\,\delta_{H}(S)\leq\rho\cdot\delta_{G}(S).

When the polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}} are symmetric, HρGH\preceq_{\rho}G implies the following bound on the undirected symmetrization H~=(V,EH~,𝐰H~)\tilde{H}=(V,E_{\tilde{H}},\mathbf{w}^{\tilde{H}}):

𝐱V,{i,j}EH~wijH~|𝐱i𝐱j|2ρhEGwhGδ¯h(𝐱) and SV,δH~(S)2ρδG(S).\forall\mathbf{x}\in\mathbb{R}^{V}\,,\sum_{\{i,j\}\in E_{\tilde{H}}}w^{\tilde{H}}_{ij}\cdot\lvert\mathbf{x}_{i}-\mathbf{x}_{j}\rvert\leq 2\cdot\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\overline{\delta}_{h}(\mathbf{x})\;\textrm{ and }\;\forall S\subseteq V\,,\,\delta_{\tilde{H}}(S)\leq 2\cdot\rho\cdot\delta_{G}(S).

We now state the main algorithmic result of this section, which allows us to construct a hypergraph flow embedding from any single hypergraph flow. Just like in the case of graphs, this result is based on constructing a flow-path decomposition of the original flow. The monotonicity of the functions F+F^{+} and FF^{-} in the definition of polymatroidal cut functions plays a crucial role in bounding the congestion of the embedding, which cannot be controlled in the same way for general submodular cut functions. The proof appears in Appendix A.

Theorem 6.

Let G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted hypergraph equipped with a collection of polymatroidal cut functions {δh}hEG\{\delta_{h}\}_{h\in E_{G}}. Given a hypergraph flow 𝐘\mathbf{Y} over GG, there is an algorithm that, in time O~(𝗌𝗉(G))\tilde{O}\big{(}\mathsf{sp}(G)\big{)}, computes a weighted directed bipartite graph H=(V,EH,𝐰H)H=\big{(}V,E_{H},\mathbf{w}^{H}\big{)} such that HcongG(𝐘)GH\preceq_{\operatorname{cong}_{G}(\mathbf{Y})}G and |EH|=O~(𝗌𝗉(G))|E_{H}|=\tilde{O}\big{(}\mathsf{sp}(G)\big{)}. Furthermore, all vertices iVi\in V with non-negative (resp. negative) demand demi(𝐘)\operatorname{dem}_{i}(\mathbf{Y}) are source nodes (resp. sink nodes) in HH, each with out-degree (resp. in-degree) equal to demi(𝐘)\operatorname{dem}_{i}(\mathbf{Y}) in absolute value.

5 An O(logn)O(\sqrt{\log n})-Approximation via 22\ell_{2}^{2}-Metric Embeddings

In this section, we prove Theorem 1 by giving an 22\ell_{2}^{2}-metric relaxation and a rounding argument that yield a randomized polynomial-time O(logn)O(\sqrt{\log n})-approximation algorithm for the minimum ratio-cut problem over submodular hypergraphs equipped with polymatroidal cut functions. As a warm-up, we give the simpler argument for symmetric polymatroidal cut functions before proving the result for general polymatroidal cut functions in Section 5.2.

5.1 Warm-up: Symmetric Polymatroidal Cut Functions

We begin by providing a relaxation for the symmetric version of the problem, in which Fh+=FhF_{h}^{+}=F_{h}^{-} for every hEh\in E. All the results in this subsection will focus exclusively on this special case, but they will be extended to the general case of all polymatroidal cut functions in the following subsection.

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}), we construct a vector-program relaxation of the continuous non-convex formulation in Equation (RC-NonCvx) by associating to each vertex iVEi\in V\cup E of the factor graph G^\hat{G} a vector 𝐯i\mathbf{v}_{i}. For a hyperedge hEh\in E and ihi\in h, we can then relax the terms |𝐱iνh||\mathbf{x}_{i}-\nu_{h}|, which make up the argument of F¯h\bar{F}_{h} in Fact 4, with the distance 𝐝ih=def𝐯i𝐯h2.\mathbf{d}^{h}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2}. We obtain the following semidefinite program in its vector embedding form:

min𝐯i\displaystyle\underset{\mathbf{v}_{i}}{\min} hEwhF¯h(𝐝h)\displaystyle\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h}) (RC-Metric)
s.t. {i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j221\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2}\geq 1
𝐯i𝐯j22𝐯i𝐯k22+𝐯k𝐯j22\displaystyle\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}\leq\|\mathbf{v}_{i}-\mathbf{v}_{k}\|_{2}^{2}+\|\mathbf{v}_{k}-\mathbf{v}_{j}\|_{2}^{2} i,j,kVE\displaystyle\forall\,i,j,k\in V\cup E
𝐝ih=𝐯i𝐯h2\displaystyle\mathbf{d}^{h}_{i}=\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐯in,𝐝h0h\displaystyle\mathbf{v}_{i}\in\mathbb{R}^{n},\mathbf{d}^{h}\in\mathbb{R}^{h}_{\geq 0} iVE,hE\displaystyle\hskip 28.45274pt\forall\,i\in V\cup E,\,\forall\,h\in E

The following simple lemma is proved in Appendix A.

Lemma 2.

The (RC-Metric) vector program is, up to a constant, a relaxation of the minimum ratio-cut program for the hypergraph G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) with symmetric polymatroidal cut functions {δh}hE.\{\delta_{h}\}_{h\in E}.

Rounding via Metric Embedding

To round the convex program (RC-Metric), we apply the following well-known embedding result, which is implicit in the seminal work of Arora, Rao and Vazirani [10].

Theorem 7 ([10]).

Let {𝐯id}iVE\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V\cup E} be a feasible solution to the 22\ell_{2}^{2}-metric relaxation (RC-Metric). Then, there exists an efficiently computable 11-Lipschitz map ϕ:d\phi:\mathbb{R}^{d}\to\mathbb{R} (recall that, given a metric space (X,d)(X,d), a function f:Xf:X\to\mathbb{R} is said to be KK-Lipschitz for some constant KK if |f(x)f(y)|Kd(x,y)|f(x)-f(y)|\leq K\cdot d(x,y) for every x,yXx,y\in X) satisfying:

i,jVμ(i)μ(j)|ϕ(𝐯i)ϕ(𝐯j)|i,jVμ(i)μ(j)𝐯i𝐯j2Ω(1log|V|).\frac{\sum_{i,j\in V}\mu(i)\mu(j)\cdot|\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{j})|}{\sum_{i,j\in V}\mu(i)\mu(j)\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}}\geq\Omega\left(\frac{1}{\sqrt{\log|V|}}\right).

To complete the proof of Theorem 1 for the special case of symmetric polymatroidal cut functions, we make use of the embedding in Theorem 7. The proof crucially relies on the monotonicity of the Lovász extension F¯\bar{F} (Fact 1). The ability to perform this step can be taken as a justification for our definition of polymatroidal cut functions.
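Since the argument repeatedly evaluates Lovász extensions, it may help to recall the standard evaluation formula: sort the coordinates of 𝐱 in decreasing order via a permutation σ, let S_k be the top-k prefix, and compute F̄(𝐱) = Σ_k 𝐱_{σ(k)}·(F(S_k) − F(S_{k−1})). A minimal Python sketch (function names are ours, not the paper's):

```python
def lovasz_extension(F, x):
    # F: set function on frozensets with F(empty) = 0; x: dict vertex -> value.
    # Evaluates F-bar(x) = sum_k x[v_k] * (F(S_k) - F(S_{k-1})), where the
    # vertices v_1, v_2, ... are sorted by decreasing x-value and S_k is
    # the prefix {v_1, ..., v_k}.
    order = sorted(x, key=lambda v: -x[v])
    total, prev, S = 0.0, 0.0, set()
    for v in order:
        S.add(v)
        cur = F(frozenset(S))
        total += x[v] * (cur - prev)
        prev = cur
    return total
```

For the standard (symmetric) cut function of a single hyperedge, F(S) = 1 whenever S is a proper nonempty subset, this formula recovers F̄(𝐱) = max(𝐱) − min(𝐱).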

Lemma 3.

There is a polynomial-time algorithm that, given a solution {𝐯i}iVE\{\mathbf{v}_{i}\}_{i\in V\cup E} to (RC-Metric) of objective value κ:=hEwhF¯h(𝐝h)\kappa:=\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h}), returns a cut SVS\subseteq V such that:

ΨG(S)O(κlog|V|).\Psi_{G}(S)\leq O\big{(}\kappa\cdot\sqrt{\log|V|}\big{)}\,.
Proof.

Apply Theorem 7 to obtain 𝐱V\mathbf{x}\in\mathbb{R}^{V} and 𝜼E\bm{\eta}\in\mathbb{R}^{E} with 𝐱(i)=defϕ(𝐯i)\mathbf{x}(i)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\phi(\mathbf{v}_{i}) for all iVi\in V and 𝜼(h)=defϕ(𝐯h)\bm{\eta}(h)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\phi(\mathbf{v}_{h}) for all hEh\in E. The 11-Lipschitzness of ϕ\phi guarantees, for every hEh\in E and ihi\in h:

|𝐱(i)𝜼(h)|=|ϕ(𝐯i)ϕ(𝐯h)|𝐯i𝐯h2=𝐝ih.\lvert\mathbf{x}(i)-\bm{\eta}(h)\rvert=\lvert\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{h})\rvert\leq\|\mathbf{v}_{i}-\mathbf{v}_{h}\|^{2}=\mathbf{d}^{h}_{i}.

By the monotonicity of F¯h\bar{F}_{h} in Fact 1, we have:

hEwhminνhF¯h(|𝐱hνh𝟏h|)hEwhF¯h(|𝐱h𝜼(h)𝟏h|)hEwhF¯h(𝐝h)=κ.\sum_{h\in E}w_{h}\cdot\min_{\nu_{h}\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h}|)\leq\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(|\mathbf{x}_{h}-\bm{\eta}(h){\bf 1}_{h}|)\leq\sum_{h\in E}w_{h}\cdot\bar{F}_{h}(\mathbf{d}^{h})=\kappa. (12)

It now suffices to obtain a lower bound on the denominator in the continuous formulation (RC-NonCvx):

minγ𝐱γ𝟏1,𝝁12i,jV𝝁(i)𝝁(j)𝝁(V)|𝐱i𝐱j|Ω(1log|V|)i,jV𝝁(i)𝝁(j)𝝁(V)𝐯i𝐯j22Ω(1log|V|).{\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\geq{1\over 2}{\sum_{i,j\in V}{\bm{\mu}(i)\bm{\mu}(j)\over\bm{\mu}(V)}\cdot|\mathbf{x}_{i}-\mathbf{x}_{j}|}\geq\Omega\left({1\over\sqrt{\log|V|}}\right){\sum_{i,j\in V}{\bm{\mu}(i)\bm{\mu}(j)\over\bm{\mu}(V)}\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}}\geq\Omega\left({1\over\sqrt{\log|V|}}\right). (13)

Finally, Equations (12) and (13), yield, together with Fact 4:

hEwhδ¯h(𝐱)minγ𝐱γ𝟏1,𝝁2hEwhminνF¯h(|𝐱ν1|)minγ𝐱γ𝟏1,𝝁O(κlogn).{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq 2\cdot{\sum_{h\in E}w_{h}\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu 1|)\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq O(\kappa\cdot\sqrt{\log n}).

The vector 𝐱\mathbf{x} can then be efficiently rounded to a subset SVS\subseteq V by Lemma 1. ∎
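The final rounding step via Lemma 1 is a sweep cut: sort the vertices by their value in 𝐱 and return the best prefix cut under the ratio objective. The following is an illustrative Python sketch; the `boundary` callable standing in for Σ_h w_h·δ_h(S∩h) is a hypothetical helper, not the paper's notation:

```python
def sweep_cut(x, mu, boundary):
    # x: dict vertex -> embedding value; mu: dict vertex -> measure;
    # boundary: callable frozenset -> cut weight delta_G(S).
    # Scans the prefix cuts S_t = {v : x[v] >= t} and returns the one
    # minimizing boundary(S) / min(mu(S), mu(V \ S)).
    order = sorted(x, key=lambda v: -x[v])
    mu_total = sum(mu.values())
    best_score, best_S = float('inf'), None
    S, mu_S = set(), 0.0
    for v in order[:-1]:  # skip the trivial cut S = V
        S.add(v)
        mu_S += mu[v]
        score = boundary(frozenset(S)) / min(mu_S, mu_total - mu_S)
        if score < best_score:
            best_score, best_S = score, set(S)
    return best_S, best_score
```

On a 4-vertex path with unit weights and measures, the sweep correctly recovers the balanced middle cut.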

5.2 General Polymatroidal Cut Functions

For the case of general polymatroidal cut functions, our metric relaxation must be able to capture the signed terms (𝐱hνh𝟏h)+(\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{+} and (𝐱hνh𝟏h)(\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{-} that appear in the Lovász extension in Fact 3. For this purpose, we will use the directed 22\ell_{2}^{2}-semimetrics introduced by Charikar, Makarychev and Makarychev [18] and, for every hEh\in E and ihi\in h, relax (𝐱iνh)+(\mathbf{x}_{i}-\nu_{h})_{+} to 𝐝ih,=def𝐯i𝐯h2+𝐯i22𝐯h22\mathbf{d}^{h,-}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{h}\rVert_{2}^{2} and (𝐱iνh)(\mathbf{x}_{i}-\nu_{h})_{-} to 𝐝ih,+=def𝐯i𝐯h2+𝐯h22𝐯i22\mathbf{d}^{h,+}_{i}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{h}\rVert_{2}^{2}-\lVert\mathbf{v}_{i}\rVert_{2}^{2}. We obtain the following semidefinite program:

min𝐯i,𝐝h,±\displaystyle\underset{\mathbf{v}_{i},\mathbf{d}^{h,\pm}}{\min} hEwh(F¯h(𝐝h,)+F¯h+(𝐝h,+))\displaystyle\sum_{h\in E}w_{h}\cdot\left(\bar{F}_{h}^{-}(\mathbf{d}^{h,-})+\bar{F}_{h}^{+}(\mathbf{d}^{h,+})\right) (RC-Directed-Semimetric)
s.t. {i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j221\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2}\geq 1
𝐯i𝐯j22𝐯i𝐯k22+𝐯k𝐯j22\displaystyle\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}_{2}\leq\|\mathbf{v}_{i}-\mathbf{v}_{k}\|_{2}^{2}+\|\mathbf{v}_{k}-\mathbf{v}_{j}\|_{2}^{2} i,j,kVE\displaystyle\forall\,i,j,k\in V\cup E
𝐝ih,=𝐯i𝐯h2+𝐯i22𝐯h22\displaystyle\mathbf{d}^{h,-}_{i}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{h}\rVert_{2}^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐝ih,+=𝐯i𝐯h2+𝐯h22𝐯i22\displaystyle\mathbf{d}^{h,+}_{i}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}+\lVert\mathbf{v}_{h}\rVert_{2}^{2}-\lVert\mathbf{v}_{i}\rVert_{2}^{2} hE,ih\displaystyle\forall h\in E,i\in h
𝐯in,𝐝h,+,𝐝h,0h\displaystyle\mathbf{v}_{i}\in\mathbb{R}^{n},\mathbf{d}^{h,+},\mathbf{d}^{h,-}\in\mathbb{R}^{h}_{\geq 0} iVE,hE\displaystyle\forall\,i\in V\cup E,\,\forall\,h\in E

We have the following lemma, which we prove in Appendix A.

Lemma 4.

The (RC-Directed-Semimetric) vector program is, up to a constant, a relaxation of the minimum submodular hypergraph ratio-cut problem for the hypergraph G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) with polymatroidal cut functions {δh}hE.\{\delta_{h}\}_{h\in E}.

Rounding via Metric Embedding

To round a solution to the program (RC-Directed-Semimetric) to a solution of the original ratio-cut problem (RC-NonCvx), we require a version of the embedding result of Theorem 7 for directed semimetrics. The following result is a simple consequence of the rounding algorithm of Agarwal et al.  [2] for a similar relaxation of directed expansion. We prove it for completeness in Appendix A.

Theorem 8.

Given a feasible solution {𝐯id}iVE\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V\cup E} to RC-Directed-Semimetric and a measure 𝛍0V\bm{\mu}\in\mathbb{R}_{\geq 0}^{V}, there exists a polynomial-time computable map ϕ:d\phi:\mathbb{R}^{d}\to\mathbb{R} such that:

  1. 1.

    (ϕ(𝐯)ϕ(𝐰))+𝐯𝐰2+𝐯2𝐰2(\phi(\mathbf{v})-\phi(\mathbf{w}))_{+}\leq\lVert\mathbf{v}-\mathbf{w}\rVert^{2}+\lVert\mathbf{v}\rVert^{2}-\lVert\mathbf{w}\rVert^{2} for all 𝐯,𝐰d\mathbf{v},\mathbf{w}\in\mathbb{R}^{d},

  2. 2.

    and:

    i,jVμ(i)μ(j)|ϕ(𝐯i)ϕ(𝐯j)|i,jVμ(i)μ(j)𝐯i𝐯j22Ω(1log|V|)\frac{\sum_{i,j\in V}\mu(i)\mu(j)|\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{j})|}{\sum_{i,j\in V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}}\geq\Omega\left(\frac{1}{\sqrt{\log|V|}}\right)

We can now complete the proof of Theorem 1 for the general polymatroidal case by showing that the map ϕ\phi in Theorem 8 yields an O(logn)O(\sqrt{\log n})-approximate solution to the minimum ratio-cut problem.

Lemma 5.

Given a solution {𝐯i}iVE\{\mathbf{v}_{i}\}_{i\in V\cup E} to (RC-Directed-Semimetric) of value κ\kappa, we can produce in polynomial time a set SVS\subseteq V such that:

ΨG(S)O(κlog|V|).\Psi_{G}(S)\leq O(\kappa\cdot\sqrt{\log|V|}).
Proof.

Let ϕ\phi be the map whose existence is guaranteed by Theorem 8. Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} and 𝜼E\bm{\eta}\in\mathbb{R}^{E} be the vectors given by 𝐱i=ϕ(𝐯i)\mathbf{x}_{i}=\phi(\mathbf{v}_{i}) for all iVi\in V and ηh=ϕ(𝐯h)\eta_{h}=\phi(\mathbf{v}_{h}) for all hE.h\in E. By the first condition in Theorem 8, we have, for all hEh\in E and ihi\in h:

(𝐱iηh)+𝐝ih, and (𝐱iηh)𝐝ih,+.(\mathbf{x}_{i}-\eta_{h})_{+}\leq\mathbf{d}^{h,-}_{i}\,\textrm{ and }\,(\mathbf{x}_{i}-\eta_{h})_{-}\leq\mathbf{d}^{h,+}_{i}.

We can now exploit the monotonicity of the functions FhF^{-}_{h} and Fh+F^{+}_{h} associated with the polymatroidal cut functions δh\delta_{h}. By Fact 3 and Fact 1, we have:

hEwhδ¯h(𝐱)\displaystyle\sum_{h\in E}w_{h}\overline{\delta}_{h}(\mathbf{x}) =hEwhminνh{F¯h((𝐱hνh𝟏h)+)+F¯h+((𝐱hνh𝟏h))}\displaystyle=\sum_{h\in E}w_{h}\min_{\nu_{h}\in\mathbb{R}}\left\{\bar{F}^{-}_{h}((\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}_{h}-\nu_{h}{\bf 1}_{h})_{-})\right\}
hEwh{F¯h((𝐱hηh𝟏h)+)+F¯h+((𝐱hηh𝟏h))}\displaystyle\leq\sum_{h\in E}w_{h}\left\{\bar{F}^{-}_{h}((\mathbf{x}_{h}-\eta_{h}{\bf 1}_{h})_{+})+\bar{F}^{+}_{h}((\mathbf{x}_{h}-\eta_{h}{\bf 1}_{h})_{-})\right\}
hEwh(F¯h(𝐝h,)+F¯h+(𝐝h,+))=κ.\displaystyle\leq\sum_{h\in E}w_{h}\left(\bar{F}^{-}_{h}(\mathbf{d}^{h,-})+\bar{F}^{+}_{h}(\mathbf{d}^{h,+})\right)=\kappa.

We can now lower bound the denominator in the objective of Equation (RC-NonCvx) as in Equation 13. Finally, we have:

hEwhδ¯h(𝐱)minγ𝐱γ𝟏1,𝝁O(κlogn).{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})\over\min_{\gamma\in\mathbb{R}}\|\mathbf{x}-\gamma\boldsymbol{1}\|_{1,\bm{\mu}}}\leq O(\kappa\cdot\sqrt{\log n}).

The vector 𝐱\mathbf{x} can then be efficiently rounded to a subset SVS\subseteq V by Lemma 1. ∎

6 Localized Convex Formulations and Ratio-Cut Improvement

In this section, we consider the localized formulation (Local-RC) introduced in Section 2.2, which we rewrite as a convex program. We show that the dual solutions to this problem are hypergraph flows, which can be used to obtain hypergraph flow embeddings into the input graph. Finally, we give algorithms that construct approximate primal-dual solutions for the (Local-RC) problem. These algorithms will play an important part in our cut-matching game.

We start by introducing the ratio-cut improvement objective, a local partitioning objective parametrized by a seed 𝐬V\mathbf{s}\in\mathbb{R}^{V}, which is the integral analogue of the (Local-RC) problem:

Definition 10 (Ratio-Cut Improvement Problem).

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}), cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and a seed vector 𝐬V\mathbf{s}\in\mathbb{R}^{V} with 𝐬,𝟏μ=0\langle\mathbf{s},\boldsymbol{1}\rangle_{\mu}=0 and 𝐬1,\lVert\mathbf{s}\rVert_{\infty}\leq 1, the ratio-cut improvement objective ΨG,s\Psi_{G,s} is defined for all SVS\subset V as:

ΨG,𝐬(S)\displaystyle\Psi_{G,\mathbf{s}}(S) =defhEwhδh(Sh)max{0,𝐬,𝟏Sμ}\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S\cap h)}{\max\{0,\langle\mathbf{s},\boldsymbol{1}^{S}\rangle_{\mu}\}}
ΨG,𝐬\displaystyle\Psi^{*}_{G,\mathbf{s}} =defminSVΨG,𝐬(S)\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\min_{S\subseteq V}\Psi_{G,\mathbf{s}}(S)

The ratio-cut improvement problem asks to find SVS\subseteq V achieving ΨG,𝐬=ΨG,𝐬(S)\Psi^{*}_{G,\mathbf{s}}=\Psi_{G,\mathbf{s}}(S).

The ratio-cut improvement problem can be thought of as attempting to find a cut (S,S¯)(S,\overline{S}) that simultaneously has a small boundary and a high correlation with the input seed 𝐬\mathbf{s}. In Appendix B, we demonstrate that the problem of minimizing ΨG,𝐬\Psi_{G,\mathbf{s}} is precisely the generalization of the cut-improvement problem of Andersen and Lang [7] to weighted submodular hypergraphs and fractional seeds.
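Evaluating the objective of Definition 10 for a candidate cut is direct. The sketch below uses a hypothetical `boundary` callable standing in for Σ_h w_h·δ_h(S∩h), and returns +∞ when the seed correlation is non-positive, mirroring the max{0, ·} in the denominator:

```python
def cut_improvement_objective(S, s, mu, boundary):
    # Psi_{G,s}(S) = boundary(S) / max{0, <s, 1^S>_mu}, with the
    # convention that the objective is infinite when the correlation
    # of S with the seed s is non-positive.
    corr = sum(mu[v] * s[v] for v in S)
    if corr <= 0:
        return float('inf')
    return boundary(S) / corr
```

With a two-vertex seed s = (1, −1), the cut {a} aligned with the seed gets a finite score while the anti-aligned cut {b} is rejected.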

The relation between the minimum ratio-cut problem, the ratio-cut improvement problem and their continuous formulations  (RC-NonCvx) and  (Local-RC) is formalized in the following lemma.

Lemma 6.

For all 𝐱V\mathbf{x}\in\mathbb{R}^{V}, and seeds 𝐬V\mathbf{s}\in\mathbb{R}^{V} satisfying 𝐬1\lVert\mathbf{s}\rVert_{\infty}\leq 1 and 𝐬,𝟏𝛍=0\langle\mathbf{s},\boldsymbol{1}\rangle_{\bm{\mu}}=0, we have:

Ψ¯G(𝐱)Ψ¯G,𝐬(𝐱).\bar{\Psi}_{G}(\mathbf{x})\leq\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})\,.

In particular, setting 𝐱=𝟏S\mathbf{x}=\boldsymbol{1}^{S} shows that ΨG,𝐬(S)\Psi_{G,\mathbf{s}}(S) upper-bounds ΨG(S)\Psi_{G}(S) for any SVS\subset V:

ΨG(S)ΨG,𝐬(S).\Psi_{G}(S)\leq\Psi_{G,\mathbf{s}}(S)\,.
Proof.

By the duality of conjugate norms, we have

minu𝐱u𝟏1,𝝁=minumax𝐲1𝐲,𝐱u𝟏𝝁=max𝐲1,𝐲,𝟏𝝁=0𝐲,𝐱𝝁𝐬,𝐱𝝁\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\boldsymbol{1}\rVert_{1,\bm{\mu}}=\min_{u\in\mathbb{R}}\max_{\lVert\mathbf{y}\rVert_{\infty}\leq 1}\langle\mathbf{y},\mathbf{x}-u\boldsymbol{1}\rangle_{\bm{\mu}}=\max_{\begin{subarray}{c}\lVert\mathbf{y}\rVert_{\infty}\leq 1\\ \langle\mathbf{y},\boldsymbol{1}\rangle_{\bm{\mu}}=0\end{subarray}}\langle\mathbf{y},\mathbf{x}\rangle_{\bm{\mu}}\geq\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}

for all 𝐱V\mathbf{x}\in\mathbb{R}^{V}. Consequently,

Ψ¯G(𝐱)=hEwhδ¯h(𝐱)minu𝐱u𝟏1,𝝁hEwhδ¯h(𝐱)max{0,𝐬,𝐱𝝁}=Ψ¯G,𝐬(𝐱)\bar{\Psi}_{G}(\mathbf{x})=\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\min_{u\in\mathbb{R}}\lVert\mathbf{x}-u\boldsymbol{1}\rVert_{1,\bm{\mu}}}\leq\frac{\sum_{h\in E}w_{h}\bar{\delta}_{h}(\mathbf{x})}{\max\{0,\langle\mathbf{s},\mathbf{x}\rangle_{\bm{\mu}}\}}=\bar{\Psi}_{G,\mathbf{s}}(\mathbf{x})

as required. ∎

Of critical algorithmic consequence is the fact that the value of Ψ¯G,𝐬\bar{\Psi}_{G,\mathbf{s}}^{*} can be computed via a convex program. We refer to this program as (CI-Primal):

min𝐱V\displaystyle\underset{\mathbf{x}\in\mathbb{R}^{V}}{\min} hEwhδ¯h(𝐱)\displaystyle\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x}) (CI-Primal)
s.t. 𝐬,𝐱μ=1\displaystyle\langle\mathbf{s},\mathbf{x}\rangle_{\mu}=1

The following lemma is a simple consequence of the positive homogeneity of the Lovász extensions of submodular functions (Proposition 3.1 in [11]).

Lemma 7.

The optimum value of (CI-Primal) equals ΨG,𝐬\Psi^{*}_{G,\mathbf{s}}.

With this convex program, one can leverage convex duality to construct lower bounds for Ψ¯G,𝐬\bar{\Psi}^{*}_{G,\mathbf{s}}, and consequently ΨG,𝐬\Psi^{*}_{G,\mathbf{s}}. We consider a dual formulation of (CI-Primal) as a problem over hypergraph flows:

max\displaystyle{\max} α\displaystyle\alpha (CI-Dual)
s.t. dem(𝐘)=α(𝝁𝐬)\displaystyle\operatorname{dem}(\mathbf{Y})=\alpha\cdot\big{(}\bm{\mu}\circ\mathbf{s}\big{)}
congG(𝐘)1\displaystyle\operatorname{cong}_{G}(\mathbf{Y})\leq 1
α,𝐘={𝐲hh𝟏h}hE\displaystyle\alpha\in\mathbb{R},\mathbf{Y}=\{\mathbf{y}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\}_{h\in E}

In this setting, strong duality between (CI-Primal) and (CI-Dual) holds by virtue of (CI-Primal) having a finite number of polyhedral constraints. The full proof of the following lemma can be found in Appendix A.

Lemma 8.

The programs (CI-Primal) and (CI-Dual) are dual to each other. Strong duality holds.

From this point onwards, we restrict our attention to the kind of seed vectors that we will encounter in our application to the cut-matching game. For this purpose, we define the following notation for disjoint sets A,BVA,B\subseteq V with μ(A)μ(B)\mu(A)\leq\mu(B):

𝟏A,B=𝟏Aμ(A)μ(B)𝟏B.{\bf 1}^{A,B}={\bf 1}_{A}-\frac{\mu(A)}{\mu(B)}{\bf 1}_{B}.

Notice that 𝟏A,B{\bf 1}^{A,B} is a valid seed for the local formulation of the minimum ratio-cut problem as 𝟏A,B1\|{\bf 1}^{A,B}\|_{\infty}\leq 1 and 𝟏A,B,𝟏𝝁=0.\langle{\bf 1}^{A,B},{\bf 1}\rangle_{\bm{\mu}}=0. For this kind of seed, the dual problem can be interpreted as a maximum concurrent single-commodity flow over the hypergraph GG, in which each vertex iAi\in A is asked to route demand μi\mu_{i} to BB and each vertex jBj\in B is asked to receive demand μ(A)/μ(B)μj\nicefrac{{\mu(A)}}{{\mu(B)}}\cdot\mu_{j} from AA. For a solution of value α\alpha, each vertex routes an α\alpha fraction of its demand, and the total demand routed from AA to BB is αμ(A).\alpha\mu(A). Notice that, in the graph case, for B=A¯B=\bar{A}, this construction specializes exactly to the undirected maximum concurrent flow problem used by Andersen and Lang [7] to solve the ratio-cut improvement problem.
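The seed 𝟏A,B and its two defining properties (infinity norm at most 1 and μ-orthogonality to the all-ones vector) are easy to construct and verify; a small sketch with our own dictionary representation:

```python
def seed_vector(A, B, mu):
    # 1^{A,B} = 1_A - (mu(A)/mu(B)) * 1_B, and zero outside A and B.
    # Assumes mu(A) <= mu(B), which keeps the infinity norm at most 1.
    mu_A = sum(mu[v] for v in A)
    mu_B = sum(mu[v] for v in B)
    ratio = mu_A / mu_B
    return {v: 1.0 if v in A else (-ratio if v in B else 0.0) for v in mu}
```

For unit measures with A = {a} and B = {b, c}, the seed is (1, −1/2, −1/2, 0), which indeed has unit infinity norm and zero μ-weighted inner product with 𝟏.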

Approximate Solutions

In Section 6.2, we will discuss how, for many kinds of polymatroidal cut functions, approximately feasible dual solutions to (CI-Dual) can be obtained by approximately solving a maximum flow problem over the factor graph or an augmented version of it. For this purpose, it is convenient to introduce a notion of approximate dual solution of value α\alpha to (CI-Dual).

Definition 11 (Approximate dual solution).

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and A,BVA,B\subseteq V disjoint sets with μ(A)μ(B)\mu(A)\leq\mu(B). An approximate dual solution of value α0\alpha\geq 0 for (G,𝟏A,B)(G,{\bf 1}^{A,B}) is a hypergraph flow 𝐘=(𝐲hh𝟏h)\mathbf{Y}=\big{(}\mathbf{y}^{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}\big{)} such that the following hold simultaneously:

  1.

    𝐘\mathbf{Y} has unit congestion, i.e., congG(𝐘)1\operatorname{cong}_{G}(\mathbf{Y})\leq 1 ,

  2.

    𝐘\mathbf{Y} routes a 1/2\nicefrac{{1}}{{2}}-fraction of the required flow from AA to BB, i.e., at least 1/2αμ(A)\nicefrac{{1}}{{2}}\cdot\alpha\cdot\mu(A) units of flow,

  3.

    𝐘\mathbf{Y} does not exceed the required demand at any vertex, i.e., |dem(𝐘)|α|𝝁𝟏A,B||\operatorname{dem}(\mathbf{Y})|\leq\alpha\cdot|\bm{\mu}\circ{\bf 1}^{A,B}|.
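Conditions 2 and 3 of this definition can be checked directly from the demand vector dem(𝐘); condition 1 (unit congestion) must instead be verified against the polymatroidal capacities and is not covered here. A sketch of the demand-side checks, with representation and names of our own choosing:

```python
def check_demand_conditions(dem, alpha, mu, A, B):
    # dem: dict vertex -> net demand of the flow Y; checks conditions 2
    # and 3 of the approximate-dual-solution definition for seed 1^{A,B}.
    mu_A = sum(mu[v] for v in A)
    ratio = mu_A / sum(mu[v] for v in B)
    # Condition 2: at least half of the alpha * mu(A) flow leaves A.
    routed = sum(dem.get(v, 0.0) for v in A)
    if routed < 0.5 * alpha * mu_A:
        return False
    # Condition 3: |dem_i| <= alpha * mu_i * |1^{A,B}_i| at every vertex.
    for v in mu:
        bound = alpha * mu[v] * (1.0 if v in A else (ratio if v in B else 0.0))
        if abs(dem.get(v, 0.0)) > bound + 1e-12:
            return False
    return True
```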

6.1 Graph Certificates from Approximate Dual Solutions

As discussed in Section 2.2, our plan is to apply the cut-matching game to combine local lower bounds given by approximately optimal solutions to (CI-Dual) into global lower bounds for the non-convex problem Ψ¯G\bar{\Psi}^{*}_{G}. To carry out this plan, we need to express our local lower bounds, i.e. dual solutions to (CI-Dual) in terms of hypergraph embeddings of graphs into the instance hypergraph.

Definition 12.

Let G=(V,EG,𝐰G,𝝁)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)} be a weighted submodular hypergraph equipped with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}, and A,BVA,B\subseteq V disjoint sets with μ(A)μ(B).\mu(A)\leq\mu(B). Let α=ΨG,𝟏A,B.\alpha=\Psi^{*}_{G,{\bf 1}^{A,B}}. An approximate dual graph certificate of value α0\alpha\geq 0 for the (Local-RC) problem with seed 𝟏A,B{\bf 1}^{A,B} is a directed graph D=(V,ED,𝐰D)D=(V,E_{D},\mathbf{w}^{D}) with the following properties:

  1.

    DD is sparse, i.e., |ED|O~(𝗌𝗉(G)),|E_{D}|\leq\tilde{O}(\mathsf{sp}(G)),

  2.

    αD\alpha\cdot D embeds into GG with congestion 11, or equivalently D1/αG,D\preceq_{\nicefrac{{1}}{{\alpha}}}G,

  3.

    DD is bipartite from AA to BB, i.e. all arcs of DD go from AA to BB,

  4.

    for all iA,jBi\in A,j\in B, we have the degree bounds degD(i)μi\deg_{D}(i)\leq\mu_{i} and degD(j)μ(A)/μ(B)μj\deg_{D}(j)\leq\nicefrac{{\mu(A)}}{{\mu(B)}}\cdot\mu_{j},

  5.

    DD is large, i.e., wD(E(A,B))1/2μ(A)w^{D}(E(A,B))\geq\nicefrac{{1}}{{2}}\cdot\mu(A).

Following Theorem 5, we may assume that DD is undirected if all cut functions are symmetric.

In the cut-matching game framework, we will use dual graph certificates to iteratively construct lower bounds to ΨG\Psi^{*}_{G} through the application of Theorem 5. For example, given an approximate dual graph certificate DD of value α\alpha, Theorem 5 yields the lower bound:

SV,δG(S)αδD(S)=αwD(E(SA,S¯B)).\forall S\subseteq V\,,\delta_{G}(S)\geq\alpha\cdot\delta_{D}(S)=\alpha\cdot w^{D}(E(S\cap A,\bar{S}\cap B)).

As we built DD from a dual solution for (CI-Dual) with seed 𝟏A,B{\bf 1}^{A,B}, this lower bound is tighter for cuts SS that are well-correlated with 𝟏A,B{\bf 1}^{A,B}, reaching its maximum when ASB¯.A\subseteq S\subseteq\bar{B}.

We can now apply the flow-path decomposition of Theorem 6 to show how to efficiently construct an approximate dual graph certificate from an approximate dual solution. The straightforward proof is found in Appendix A.

Lemma 9.

Under the same assumptions as Definition 12, consider an approximate dual solution 𝐘\mathbf{Y} of value α\alpha to the ratio-cut improvement problem on (G,𝟏A,B).(G,{\bf 1}^{A,B}). Let HH be the directed graph obtained by applying the flow-path decomposition algorithm of Theorem 6 to the hypergraph flow 𝐘\mathbf{Y}. Then, the scaled graph 1αH\frac{1}{\alpha}\cdot H is an approximate dual graph certificate of value α.\alpha.

6.2 Approximately Solving (Local-RC)

In this section, we discuss algorithms that construct approximate solutions to (CI-Primal) and (CI-Dual), the primal and dual convex formulations of (Local-RC). Our specific focus is to describe algorithms that implement the following specification:

Definition 13.

Given a weighted submodular hypergraph G=(V,E,𝐰,𝝁)G=(V,E,\mathbf{w},\bm{\mu}) with polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} and disjoint sets A,BVA,B\subset V with μ(A)μ(B)\mu(A)\leq\mu(B), an approximate primal-dual oracle is an algorithm 𝒜CI\mathcal{A}_{\textup{{CI}}} that takes as input GG and the seed vector 𝟏A,B{\bf 1}^{A,B}. For some α0,\alpha\geq 0, the algorithm 𝒜CI\mathcal{A}_{\textup{{CI}}} outputs both:

  1.

    a cut SVS\subseteq V with ΨG,𝟏A,B(S)3α,\Psi_{G,{\bf 1}^{A,B}}(S)\leq 3\alpha,

  2.

    an approximate dual graph certificate of value α.\alpha.

We prove the following two theorems in Appendix C by a simple application of binary search over α.\alpha. The first theorem bounds the complexity of constructing an approximate primal-dual oracle for general polymatroidal cut functions.

Theorem 9.

For submodular hypergraphs with general polymatroidal cut functions, an approximate primal-dual oracle can be implemented in time O~(|V|hEΘh),\tilde{O}\big{(}|V|\cdot\sum_{h\in E}\Theta_{h}\big{)}, where Θh\Theta_{h} is the running time for a quadratic optimization oracle for δh,\delta_{h}, by solving O(log|V|)O(\log|V|)-many decomposable submodular minimization problems.

Our second theorem shows how this result can be significantly improved in the case of the standard hypergraph cut functions by relying on almost-linear-time maximum flow solvers [13, 20].

Theorem 10.

For a submodular hypergraph GG equipped with the directed or standard hypergraph cut functions, an approximate primal-dual oracle can be implemented in almost-linear time by solving O(log|V|)O(\log|V|)-many 1/2\nicefrac{{1}}{{2}}-approximate maximum flow problems over a directed, vertex-capacitated version of the factor graph G^.\hat{G}.

7 An O(logn)O(\log n)-Approximation via Generalized Cut-Matching Games

We now describe how to use our cut-matching game to approximate the minimum ratio-cut objective. Let us begin by stating the precise approximation guarantee achieved by running the game between a good cut strategy and an approximate primal-dual solver for the ratio-cut improvement problem (Local-RC).

Theorem 11.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with polymatroidal cut functions. Consider an execution of the cut-matching game in which the cut player 𝒞\mathcal{C} plays an (f(n),g(n))\big{(}f(n),g(n)\big{)}-good cut strategy, while the matching player plays the dual graph certificates output by an approximate primal-dual oracle 𝒜CI\mathcal{A}_{\textup{{CI}}} (Definition 13) on GG. Let CtC_{t} be the cuts output by 𝒜CI\mathcal{A}_{\textup{{CI}}}. Then, with constant probability, we have the following approximation result:

min{ΨG(Ct)}t=1g(n)O(g(n)f(n))ΨG.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq O\bigg{(}\frac{g(n)}{f(n)}\bigg{)}\cdot\Psi^{*}_{G}.

This is a standard result for cut-matching games and its proof does not deviate too far from that in previous work. We give its proof in Appendix D for completeness.

Theorem 11 reduces the task of approximating minimum ratio-cut to constructing a good cut strategy. Hence, this section is devoted towards proving the following theorem.

Theorem 12.

Let H=(V,EH,𝐰H,𝛍)H=\big{(}V,E_{H},\mathbf{w}^{H},\bm{\mu}\big{)} be the state graph for an (n,𝛍)(n,\bm{\mu})-generalized cut-matching game. There exists a cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} satisfying the following:

  1.

    If HH is undirected, then 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (Ω(logn),O(log2n))\left(\Omega(\log n),O(\log^{2}n)\right)-good with constant probability.

  2.

    If HH is directed, then 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} is (Ω(log2n),O(log3n))\left(\Omega(\log^{2}n),O(\log^{3}n)\right)-good with constant probability.

Furthermore, 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} outputs a cut action in time O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)}.

To prove this theorem, we give an optimization-based analysis of the cut-matching game by using regret minimization techniques from online optimization to construct and analyze 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}. This approach is very natural: the goal of online optimization is to adaptively produce a sequence of actions that will lead to some terminal outcome despite the presence of adversarial responses. For the cut-matching game, the cut player’s goal is to produce a short sequence of cut actions such that, no matter what approximate dual responses are given by \mathcal{M}, the state graph HH will have a large ratio-cut objective.

This approach also provides insight into why earlier combinatorial restrictions on valid cut and matching responses are not necessary when using the cut-matching game to partition (hyper)graphs. An interesting byproduct of our construction, in the case where {δh}hE\{\delta_{h}\}_{h\in E} are asymmetric, is that \mathcal{M} does not necessarily route Eulerian demand graphs. Additionally, no further reductions are required to enforce these combinatorial restrictions when approximating ratio-cut objectives beyond expansion.

To prove Theorem 12, we first describe our construction of 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}} for the case where the cut functions defining the ratio-cut objective are symmetric, and HH is undirected. Simply considering this case already demonstrates a large fraction of our technical contributions. We then consider asymmetric cut functions and directed HH, highlighting key differences from the symmetric case.

7.1 An (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}\log^{2}n\big{)}\right)-good Cut Strategy for Symmetric Cut Functions

We first consider the case where the polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} are symmetric.

Let us start by instantiating the online linear optimization setting in the context of our cut-matching game. For an instance of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game, fix the action set to be 𝒳=Δ𝝁n×n\mathcal{X}=\Delta^{n\times n}_{\bm{\mu}} where

Δ𝝁n×n=def{𝐗𝟎:𝐋(K𝝁),𝐗1}.\Delta^{n\times n}_{\bm{\mu}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}\mathbf{X}\succeq\mathbf{0}:\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1\big{\}}\,.
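To make the action set concrete, here is a small numerical sketch (ours, with toy data): 𝐋(K_𝝁) is the Laplacian of the complete graph whose edge {i,j} carries weight μ_iμ_j/𝝁(V), and any nonzero PSD Gram matrix can be rescaled to satisfy the constraint ⟨𝐋(K_𝝁),𝐗⟩ ≥ 1 and hence to lie in Δ^{n×n}_𝝁.

```python
import numpy as np

def complete_graph_laplacian(mu):
    """Laplacian of K_mu: edge {i,j} has weight mu_i * mu_j / mu(V)."""
    mu = np.asarray(mu, dtype=float)
    W = np.outer(mu, mu) / mu.sum()
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, d = 6, 3
mu = np.array([1, 2, 1, 3, 1, 2])
V = rng.standard_normal((n, d))   # rows are embedding vectors v_i
X = V @ V.T                       # PSD Gram matrix

L_K = complete_graph_laplacian(mu)
lhs = np.trace(L_K @ X)           # <L(K_mu), X>
# The same quantity as a weighted sum of pairwise squared distances.
rhs = sum(mu[i] * mu[j] / mu.sum() * np.sum((V[i] - V[j]) ** 2)
          for i in range(n) for j in range(i + 1, n))
assert abs(lhs - rhs) < 1e-8

# Rescaling X by 1/lhs places it in the action set Delta_mu.
X_feasible = X / lhs
```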

At the end of round tt, the matching player \mathcal{M} will have produced approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Choose the tt-th feedback matrix to be 𝐅t=𝐋(Dt)\mathbf{F}_{t}=\mathbf{L}(D_{t}). The loss incurred during round tt is then

𝐋(Dt),𝐗.\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}\big{\rangle}\,.

We instantiate online linear optimization in this way because any bound on regret now translates to a lower bound on the ratio-cut objective for the state graph HH. When {δh}hE\{\delta_{h}\}_{h\in E} are symmetric, HH is an undirected graph, and ΨH\Psi_{H}^{*} is simply the undirected ratio-cut objective, which has the following trivial SDP relaxation:

Lemma 10.

Given any undirected graph G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)}, we have 𝗈𝗉𝗍2ΨG\frac{\mathsf{opt}}{2}\leq\Psi_{G}^{*} where 𝗈𝗉𝗍\mathsf{opt} is the optimal objective value for the following semidefinite program.

min\displaystyle{\min} 𝐋(G),𝐗\displaystyle\big{\langle}\mathbf{L}(G),\mathbf{X}\big{\rangle} (cond-sdp)
s.t. 𝐋(K𝝁),𝐗1\displaystyle\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1
𝐗𝟎\displaystyle\mathbf{X}\succeq\mathbf{0}

Our choice of action set thus coincides with the feasible set for (cond-sdp), while the feedback matrices are chosen so that the sum of the losses gives the SDP objective for the state graph. This is as intended; we want the cut player to force a state graph with large ratio-cut objective, so it is crucial to have a lower bound on the ratio-cut objective that is a function of both cut and matching responses. This is also to be expected; if we wanted regret-minimizing strategies to produce approximate solutions for (cond-sdp), we would have defined the action set and losses in exactly the same way.

Three issues remain if we wish to use a regret bound from Theorem 4 to analyze a good cut strategy:

  1. (1)

    Ensuring Small Width: We need to ensure that the width term 𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}} is small given any approximate matching response DtD_{t}.

  2. (2)

    Ensuring Large Loss: For our cut strategy to be good, we will need to specify how to produce a cut action such that the loss 𝐋(Dt),𝐗t\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle} is large regardless of which approximate matching response DtD_{t} is played.

  3. (3)

    Implementing MMWUη,𝛍\textup{{MMWU}}_{\eta,\bm{\mu}} in Nearly-linear Time: If we want our cut player to run in nearly-linear time, then we will (a) need to compute the update MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, and (b) ensure that queries on the vector embedding given by the Gram matrix of the update can be computed in sublinear time.

Let us address these issues now.

7.1.1 Ensuring Small Width

The width term is constant for any approximate matching response. This follows simply from the fact that the degree of any vertex ii is bounded by μi\mu_{i}.

Lemma 11.

Any approximate matching response D=(V,ED,𝐰D,𝛍)D=\big{(}V,E_{D},\mathbf{w}^{D},\bm{\mu}\big{)} satisfies 𝟎𝐋(D)2𝐌\mathbf{0}\preceq\mathbf{L}(D)\preceq 2\cdot\mathbf{M}.

Notice that, since 𝐋(D)𝟎\mathbf{L}(D)\succeq\mathbf{0}, the regret guarantee in Theorem 4 holds for any step-size η>0\eta>0.
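This spectral bound is easy to sanity-check numerically. The following sketch (ours, with a toy matching response whose weighted degrees are capped by μ_i) verifies that the eigenvalues of 𝐌^{-1/2}𝐋(D)𝐌^{-1/2} lie in [0, 2].

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

n = 8
mu = np.array([2.0, 1.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0])

# Toy "matching response": a bipartite graph across (A, B) whose
# weighted degree at vertex i never exceeds mu_i.
A, B = [0, 1, 2, 3], [4, 5, 6, 7]
W = np.zeros((n, n))
for i, j in zip(A, B):
    w = min(mu[i], mu[j])          # respects both degree caps
    W[i, j] = W[j, i] = w

L = laplacian(W)
M_inv_sqrt = np.diag(1.0 / np.sqrt(mu))
eigs = np.linalg.eigvalsh(M_inv_sqrt @ L @ M_inv_sqrt)
# 0 <= L(D) <= 2M, i.e. normalized eigenvalues in [0, 2]
assert eigs.min() >= -1e-10 and eigs.max() <= 2 + 1e-10
```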

7.1.2 Ensuring Large Loss

Let {𝐯i(t)}iVd\big{\{}\mathbf{v}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} be a vector embedding of VV whose Gram matrix is 𝐗t\mathbf{X}_{t}. If Dt=(V,EDt,𝐰Dt)D_{t}=\big{(}V,E_{D_{t}},\mathbf{w}^{D_{t}}\big{)} is the approximate matching response at time tt, then the loss 𝐋Dt,𝐗t\big{\langle}\mathbf{L}_{D_{t}},\mathbf{X}_{t}\big{\rangle} is equivalent to

𝐋Dt,𝐗t=(i,j)EDtwijDt𝐯i(t)𝐯j(t)2.\big{\langle}\mathbf{L}_{D_{t}},\mathbf{X}_{t}\big{\rangle}=\sum_{(i,j)\in E_{D_{t}}}w_{ij}^{D_{t}}\cdot\big{\lVert}\mathbf{v}^{(t)}_{i}-\mathbf{v}^{(t)}_{j}\big{\rVert}^{2}\,.
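This is the standard Laplacian quadratic-form identity; a quick numerical check (ours, on a toy matching response and random embedding):

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(5)
n, d = 6, 4
V = rng.standard_normal((n, d))    # embedding, Gram matrix X = V V^T
X = V @ V.T

# Toy matching response: a weighted matching across {0,1,2} and {3,4,5}.
W = np.zeros((n, n))
for i, j, w in [(0, 3, 1.0), (1, 4, 0.5), (2, 5, 2.0)]:
    W[i, j] = W[j, i] = w

lhs = np.trace(laplacian(W) @ X)   # <L(D), X>
rhs = sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
          for i in range(n) for j in range(i + 1, n))
assert abs(lhs - rhs) < 1e-8
```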

For our cut strategy to be good, we need to produce a cut action (At,Bt)(A_{t},B_{t}) such that, no matter how DtD_{t} is played, the edges of DtD_{t} will always be placed across pairs of embedding vectors 𝐯i(t)\mathbf{v}^{(t)}_{i}, 𝐯j(t)\mathbf{v}^{(t)}_{j} that are well-separated under the 22\ell_{2}^{2}-norm. This can be difficult to guarantee if the flow routed is approximate. If the primal-dual solver can only route a constant factor approximate flow, then an adversarially chosen constant fraction of the demand may be dropped. If one naively rounds (At,Bt)(A_{t},B_{t}) using a random 1-dimensional projection, as in [9, 53], then the rounding can produce an unbalanced cut, upon which the adversary can choose to drop a fraction of demand that routes between all vertex pairs whose embedding vectors are far apart in 22\ell_{2}^{2}-distance.

Previous works [23, 22, 49] have addressed this issue by routing a polylogarithmic number of O(1)O(1)-factor approximate maximum flows during a single iteration of the cut-matching game. However, a technical contribution of this construction is that a single approximate primal-dual solve of (Local-RC) suffices, provided (At,Bt)(A_{t},B_{t}) are σ\sigma-separated in the sense of Definition 8.

The following lemma shows that any approximate matching response routing demand across σ\sigma-separated (At,Bt)(A_{t},B_{t}) must always route a large amount of flow between embedding vectors that are well separated in 22\ell_{2}^{2}-distance. Using (At,Bt)(A_{t},B_{t}) as the cut action will thus always result in large loss.

Lemma 12.

Given 𝛍0V\bm{\mu}\in\mathbb{Z}_{\geq 0}^{V}, let 𝐗Δ𝛍n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n} and {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} be the vector embedding whose Gram matrix is 𝐗\mathbf{X}. If (A,B)(A,B) is σ\sigma-separated with respect to 𝛍\bm{\mu} and {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V}, then for any approximate matching response D=(AB,ED,𝐰D)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D}\big{)} with respect to (A,B)(A,B), we have:

{i,j}EDwijD𝐯i𝐯j22σ2.\sum_{\{i,j\}\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq{\sigma\over 2}\,.
Proof.

Since (A,B)(A,B) are σ\sigma-separated we have for all iAi\in A and jBj\in B:

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j22σi,j(V2)μiμj𝝁(V)𝐯i𝐯j22=σ𝐋(K𝝁),𝐗σ.\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\sigma\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}=\sigma\cdot\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq\sigma\,.

Here, we use 𝐗Δ𝝁n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n} to deduce 𝐋(K𝝁),𝐗1\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1. Rearranging the above inequality then implies

𝐯i𝐯j22σ𝝁(A)𝝁(V)𝝁(B)σ𝝁(A)iA,jB\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{\bm{\mu}(A)}\cdot\frac{\bm{\mu}(V)}{\bm{\mu}(B)}\geq\frac{\sigma}{\bm{\mu}(A)}\qquad\qquad\forall\,i\in A,j\in B

where the last inequality follows by 𝝁(B)𝝁(V)\bm{\mu}(B)\leq\bm{\mu}(V). Finally, any approximate matching response is bipartite across (A,B)(A,B), and has large total weight. Thus we can conclude

{i,j}EDwijD𝐯i𝐯j22σ𝝁(A)iA,jBwijD=σ𝝁(A)𝐰(e(A,B))σ2\sum_{\{i,j\}\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{\bm{\mu}(A)}\cdot\sum_{i\in A,j\in B}w_{ij}^{D}=\frac{\sigma}{\bm{\mu}(A)}\cdot\mathbf{w}\big{(}e(A,B)\big{)}\geq{\sigma\over 2}

as required. ∎

In Section 8, we give an algorithm round-cut that produces Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated (At,Bt)(A_{t},B_{t}). This algorithm will subsequently be used in our cut strategy construction for the symmetric case.

7.1.3 Implementing MMWU in Nearly-linear Time

Computing MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, and 22\ell_{2}^{2}-distances and projections in logarithmic time, relies on standard techniques, which we briefly sketch here.

Lemma 6 in [9] states one can compute a matrix-vector product with a matrix exponential by truncating its Taylor expansion. Given any 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} and 𝐯n\mathbf{v}\in\mathbb{R}^{n}, one can compute 𝐮n\mathbf{u}\in\mathbb{R}^{n} such that

𝐮exp(𝐀)𝐯ϵ\big{\lVert}\mathbf{u}-\exp(\mathbf{A})\mathbf{v}\big{\rVert}\leq\epsilon

in time O(𝒯mvmax{𝐀,log(1ϵ)})O\big{(}\mathcal{T}_{\textsf{mv}}\cdot\max\big{\{}\lVert\mathbf{A}\rVert,\log\big{(}\frac{1}{\epsilon}\big{)}\big{\}}\big{)} where 𝒯mv\mathcal{T}_{\textsf{mv}} is matrix-vector product time. To compute MMWUη,𝝁\textup{{MMWU}}_{\eta,\bm{\mu}} in nearly-linear time, we take advantage of two details.

First, each loss matrix provided to MMWU is given by the Laplacian of a graph DtD_{t}. Consequently, the sparsity of the matrix being exponentiated in MMWU is bounded by 𝗌𝗉(Ht)\mathsf{sp}(H_{t}), and the matrix-vector product time satisfies 𝒯mv=O(𝗌𝗉(HT))\mathcal{T}_{\textsf{mv}}=O\big{(}\mathsf{sp}(H_{T})\big{)} where TT is the total number of cut-matching game rounds. Second, Lemma 11 implies that 𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼O(1)\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq O(1) for every t[T]t\in[T], and thus

k=1t𝐌1/2𝐋(Dk)𝐌1/2𝗌𝗉𝖾𝖼O(t).\Big{\lVert}\sum_{k=1}^{t}\mathbf{M}^{-1/2}\mathbf{L}(D_{k})\mathbf{M}^{-1/2}\Big{\rVert}_{\mathsf{spec}}\leq O(t)\,.

If we run at most 𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)\operatorname*{\mathsf{polylog}}(n) iterations of the cut-matching game, we can compute an O(1𝗉𝗈𝗅𝗒(n))O\big{(}\frac{1}{\mathsf{poly}(n)}\big{)}-additive approximation to MMWU by truncating the Taylor expansion up to a 𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)\operatorname*{\mathsf{polylog}}(n) number of terms.
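A minimal sketch of this primitive (ours, not the implementation from [9]): approximate exp(𝐀)𝐯 by summing the leading terms of the Taylor series, which touches 𝐀 only through matrix-vector products and hence only through its sparsity.

```python
import numpy as np

def expm_vec_taylor(matvec, v, num_terms):
    """Approximate exp(A) @ v with the first `num_terms` Taylor terms.

    `matvec` computes A @ x, so only matrix-vector products with A
    (and hence only the sparsity of A) are ever used.
    """
    result = v.copy()
    term = v.copy()
    for k in range(1, num_terms):
        term = matvec(term) / k       # now holds A^k v / k!
        result = result + term
    return result

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 10                    # small symmetric test matrix
v = rng.standard_normal(5)

# Dense reference value via the eigendecomposition of the symmetric A.
lam, U = np.linalg.eigh(A)
exact = (U * np.exp(lam)) @ U.T @ v

approx = expm_vec_taylor(lambda x: A @ x, v, num_terms=30)
assert np.linalg.norm(approx - exact) < 1e-10
```

Since each term costs one matvec, the truncation depth (polylogarithmic here, because the accumulated spectral norm grows only linearly in the round count) directly controls the running time.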

To compute 22\ell_{2}^{2}-distances and projections in logarithmic time, one can use the Johnson–Lindenstrauss transform [33] to project the embedding into O(lognδ2)O\big{(}\frac{\log n}{\delta^{2}}\big{)} dimensions for δ=O(1)\delta=O(1) (see, e.g., [24, 1, 14]). Since the error is multiplicative in terms of pairwise 22\ell_{2}^{2}-distances, picking δ\delta to be constant only costs a constant factor in the number of cut strategy plays and in the final approximation ratio. We ignore this constant in the subsequent calculations. (Picking δ>110\delta>\frac{1}{10} loses a factor 22 in the final bound on g(n)g(n) when proving Theorem 12.)
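The dimension-reduction step can be sketched as follows (our toy illustration; the constant in the target dimension is arbitrary): projecting onto d = O(log n/δ²) random Gaussian directions preserves each pairwise ℓ₂² distance up to a 1 ± δ factor with high probability.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, delta = 200, 500, 0.5
d = int(8 * np.log(n) / delta ** 2)   # O(log n / delta^2) target dimension

V = rng.standard_normal((n, D))                # original embedding vectors
P = rng.standard_normal((D, d)) / np.sqrt(d)   # Gaussian JL projection
V_hat = V @ P

# Each squared distance is preserved up to 1 +/- delta w.h.p.; check the
# empirical distortion over random pairs.
ratios = []
for _ in range(100):
    i, j = rng.integers(0, n, size=2)
    if i == j:
        continue
    ratios.append(np.sum((V_hat[i] - V_hat[j]) ** 2)
                  / np.sum((V[i] - V[j]) ** 2))
assert abs(np.mean(ratios) - 1.0) < delta / 2
```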

7.1.4 Completing the Analysis for Symmetric Cut Functions

We are now ready to describe the cut strategy which we call 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}}. Fix a step size η>0\eta>0 to be chosen subsequently, and consider round tt of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game. Up to this point, the matching player \mathcal{M} will have produced approximate matching responses D1,,DtD_{1},\ldots,D_{t}.

The cut strategy is straightforward: use Johnson–Lindenstrauss and the truncated Taylor expansion, as in Lemma 6 of [9], to compute a low-dimensional projection of the embedding whose Gram matrix is given by MMWUη,𝝁(𝐋(D1),,𝐋(Dt))\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}(D_{1}),\ldots,\mathbf{L}(D_{t})\big{)}, execute round-cut on the resulting embedding to produce σ\sigma-separated sets (At,Bt)(A_{t},B_{t}), and output (At,Bt)(A_{t},B_{t}) as the cut action. Our strategy is summarized in Figure 2.

Algorithm 1 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}}. Input: vertex measure 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}, and approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Parameters: step size η>0\eta>0.

  1. Sample d=O(logn)d=O\big{(}\log n\big{)} random unit vectors 𝐮1,,𝐮d𝒮n1\mathbf{u}_{1},\ldots,\mathbf{u}_{d}\sim\mathcal{S}^{n-1}.

  2. Denote 𝐖t=MMWUη,𝝁(𝐋(D1),,𝐋(Dt))\mathbf{W}_{t}=\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}(D_{1}),\ldots,\mathbf{L}(D_{t})\big{)}, and compute {𝐯^i(t)}iVd\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} given by (𝐯^i(t))j=𝐖t𝐮j,𝐞i\big{(}\hat{\mathbf{v}}^{(t)}_{i}\big{)}_{j}=\langle\mathbf{W}_{t}\mathbf{u}_{j},\mathbf{e}_{i}\rangle for all iVi\in V, j[d]j\in[d], using the truncated Taylor series via [9].

  3. Execute round-cut(d,{𝐯^i(t)}iV)\textup{{round-cut}}\big{(}d,\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (At,Bt)(A_{t},B_{t}).

Output: The cut (At,Bt)(A_{t},B_{t}).

Figure 2: The cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\leftrightarrow} for symmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}.

Let us now demonstrate that 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is a (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}{\log^{2}n}\big{)}\right)-good cut strategy.

Lemma 13.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with symmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} such that |V|=n\lvert V\rvert=n. Then 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is (Ω(logn),O(log2n))\Big{(}\Omega(\log n),O\big{(}\log^{2}n\big{)}\Big{)}-good with probability 1o(1)1-o(1).

Proof.

Let T=g(n)T=g(n), and for any sequence of weighted undirected approximate matching responses D1,,DTD_{1},\ldots,D_{T}, suppose 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} plays cut actions (A1,B1),,(AT,BT)(A_{1},B_{1}),\ldots,(A_{T},B_{T}) in response. To show that 𝒜𝖼𝗎𝗍\mathcal{A}^{\leftrightarrow}_{\mathsf{cut}} is a (Ω(logn),O(log2n))\left(\Omega(\log n),O\big{(}{\log^{2}n}\big{)}\right)-good cut strategy, we first show that ΨHΩ(logn)\Psi^{*}_{H}\geq\Omega(\log n) after T=O(log2n)T=O\big{(}\log^{2}n\big{)} rounds of the cut-matching game.

Fix H=t=1TDtH=\sum_{t=1}^{T}D_{t} to be the state graph of the cut-matching game. For every t>0t>0, the feedback matrix satisfies 𝐋(Dt)𝟎\mathbf{L}(D_{t})\succeq\mathbf{0}, hence we can use the regret bound given by (7) in Theorem 4. Lemma 11 ensures

𝐌1/2𝐋(Dt)𝐌1/2𝗌𝗉𝖾𝖼2\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq 2

Consequently, we have for every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n} and η>0\eta>0:

1T𝐋(H),𝐔=1Tt=1T𝐋(Dt),𝐔12ηTt=1T𝐋(Dt),𝐗tlognηT.\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}(H),\mathbf{U}\big{\rangle}=\frac{1}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{U}\big{\rangle}\geq\frac{1-2\eta}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}\,.

Next, for fixed t>0t>0, let {𝐯i(t)}iV\big{\{}\mathbf{v}_{i}^{(t)}\big{\}}_{i\in V} be the vector embedding whose Gram matrix is 𝐗t\mathbf{X}_{t}. Theorem 3 implies round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets (At,Bt)(A_{t},B_{t}). Lemma 12 then implies

12ηTt=1T𝐋(Dt),𝐗tlognηT=12ηTt=1TijE(Dt)𝐰ijDt𝐯i(t)𝐯j(t)22lognηTC1(12η)lognlognηT.\frac{1-2\eta}{T}\sum_{t=1}^{T}\big{\langle}\mathbf{L}(D_{t}),\mathbf{X}_{t}\big{\rangle}-\frac{\log n}{\eta T}=\frac{1-2\eta}{T}\sum_{t=1}^{T}\sum_{ij\in E(D_{t})}\mathbf{w}_{ij}^{D_{t}}\cdot\lVert\mathbf{v}^{(t)}_{i}-\mathbf{v}^{(t)}_{j}\rVert^{2}_{2}-\frac{\log n}{\eta T}\geq\frac{C_{1}(1-2\eta)}{\log n}-\frac{\log n}{\eta T}\,.

for some absolute constant C1>0C_{1}>0. Let C2C_{2} be a constant satisfying 22C2<C12\sqrt{\frac{2}{C_{2}}}<\sqrt{C_{1}}. Then, with the choice of η=12C1C2\eta={1\over\sqrt{2C_{1}C_{2}}} and T=C2log2nT=C_{2}\cdot{\log^{2}n}, we have that:

C1(12η)lognlognηT\displaystyle\frac{C_{1}(1-2\eta)}{\log n}-\frac{\log n}{\eta T} =C1logn(22C1C2)1lognΩ(1logn),\displaystyle={C_{1}\over\log n}-\left(2\sqrt{2C_{1}\over C_{2}}\right){1\over\log n}\geq\Omega\left({1\over\log n}\right),

thus establishing, for every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n}, that

𝐋(H),𝐔Ω(Tlogn)=Ω(logn)\big{\langle}\mathbf{L}(H),\mathbf{U}\big{\rangle}\geq\Omega\left({T\over\log n}\right)=\Omega(\log n) (18)

Now choose 𝐔=𝐗\mathbf{U}=\mathbf{X}^{*}, the minimizer to the semidefinite relaxation (cond-sdp) for ΨH\Psi_{H}^{*}. Certainly 𝐗Δ𝝁n×n\mathbf{X}^{*}\in\Delta_{\bm{\mu}}^{n\times n}, since 𝐗\mathbf{X}^{*} is feasible. If 𝗈𝗉𝗍H\mathsf{opt}_{H} is the optimal objective value of (cond-sdp), then by inequality (18)

𝗈𝗉𝗍H=𝐋(H),𝐗Ω(logn).\mathsf{opt}_{H}=\big{\langle}\mathbf{L}(H),\mathbf{X}^{*}\big{\rangle}\geq\Omega(\log n)\,.

Finally, Lemma 10 implies the required

ΨH𝗈𝗉𝗍H2Ω(logn)\Psi_{H}^{*}\geq\frac{\mathsf{opt}_{H}}{2}\geq\Omega(\log n)

To ensure that we obtain the above guarantee with good probability, note that the randomized Johnson–Lindenstrauss transform fails to produce an embedding with the necessary distortion guarantees with probability O(1n)O\left({1\over n}\right). By the union bound, the probability that this procedure fails in any iteration is O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)n)=on(1)O\left({\operatorname*{\mathsf{polylog}}(n)\over n}\right)=o_{n}(1). The only other randomized part of the algorithm is the balanced case of Algorithm 4. We know from Theorem 3 that one obtains an Ω(1/logn)\Omega\left(1/\log n\right)-separated set with probability 1O(1/n)1-O(1/n) at each iteration; again by the union bound, this procedure fails at some point during the algorithm with probability O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀(n)n)O\left({\operatorname*{\mathsf{polylog}}(n)\over n}\right). Hence the algorithm produces the desired output with probability 1o(1)1-o(1). ∎

7.2 An (Ω(log2n),O(log3n))\left(\Omega(\log^{2}n),O(\log^{3}n)\right)-good Cut Strategy for Asymmetric Cut Functions

Let us now consider the case where {δh}hE\{\delta_{h}\}_{h\in E} are asymmetric. We describe our setup for the online linear optimization setting.

Given an instance of a (n,𝝁)(n,\bm{\mu})-generalized cut-matching game, we again fix the set of actions to be 𝒳=Δ𝝁n×n\mathcal{X}=\Delta_{\bm{\mu}}^{n\times n}. For round tt of the online optimization setting, we now define the feedback matrix to be the directed Laplacian of the approximate matching response DtD_{t}: 𝐅t=𝐋+(Dt)\mathbf{F}_{t}=\mathbf{L}_{+}(D_{t}) so that the losses become

𝐋+(Dt),𝐗\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}\big{\rangle}

Similar to the symmetric case, our choices of action set and loss functions are guided by the trivial spectral relaxation for the directed ratio-cut of a graph.

Lemma 14.

Given any weighted, directed graph G=(V,EG,𝐰G,𝛍)G=\big{(}V,E_{G},\mathbf{w}^{G},\bm{\mu}\big{)}, we have 𝗈𝗉𝗍2ΨG\frac{\mathsf{opt}}{2}\leq\Psi_{G}^{*} where 𝗈𝗉𝗍\mathsf{opt} is the optimal objective value for the following semidefinite program.

min\displaystyle{\min} 𝐋+(G),𝐗\displaystyle\big{\langle}\mathbf{L}_{+}(G),\mathbf{X}\big{\rangle} (dir-cond-sdp)
s.t. 𝐋(K𝝁),𝐗1\displaystyle\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1
𝐗𝟎\displaystyle\mathbf{X}\succeq\mathbf{0}

In order to use a regret bound from Theorem 4, we need only address the first two issues presented in the symmetric case:

  1. (1)

    Ensuring Small Width: We need to ensure that the width 𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}} is small given any approximate matching response DtD_{t}. Similar to the symmetric case, we show this is bounded by a constant.

  2. (2)

    Ensuring Large Loss: We now need to ensure that 𝐋+(Dt),𝐗t\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\big{\rangle} is large. A key difference in the asymmetric case is that ensuring large loss requires an additional step in the rounding algorithm to control the terms in the loss that correspond to the embedding vector lengths.

To compute the matrix exponential update MMWUη,𝝁(𝐋+(D1),,𝐋+(Dt))\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}_{+}(D_{1}),\ldots,\mathbf{L}_{+}(D_{t})\big{)} in nearly-linear time, we can reuse the sketch provided in Section 7.1.3.

7.2.1 Ensuring Small Width

The width in the asymmetric case is still bounded by an absolute constant for any directed approximate matching response. We prove the following lemma in Appendix A.5.

Lemma 15.

Suppose D=(AB,ED,𝐰D,𝛍)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D},\bm{\mu}\big{)} is an approximate matching response with respect to cut action (A,B)(A,B). Then 𝐌𝐋+(D)3𝐌-\mathbf{M}\preceq\mathbf{L}_{+}(D)\preceq 3\cdot\mathbf{M}.

This lemma implies that the regret guarantee of Theorem 4 holds only for step sizes 0<η<10<\eta<1 when the losses are 𝐋+(D),𝐗\big{\langle}\mathbf{L}_{+}(D),\mathbf{X}\big{\rangle}. We will subsequently choose an η\eta satisfying this.

7.2.2 Ensuring Large Loss

In the asymmetric case, we wish to produce a cut action (At,Bt)(A_{t},B_{t}) such that no matter how the matching player \mathcal{M} plays DtD_{t}, the edges of DtD_{t} will be weighted such that the loss

𝐋+(Dt),𝐗=(i,j)EDtwijDt(𝐯i𝐯j22+𝐯i22𝐯j22)\big{\langle}\mathbf{L}_{+}(D_{t}),\mathbf{X}\big{\rangle}=\sum_{(i,j)\in E_{D_{t}}}w^{D_{t}}_{ij}\cdot\Big{(}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}

is large. The critical distinction from the symmetric case is the presence of the embedding vector lengths.
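Under one common convention for the directed Laplacian (ours here, used only for illustration: a directed edge (i,j) of weight w contributes w(2eᵢeᵢᵀ − eᵢeⱼᵀ − eⱼeᵢᵀ)), the loss above expands into exactly these distance and length terms, which can be checked numerically:

```python
import numpy as np

def directed_laplacian(n, edges):
    """Hypothetical convention for L_+: edge (i, j) with weight w
    contributes w * (2 e_i e_i^T - e_i e_j^T - e_j e_i^T)."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += 2 * w
        L[i, j] -= w
        L[j, i] -= w
    return L

rng = np.random.default_rng(4)
n, d = 6, 3
V = rng.standard_normal((n, d))   # embedding, Gram matrix X = V V^T
X = V @ V.T
edges = [(0, 3, 1.5), (1, 4, 0.5), (2, 5, 2.0)]

lhs = np.trace(directed_laplacian(n, edges) @ X)
# Distance term plus the two vector-length terms, per directed edge.
rhs = sum(w * (np.sum((V[i] - V[j]) ** 2)
               + np.sum(V[i] ** 2) - np.sum(V[j] ** 2))
          for i, j, w in edges)
assert abs(lhs - rhs) < 1e-8
```

Note that, unlike the symmetric case, the per-edge contribution is no longer sign-invariant under swapping i and j, which is precisely why the length terms must be controlled.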

This is where one might be inclined to require that valid matching responses be Eulerian directed graphs. If the matching response DD is Eulerian, then the graph can be split into two parts where the edges are all oriented from AtA_{t} to BtB_{t}, and from BtB_{t} to AtA_{t}. One can then cancel the embedding vector lengths in the loss using the fact that the in- and out-degrees are identical for each vertex. This reduces the problem of constructing a good cut strategy back to that present in the symmetric case: one need only round a cut (At,Bt)(A_{t},B_{t}) such that a large amount of flow is always routed between vertex pairs whose embedding vectors are far in 22\ell_{2}^{2}-distance.

Absent this trick, one needs to control the length of the embedding vectors. A priori this seems difficult. To ensure that an approximation algorithm based on cut-matching games runs in nearly-linear time, some low-dimensional projection needs to be applied to the embedding. One cannot hope to simultaneously control both the distortion of the projected embedding and the length of the projected vectors, seemingly forcing the matching response to be an Eulerian graph. We remark that if one tries to directly adapt the potential analysis of [36] or [54], the same terms in the loss appear when lower bounding the potential progress made when the matching player routes a large flow. This may explain why current works generalizing the cut-matching game to directed settings [46, 40] require the matching player to play Eulerian graphs.

However, if one is cognizant of the underlying SDP being used to certify lower bounds on the directed conductance of the state graph, then it is natural to consider rounding algorithms which directly target the SDP. Indeed, applying the rounding algorithm present in [2] (Algorithm 4) augments a σ\sigma-separated cut (S,T)(S,T) into a cut action (A,B)(A,B) such that any matching response across (A,B)(A,B) must have large loss. We describe this rounding procedure, directed-round-cut, in Algorithm 2.

Algorithm 2 directed-round-cut. Input: A cut (S,T)(S,T) of VV, and vector embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d}.

  1. Find r>0r>0 such that 𝝁(Sr(0))𝝁(S)2\bm{\mu}\big{(}S\cap\mathcal{B}_{r}(0)\big{)}\geq\frac{\bm{\mu}(S)}{2} and 𝝁(Sr(0)¯)𝝁(S)2\bm{\mu}\big{(}S\cap\overline{\mathcal{B}_{r}(0)}\big{)}\geq\frac{\bm{\mu}(S)}{2}.

  2. Define the following sets: S+=def{iS:𝐯i22r2}S^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in S:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq r^{2}\big{\}}, S=def{iS:𝐯i22r2}S^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in S:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq r^{2}\big{\}}, T+=def{iT:𝐯i22r2}T^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in T:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq r^{2}\big{\}}, T=def{iT:𝐯i22r2}T^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\big{\{}i\in T:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq r^{2}\big{\}}.

  3. If 𝝁(T+)𝝁(T)\bm{\mu}(T^{+})\leq\bm{\mu}(T^{-}), then let A=S+A=S^{+} and B=TB=T^{-}. Else, let A=T+A=T^{+} and B=SB=S^{-}.

Output: the cut (A,B)(A,B).
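The three steps above admit a direct transcription. The following Python sketch (ours) implements them, computing the 𝝁-weighted median radius by sorting, which matches the O(nd + n log n) bound of Lemma 17.

```python
import numpy as np

def directed_round_cut(S, T, V, mu):
    """Sketch of directed-round-cut.

    S, T: lists of vertex indices forming the input cut.
    V:    (n, d) array of embedding vectors.
    mu:   (n,) array of vertex measures.
    Returns the cut action (A, B).
    """
    norms = np.sum(V ** 2, axis=1)
    # Step 1: r^2 is the mu-weighted median of {||v_i||^2 : i in S}.
    order = sorted(S, key=lambda i: norms[i])
    total, acc = sum(mu[i] for i in S), 0.0
    for i in order:
        acc += mu[i]
        if acc >= total / 2:
            r2 = norms[i]
            break
    # Step 2: split S and T by the threshold r^2.
    S_plus  = [i for i in S if norms[i] <= r2]
    S_minus = [i for i in S if norms[i] >= r2]
    T_plus  = [i for i in T if norms[i] <= r2]
    T_minus = [i for i in T if norms[i] >= r2]
    # Step 3: choose the orientation based on the heavier half of T.
    if sum(mu[i] for i in T_plus) <= sum(mu[i] for i in T_minus):
        return S_plus, T_minus
    return T_plus, S_minus

# Toy usage: S has mostly short vectors, T has mostly long ones.
V = np.array([[0.1], [0.2], [2.0], [0.1], [3.0], [4.0]])
mu = np.ones(6)
A, B = directed_round_cut([0, 1, 2], [3, 4, 5], V, mu)
```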

The intuition for directed-round-cut is this: one does not need to route the flow bidirectionally across (S,T)(S,T), and subsequently produce an Eulerian demand graph. The SDP solution, and specifically the position of the median embedding vector, should dictate which direction flow needs to be routed in a given round of the cut-matching game. We now prove our lower bound on the loss.

Lemma 16.

Let 𝛍0V\bm{\mu}\in\mathbb{Z}_{\geq 0}^{V}, 𝐗Δ𝛍n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n}, and {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} be a vector embedding whose Gram matrix is 𝐗\mathbf{X}. For σ>0\sigma>0, suppose directed-round-cut is given (S,T)(S,T) that are σ\sigma-separated with respect to 𝛍\bm{\mu} and {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} as input, and outputs (A,B)(A,B). Then, any approximate directed matching response D=(AB,ED,𝐰D)D=\big{(}A\cup B,E_{D},\mathbf{w}^{D}\big{)} with respect to A,BA,B satisfies the following:

(i,j)EDwijD(𝐯i𝐯j22+𝐯i22𝐯j22)σ8\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\Big{(}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}\geq\frac{\sigma}{8}
Proof.

We assume, without loss of generality, that directed-round-cut outputs (A,B)(A,B) such that A=S+A=S^{+} and B=TB=T^{-} as the analysis for the case where A=T+A=T^{+} and B=SB=S^{-} is identical.

In step (1) of directed-round-cut, r>0r>0 is chosen to be the length of the median element of SS with respect to the measure 𝝁\bm{\mu}, hence 𝝁(A)𝝁(S)2\bm{\mu}(A)\geq\frac{\bm{\mu}(S)}{2}. Furthermore, if A=S+A=S^{+} and B=TB=T^{-}, then 𝝁(T)𝝁(T+)\bm{\mu}(T^{-})\geq\bm{\mu}(T^{+}) and hence 𝝁(T)𝝁(T)2\bm{\mu}(T^{-})\geq\frac{\bm{\mu}(T)}{2}. Consequently,

𝝁(A)𝝁(B)14𝝁(S)𝝁(T).\bm{\mu}(A)\bm{\mu}(B)\geq\frac{1}{4}\cdot\bm{\mu}(S)\bm{\mu}(T)\,.

Using the fact that (S,T)(S,T) are σ\sigma-separated, we derive for all iA,jBi\in A,j\in B:

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j2214𝝁(S)𝝁(T)𝝁(V)𝐯i𝐯j22σ4i,j(V2)μiμj𝝁(V)𝐯i𝐯j22\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{1}{4}\cdot\frac{\bm{\mu}(S)\bm{\mu}(T)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4}\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}

Because 𝐗Δ𝝁n×n\mathbf{X}\in\Delta_{\bm{\mu}}^{n\times n}, we have 𝐋(K𝝁),𝐗1\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq 1 and

σ4i,j(V2)μiμj𝝁(V)𝐯i𝐯j22=σ4𝐋(K𝝁),𝐗σ4\frac{\sigma}{4}\cdot\sum_{i,j\in\binom{V}{2}}\frac{\mu_{i}\mu_{j}}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}=\frac{\sigma}{4}\cdot\big{\langle}\mathbf{L}(K_{\bm{\mu}}),\mathbf{X}\big{\rangle}\geq\frac{\sigma}{4}

Combining the above two calculations, we have the inequality

𝝁(A)𝝁(B)𝝁(V)𝐯i𝐯j22σ4𝐯i𝐯j22σ4𝝁(A)𝝁(V)𝝁(B)σ4𝝁(A)\frac{\bm{\mu}(A)\bm{\mu}(B)}{\bm{\mu}(V)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4}\qquad\Longleftrightarrow\qquad\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}\geq\frac{\sigma}{4\bm{\mu}(A)}\cdot\frac{\bm{\mu}(V)}{\bm{\mu}(B)}\geq\frac{\sigma}{4\bm{\mu}(A)}

where we have used the fact that 𝝁(B)𝝁(V)\bm{\mu}(B)\leq\bm{\mu}(V). Finally, because any approximate directed matching response is bipartite across (A,B)(A,B), and the total weight is large, we conclude

(i,j)EDwijD𝐯i𝐯j22σ4𝝁(A)(i,j)EDwijD=σ4𝝁(A)𝐰(e(A,B))σ8\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{\sigma}{4\bm{\mu}(A)}\sum_{(i,j)\in E_{D}}w_{ij}^{D}=\frac{\sigma}{4\bm{\mu}(A)}\cdot\mathbf{w}\big{(}e(A,B)\big{)}\geq\frac{\sigma}{8}

as required. ∎

We also bound the running time of directed-round-cut.

Lemma 17.

directed-round-cut runs in time O(nd+nlogn)O(nd+n\log n) for n=|V|n=\lvert V\rvert.

Proof.

Computing the norm of a vector takes time O(d)O(d), so all norm computations take time O(nd)O(nd) overall. The vectors then need to be sorted by norm value, which takes time O(nlogn)O(n\log n). After that, completing the algorithm takes O(n)O(n) time. This yields the desired runtime. ∎

7.2.3 Completing the Analysis for Asymmetric Cut Functions

We are now ready to give our good cut strategy.

Algorithm 3 𝒜𝖼𝗎𝗍\mathcal{A}^{\rightarrow}_{\mathsf{cut}}. Input: vertex measure 𝝁0V\bm{\mu}\in\mathbb{Z}^{V}_{\geq 0}, and directed approximate matching responses D1,,DtD_{1},\ldots,D_{t}. Parameters: step size η>0\eta>0.

  1. Sample d=O(logn)d=O\big{(}\log n\big{)} random unit vectors 𝐮1,,𝐮d𝒮n1\mathbf{u}_{1},\ldots,\mathbf{u}_{d}\sim\mathcal{S}^{n-1}.

  2. Denote 𝐖t=MMWUη,𝝁(𝐋+(D1),,𝐋+(Dt))\mathbf{W}_{t}=\textup{{MMWU}}_{\eta,\bm{\mu}}\big{(}\mathbf{L}_{+}(D_{1}),\ldots,\mathbf{L}_{+}(D_{t})\big{)}, and compute {𝐯^i(t)}iVd\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\subseteq\mathbb{R}^{d} given by (𝐯^i(t))j=𝐖t𝐮j,𝐞i\big{(}\hat{\mathbf{v}}^{(t)}_{i}\big{)}_{j}=\langle\mathbf{W}_{t}\mathbf{u}_{j},\mathbf{e}_{i}\rangle for all iVi\in V, j[d]j\in[d], using the truncated Taylor series via [9].

  3. Execute round-cut(d,{𝐯^i(t)}iV)\textup{{round-cut}}\big{(}d,\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (S,T)(S,T).

  4. Execute directed-round-cut((S,T),{𝐯^i(t)}iV)\textup{{directed-round-cut}}\big{(}(S,T),\big{\{}\hat{\mathbf{v}}^{(t)}_{i}\big{\}}_{i\in V}\big{)} to produce the cut (At,Bt)(A_{t},B_{t}).

Output: The cut (At,Bt)(A_{t},B_{t}).

Figure 3: The cut strategy 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\rightarrow} for asymmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E}.
Lemma 18.

Let G=(V,E,𝐰,𝛍)G=(V,E,\mathbf{w},\bm{\mu}) be an input submodular hypergraph equipped with asymmetric polymatroidal cut functions {δh}hE\{\delta_{h}\}_{h\in E} such that |V|=n\lvert V\rvert=n. Then 𝒜𝖼𝗎𝗍\mathcal{A}^{\rightarrow}_{\mathsf{cut}} is (Ω(log2n),O(log3n))\Big{(}\Omega\big{(}\log^{2}n\big{)},O\big{(}\log^{3}n\big{)}\Big{)}-good with probability 1o(1)1-o(1).

Proof.

We demonstrate that ΨHΩ(log2n)\Psi^{*}_{H}\geq\Omega(\log^{2}n) after T=O(log3n)T=O\big{(}{\log^{3}n}\big{)} rounds of the cut-matching game. Since the feedback matrices 𝐋+(Dt)\mathbf{L}_{+}(D_{t}) are not positive semidefinite, we use regret bound (6) from Theorem 4. Lemma 15 guarantees 𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼3\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}\leq 3 for every t>0t>0, thus

1T𝐋+(H),𝐔\displaystyle\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}_{+}(H),\mathbf{U}\big{\rangle} 1Tt=1T𝐋+(Dt),𝐗tηTt=1T𝐌1/2𝐋+(Dt)𝐌1/2𝗌𝗉𝖾𝖼2lognηT\displaystyle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-\frac{\eta}{T}\sum_{t=1}^{T}\,\big{\lVert}\mathbf{M}^{-1/2}\mathbf{L}_{+}(D_{t})\mathbf{M}^{-1/2}\big{\rVert}_{\mathsf{spec}}^{2}-\frac{\log n}{\eta T}
1Tt=1T𝐋+(Dt),𝐗t9ηlognηT\displaystyle\geq\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-9\eta-\frac{\log n}{\eta T}

holding for every 0<η<10<\eta<1.

Next, consider the tt-th round, and let {𝐯i(t)}iV\big{\{}\mathbf{v}_{i}^{(t)}\}_{i\in V} be the vector embedding whose Gram matrix is 𝐗t\mathbf{X}_{t}. Theorem 3 implies round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets (S,T)(S,T). Consequently, Lemma 16 implies that directed-round-cut produces a cut action (At,Bt)(A_{t},B_{t}) such that DtD_{t} satisfies

𝐋+(Dt),𝐗t=(i,j)EDtwijDt(𝐯i(t)𝐯j(t)22+𝐯i(t)22𝐯j(t)22)C18logn\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle=\sum_{(i,j)\in E_{D_{t}}}w_{ij}^{D_{t}}\cdot\Big{(}\big{\lVert}\mathbf{v}_{i}^{(t)}-\mathbf{v}_{j}^{(t)}\big{\rVert}_{2}^{2}+\big{\lVert}\mathbf{v}_{i}^{(t)}\big{\rVert}_{2}^{2}-\big{\lVert}\mathbf{v}_{j}^{(t)}\big{\rVert}_{2}^{2}\Big{)}\geq\frac{C_{1}}{8\cdot\log n}

for some absolute constant C1>0C_{1}>0. Consequently, the regret bound becomes

1Tt=1T𝐋+(Dt),𝐗t9ηlognηTC18logn9ηlognηT.\frac{1}{T}\sum_{t=1}^{T}\langle\mathbf{L}_{+}(D_{t}),\mathbf{X}_{t}\rangle-9\eta-\frac{\log n}{\eta T}\geq\frac{C_{1}}{8\cdot\log n}-9\eta-\frac{\log n}{\eta T}\,.

Let C2C_{2} be a constant satisfying C18>6/C2\frac{C_{1}}{8}>6/\sqrt{C_{2}}. Choosing η=13C2logn\eta=\frac{1}{3\sqrt{C_{2}}\log n} and T=C2log3nT=C_{2}\cdot\log^{3}n, we have that:

C18logn9ηlognηT=1logn(C186C2)Ω(1logn)\frac{C_{1}}{8\cdot\log n}-9\eta-\frac{\log n}{\eta T}={1\over\log n}\left({C_{1}\over 8}-{6\over\sqrt{C_{2}}}\right)\geq\Omega\left({1\over\log n}\right)
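Indeed, with this choice of parameters the two error terms coincide:

```latex
9\eta \;=\; \frac{9}{3\sqrt{C_2}\,\log n} \;=\; \frac{3}{\sqrt{C_2}\,\log n},
\qquad\qquad
\frac{\log n}{\eta T} \;=\; \frac{3\sqrt{C_2}\,\log n\cdot\log n}{C_2\,\log^3 n} \;=\; \frac{3}{\sqrt{C_2}\,\log n},
```

so together they contribute 6/(√C₂·log n), which the condition C₁/8 > 6/√C₂ keeps strictly below C₁/(8·log n).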

This implies that every 𝐔Δ𝝁n×n\mathbf{U}\in\Delta_{\bm{\mu}}^{n\times n} satisfies:

1T𝐋+(H),𝐔Ω(1logn)\big{\langle}\tfrac{1}{T}\cdot\mathbf{L}_{+}(H),\mathbf{U}\big{\rangle}\geq\Omega\left({1\over\log n}\right)

Now, choosing 𝐔=𝐗\mathbf{U}=\mathbf{X}^{*} to be the minimizer to (dir-cond-sdp), and 𝗈𝗉𝗍H\mathsf{opt}_{H} the optimal objective value for (dir-cond-sdp), we have

𝗈𝗉𝗍H=𝐋+(H),𝐗Ω(Tlogn)=Ω(log2n)\mathsf{opt}_{H}=\big{\langle}\mathbf{L}_{+}(H),\mathbf{X}^{*}\big{\rangle}\geq\Omega\left({T\over\log n}\right)=\Omega\left({\log^{2}n}\right)

Finally, Lemma 14 implies the required

ΨH𝗈𝗉𝗍H2Ω(log2n)\Psi_{H}^{*}\geq\frac{\mathsf{opt}_{H}}{2}\geq\Omega\left({\log^{2}n}\right)

These guarantees can be shown to hold with constant probability by following the same argument as in Lemma 13. ∎

7.3 Completing the Cut Strategy: Proof of Theorem 12

For the sake of completeness, let us now prove Theorem 12.

See 12

Proof.

Items (1) and (2) of the statement are given by Lemmas 13 and 18, respectively. To bound the running time, notice that by the sketch in Section 7.1.3, the matrix multiplicative weight update can be computed in time bounded by O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)}. Meanwhile, the round-cut and directed-round-cut steps run in time O(nd+nlogn)O(nd+n\log n) as given by Theorem 3 and Lemma 17. Since dO(logn)d\leq O(\log n) for both 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\leftrightarrow} and 𝒜𝖼𝗎𝗍\mathcal{A}_{\mathsf{cut}}^{\rightarrow}, the total runtime is dominated by O~(𝗌𝗉(H))\tilde{O}\big{(}\mathsf{sp}(H)\big{)} as required. ∎

8 Constructing Separated Sets: Proof of Theorem 3

This section is devoted to proving Theorem 3 on the construction of separated sets. We will show that the algorithm round-cut in Figure 4 satisfies the requirements of the theorem. We denote by V=[n]V=[n] the index set for the vector embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} in the theorem statement and define the following notions of expected vector and total variance of the embedding:

𝐯𝖺𝗏𝗀\displaystyle\mathbf{v}_{\mathsf{avg}} =def1μ(V)iVμi𝐯i\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{1}{\mu(V)}\sum_{i\in V}\mu_{i}\mathbf{v}_{i}
Var𝝁(𝐯i)\displaystyle\operatorname*{Var}_{\bm{\mu}}(\mathbf{v}_{i}) =def1μ(V)iVμi𝐯i𝐯𝖺𝗏𝗀22=1(μ(V))2{i,j}(V2)μiμj𝐯i𝐯j2.\displaystyle\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\frac{1}{\mu(V)}\sum_{i\in V}\mu_{i}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{\mathsf{avg}}\rVert_{2}^{2}=\frac{1}{(\mu(V))^{2}}\sum_{\{i,j\}\in\binom{V}{2}}\mu_{i}\mu_{j}\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}.

To simplify our proof, we will assume, without loss of generality, that our embedding is both centered, i.e., 𝐯𝖺𝗏𝗀=0,\mathbf{v}_{\mathsf{avg}}=0, and normalized, i.e., Var𝝁(𝐯i)=1.\operatorname*{Var}_{\bm{\mu}}(\mathbf{v}_{i})=1. Under these assumptions, we will make use of the following lemma, which relates the contribution of a set S¯\bar{S} to the total variance to that of its complement SS.
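As a quick numerical sanity check of the pairwise-distance identity for the total variance above, one can compare the two expressions directly (a hypothetical numpy sketch on random data, not tied to any particular hypergraph):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
mu = rng.integers(1, 5, size=n).astype(float)   # integral vertex measure
v = rng.normal(size=(n, d))                     # embedding vectors

mu_V = mu.sum()
v_avg = (mu[:, None] * v).sum(axis=0) / mu_V

# First form: measure-weighted variance around the average vector.
var1 = (mu * ((v - v_avg) ** 2).sum(axis=1)).sum() / mu_V

# Second form: sum over unordered pairs of squared distances.
var2 = 0.0
for i in range(n):
    for j in range(i + 1, n):
        var2 += mu[i] * mu[j] * ((v[i] - v[j]) ** 2).sum()
var2 /= mu_V ** 2

assert np.isclose(var1, var2)
```

The same identity, applied to a subset instead of all of V, underlies the O(nd) balancedness test described below.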

Lemma 19 (Fact 5.19 from [53]).

Let {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} be a centered and normalized vector embedding. We have:

iS¯μi𝐯i22μ(S)(11(μ(V))2i,j(S2)μiμj𝐯i𝐯j22).\sum_{i\in\overline{S}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq\mu(S)\cdot\bigg{(}1-\frac{1}{(\mu(V))^{2}}\cdot\sum_{i,j\in\binom{S}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\bigg{)}.

Our algorithm handles two cases, depending on whether there exists a set of small measure that contributes most of the variance of the embedding. To distinguish these two cases, we consider long vectors, i.e., vectors with large contributions to the variance of the embedding. In particular, we define the set RtVR_{t}\subseteq V of vertices whose embedding vectors are short, so that its complement Rt¯\overline{R_{t}} consists of the long vectors:

Rt=def{iV:𝐯i22t}R_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\Big{\{}i\in V:\lVert\mathbf{v}_{i}\rVert_{2}^{2}\leq t\Big{\}}

We then say that an embedding is (t,b)(t,b)-balanced if the variance due to vertices in RtR_{t} captures a bb-fraction of the total embedding variance, as per the following definition.

Definition 14 ((t,b)(t,b)-Balanced Embedding).

Given t>0t>0, and b>0b>0, a centered normalized vector embedding {𝐯id}iV\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V} is (t,b)(t,b)-balanced if the set RtR_{t} satisfies:

1(μ(V))2i,j(Rt2)𝝁(i)𝝁(j)𝐯i𝐯j22b.\frac{1}{(\mu(V))^{2}}\cdot\sum_{i,j\in\binom{R_{t}}{2}}\bm{\mu}(i)\bm{\mu}(j)\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq b\,.

We distinguish the two cases based on whether the vector embedding is (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced. We remark that, given tt and bb, one can check whether the embedding is (t,b)(t,b)-balanced in O(nd)O(nd) time by computing distances to the average vector of RtR_{t}. We start by describing the unbalanced case, which is where our main technical contribution lies.
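The O(nd) balancedness test can be sketched as follows (hypothetical numpy code; the helper names `balance_value` and `is_balanced` are ours). It evaluates the pairwise quantity of Definition 14 through the same weighted-variance identity used for the total variance, applied to RtR_{t}:

```python
import numpy as np

def balance_value(v, mu, t):
    """O(nd) evaluation of the (t, b)-balance quantity
       (1 / mu(V)^2) * sum over unordered pairs {i,j} in R_t of
       mu_i * mu_j * ||v_i - v_j||^2,
    using the identity that this pairwise sum equals mu(R_t) times
    the mu-weighted variance of R_t around its own average vector."""
    R = (v ** 2).sum(axis=1) <= t            # R_t: vertices with short vectors
    mu_R = mu[R].sum()
    if mu_R == 0:
        return 0.0
    avg = (mu[R][:, None] * v[R]).sum(axis=0) / mu_R
    pair_sum = mu_R * (mu[R] * ((v[R] - avg) ** 2).sum(axis=1)).sum()
    return pair_sum / mu.sum() ** 2

def is_balanced(v, mu, t, b):
    return balance_value(v, mu, t) >= b
```

A brute-force pairwise sum over RtR_{t} gives the same value, which is how the sketch can be validated on small instances.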

Algorithm 4 round-cut.
Input: A centered normalized vector embedding {𝐯id}iV.\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V}.
1. If {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} is (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced:
(a) Sample 𝐠\mathbf{g} uniformly at random from 𝒮d1\mathcal{S}^{d-1} and compute ri=d𝐠,𝐯ir_{i}=\sqrt{d}\cdot\langle\mathbf{g},\mathbf{v}_{i}\rangle for each iVi\in V.
(b) Sort rir_{i}. Assume without loss of generality that r1rnr_{1}\geq\ldots\geq r_{n} and define sweep cuts Stup={iV:rirt} and Stdown={iV:rirt}S_{t}^{up}=\{i\in V:r_{i}\geq r_{t}\}\text{ and }S_{t}^{down}=\{i\in V:r_{i}\leq r_{t}\}.
(c) Let tstart=min{t:3𝝁(Stup)𝝁(V)}t_{start}=\min\{t:3\cdot\bm{\mu}(S_{t}^{up})\geq\bm{\mu}(V)\} and tend=max{t:3𝝁(Stdown)𝝁(V)}t_{end}=\max\{t:3\cdot\bm{\mu}(S_{t}^{down})\geq\bm{\mu}(V)\}.
(d) Output (Ststartup,Stenddown)\big{(}S_{t_{start}}^{up},S_{t_{end}}^{down}\big{)}.
2. If {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d} is not (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced:
(a) Sort the vectors by their length and relabel vertices so that 𝐯1𝐯2𝐯n\|\mathbf{v}_{1}\|\geq\|\mathbf{v}_{2}\|\geq\ldots\geq\|\mathbf{v}_{n}\|.
(b) Let T={iV:𝐯i232}.T=\{i\in V:\|\mathbf{v}_{i}\|^{2}\leq\frac{3}{2}\}. Define sweep cuts St=def{iV:𝐯i𝐯t}.S_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in V:\|\mathbf{v}_{i}\|\geq\|\mathbf{v}_{t}\|\}.
(c) For each sweep cut StS_{t} with 𝐯t23\|\mathbf{v}_{t}\|^{2}\geq 3: if the following condition holds (see Equation 20 and Equation 21), 𝝁(St)𝝁(V)𝐯t283100log(𝝁(V)),\frac{\bm{\mu}(S_{t})}{\bm{\mu}(V)}\cdot\frac{\|\mathbf{v}_{t}\|^{2}}{8}\geq\frac{3}{100\log(\bm{\mu}(V))}, output the 1100log(𝝁(V))\frac{1}{100\log(\bm{\mu}(V))}-separated sets (St,T).(S_{t},T).

Figure 4: The round-cut algorithm.
The unbalanced case

In the unbalanced case, the round-cut algorithm sorts the embedding vectors according to their length and outputs separated sets SS and TT from the set of threshold cuts in this ordering. In particular, the set TT is always chosen to equal R3/2R_{3/2}, while the set SS is chosen by sweeping through all cuts in the ordering to find one that satisfies the required separation guarantee. We show that such a cut always exists. The following lemma, which is implicit in previous works [10, 2], follows from Markov’s Inequality and Lemma 19. We prove it for completeness in Appendix A.

Lemma 20.

Let {𝐯id}iV\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V} be a centered normalized vector embedding that is not (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced. Then:

μ(R3/2)13μ(V) and iR3¯μi𝐯i235μ(V).\mu(R_{3/2})\geq\frac{1}{3}\cdot\mu(V)\,\textrm{ and }\,\sum_{i\in\overline{R_{3}}}\mu_{i}\cdot\|\mathbf{v}_{i}\|^{2}\geq\frac{3}{5}\cdot\mu(V).

Setting T=R3/2T=R_{3/2}, it remains to show that there exists some t3t\geq 3 such that all pairs iRt¯i\in\overline{R_{t}}, jTj\in T satisfy

𝐯i𝐯j22σ3μ(V)μ(Rt¯),\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\sigma\cdot\frac{3\mu(V)}{\mu(\overline{R_{t}})}, (20)

where σ=1/100log(μ(V))=Ω(1/logn).\sigma=\nicefrac{{1}}{{100\log(\mu(V))}}=\Omega(\nicefrac{{1}}{{\log n}}). For the sake of contradiction, assume that no such tt exists. Then, for every t3,t\geq 3, there exists a pair iRt¯,jTi\in\overline{R_{t}},j\in T such that the squared distance between 𝐯i\mathbf{v}_{i} and 𝐯j\mathbf{v}_{j} is less than σ3μ(V)/μ(Rt¯).\sigma\cdot\nicefrac{{3\mu(V)}}{{\mu(\overline{R_{t}})}}. At the same time, we can lower bound this squared distance as follows:

𝐯i𝐯j2(𝐯i𝐯j)2(t3/2)2t8.\|\mathbf{v}_{i}-\mathbf{v}_{j}\|^{2}\geq(\|\mathbf{v}_{i}\|-\|\mathbf{v}_{j}\|)^{2}\geq(\sqrt{t}-\sqrt{\nicefrac{{3}}{{2}}})^{2}\geq\frac{t}{8}\,. (21)

This yields, for all t3t\geq 3:

μ(Rt¯)24σtμ(V).\mu(\overline{R_{t}})\leq\frac{24\sigma}{t}\cdot\mu(V).

Integrating over t[3,M]t\in[3,M] where M=defmaxiV,μi>0𝐯i2M\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\max_{i\in V,\mu_{i}>0}\|\mathbf{v}_{i}\|^{2}, we have:

iR3¯μi𝐯i2=3Mμ(Rt¯)𝑑t+ 3μ(R3¯)24σμ(V)3Mdtt+24σμ(V)24(log(M)+1)σμ(V).\sum_{i\in\overline{R_{3}}}\mu_{i}\|\mathbf{v}_{i}\|^{2}=\int_{3}^{M}\mu(\overline{R_{t}})\,dt\,+\,3\cdot\mu(\overline{R_{3}})\leq 24\sigma\cdot\mu(V)\cdot\int_{3}^{M}\frac{dt}{t}\,+24\sigma\cdot\mu(V)\leq 24\cdot(\log(M)+1)\cdot\sigma\mu(V).

At the same time, Lemma 20 lower bounds this sum:

35μ(V)iR3¯μi𝐯i248log(μ(V))σμ(V),\frac{3}{5}\cdot\mu(V)\leq\sum_{i\in\overline{R_{3}}}\mu_{i}\|\mathbf{v}_{i}\|^{2}\leq 48\cdot\log(\mu(V))\cdot\sigma\cdot\mu(V)\,,

which implies σ1/80log(μ(V)),\sigma\geq\nicefrac{{1}}{{80\log(\mu(V))}}, contradicting our choice of σ=1/100log(μ(V)).\sigma=\nicefrac{{1}}{{100\log(\mu(V))}}.

To bound MM, note that by the normalization assumption and the fact that 𝝁0n\bm{\mu}\in\mathbb{Z}^{n}_{\geq 0}, we have MiVμi𝐯i2μ(V)=𝗉𝗈𝗅𝗒(n).M\leq\sum_{i\in V}\mu_{i}\|\mathbf{v}_{i}\|^{2}\leq\mu(V)=\mathsf{poly}(n). This completes the proof that a set Rt¯\overline{R_{t}} with the required separation guarantee exists. To find such a cut, the round-cut algorithm avoids computing the minimum distance between Rt¯\overline{R_{t}} and TT for each threshold cut by noticing that the guarantee still holds if we replace this minimum distance with the quantity t/8\nicefrac{{t}}{{8}}, as we did in Equation 21.
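The resulting unbalanced-case sweep can be sketched as follows (hypothetical Python; `unbalanced_sweep` is our name, and the surrogate condition is the one checked by round-cut in Figure 4):

```python
import numpy as np

def unbalanced_sweep(v, mu):
    """Unbalanced case of round-cut: sort vertices by decreasing vector
    length and return the first sweep cut S_t (squared-length threshold
    t >= 3) whose surrogate separation (mu(S_t)/mu(V)) * t/8 clears
    3 / (100 log mu(V)).  The set T is always the short-vector set R_{3/2}."""
    lengths = (v ** 2).sum(axis=1)
    order = np.argsort(-lengths)             # longest vectors first
    mu_V = mu.sum()
    sigma = 1.0 / (100.0 * np.log(mu_V))
    T = np.where(lengths <= 1.5)[0]          # T = R_{3/2}
    mu_S, S = 0.0, []
    for i in order:
        S.append(i)
        mu_S += mu[i]
        t = lengths[i]                       # threshold = smallest length in S
        if t >= 3.0 and (mu_S / mu_V) * (t / 8.0) >= 3.0 * sigma:
            return np.array(S), T
    return None                              # no certifying cut found
```

On an embedding with a single very long vector, the sweep returns that vector as SS together with the short-vector set TT.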

The balanced case

In the balanced case, the set R3R_{3} of short vectors captures a constant fraction of the vector embedding’s variance and, by Markov’s inequality, a large fraction of the total volume 𝝁(V)\bm{\mu}(V). One can therefore expect that a random projection coupled with a search for a balanced sweep cut will succeed, as the bulk of the vectors are well separated from one another. The analysis of the balanced case is thus very similar to that of SDP-based approximation algorithms for balanced separator [53]. In fact, it is simpler, as one does not need to bound the conductance of the resulting cut, so we leave the proof of the following lemma to Appendix A.6.

Lemma 21.

Given V=[n]V=[n], let {𝐯id}iVd\{\mathbf{v}_{i}\in\mathbb{R}^{d}\}_{i\in V}\subseteq\mathbb{R}^{d} be a centered normalized vector embedding. If {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} is a (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced embedding, then algorithm round-cut outputs a pair (S,T)(S,T) of Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets with high probability.

Running Time

To bound the running time, consider the following. Checking whether the embedding is (3,1/10)(3,\nicefrac{{1}}{{10}})-balanced takes O(nd)O(nd) time. This can be seen by noting that the inequality in Definition 14 can be rewritten as:

μ(Rt)(μ(V))2iRtμ(i)𝐯i1μ(Rt)jRtμ(j)𝐯j22b.{\mu(R_{t})\over(\mu(V))^{2}}\sum_{i\in R_{t}}\mu(i)\Big{\|}\mathbf{v}_{i}-{1\over\mu(R_{t})}\sum_{j\in R_{t}}\mu(j)\mathbf{v}_{j}\Big{\|}_{2}^{2}\geq b.

Computing 22\ell_{2}^{2}-distances and vector projections takes O(d)O(d) time per vector, of which there are nn. Sorting takes O(nlogn)O(n\log n) time. There are at most O(n)O(n) sweep cuts to check in both the balanced and unbalanced case, and finding the appropriate cut in both cases takes O(1)O(1) time per cut. Checking the condition:

μ(St)μ(V)𝐯t283100log(μ(V)),\frac{\mu(S_{t})}{\mu(V)}\cdot\frac{\|\mathbf{v}_{t}\|^{2}}{8}\geq\frac{3}{100\log(\mu(V))},

requires time O(d)O(d) per cut. Hence the overall running time is at most O(nd+nlogn)O(nd+n\log n).

This completes the proof of Theorem 3.

9 Acknowledgements

AC is supported by NSF DGE 2140001. Part of this work was also done while AC was a visiting student at Bocconi University. LO is supported by NSF CAREER 1943510. The authors would like to thank anonymous reviewers for suggesting related work on polymatroidal networks, Thatchaphol Saranurak for suggesting related work on the use of cut-matching games beyond partitioning graphs, and Jeffrey Negrea for his assistance with topics related to online learning. AC would like to thank Luca Trevisan for discussions regarding HDX construction, flow embedding techniques for certifying expansion of random regular graphs, and support while visiting Bocconi.

References

  • [1] Dimitris Achlioptas. Database-friendly random projections. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 274–281, 2001.
  • [2] Amit Agarwal, Moses Charikar, Konstantin Makarychev, and Yury Makarychev. 𝒪(logn)\mathcal{O}(\sqrt{\log n})-approximation algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 573–581, 2005.
  • [3] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In Proceedings of the 23rd international conference on machine learning, pages 17–24. Association for Computing Machinery, 2006.
  • [4] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. Network Flows: Theory, Algorithms and Applications. Prentice-Hall, 1993.
  • [5] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond matrix multiplicative updates. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 237–245, 2015.
  • [6] Konstantinos Ameranis, Antares Chen, Lorenzo Orecchia, and Erasmo Tani. Efficient flow-based approximation algorithms for submodular hypergraph partitioning via a generalized cut-matching game. arXiv preprint arXiv:2301.08920, 2023.
  • [7] Reid Andersen and Kevin J. Lang. An algorithm for improving graph partitions. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, page 651–660, USA, 2008. Society for Industrial and Applied Mathematics.
  • [8] Sanjeev Arora, Elad Hazan, and Satyen Kale. O(logn){{O}}(\sqrt{\log n})-Approximation to SPARSEST CUT in O~(n2)\tilde{O}(n^{2}) Time. SIAM Journal on Computing, 39(5):1748–1771, January 2010.
  • [9] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227–236, 2007.
  • [10] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):1–37, 2009.
  • [11] Francis Bach et al. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.
  • [12] Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
  • [13] Aaron Bernstein, Maximilian Probst Gutenberg, and Thatchaphol Saranurak. Deterministic decremental sssp and approximate min-cost flow in almost-linear time. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1000–1008. IEEE, 2022.
  • [14] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of data science. Cambridge University Press, 2020.
  • [15] Umit V Catalyurek and Cevdet Aykanat. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on parallel and distributed systems, 10(7):673–693, 1999.
  • [16] T-H Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph laplacian and approximation algorithms. Journal of the ACM (JACM), 65(3):1–48, 2018.
  • [17] Yuk Hei Chan and Lap Chi Lau. On linear and semidefinite programming relaxations for hypergraph matching. Mathematical programming, 135(1-2):123–148, 2012.
  • [18] Moses Charikar, Konstantin Makarychev, and Yury Makarychev. Directed metrics and directed graph partitioning problems. In SODA, volume 6, pages 51–60, 2006.
  • [19] Chandra Chekuri, Sreeram Kannan, Adnan Raja, and Pramod Viswanath. Multicommodity flows and cuts in polymatroidal networks. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 399–408, 2012.
  • [20] Li Chen, Rasmus Kyng, Yang P Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 612–623. IEEE, 2022.
  • [21] Julia Chuzhoy. A Distanced Matching Game, Decremental APSP in Expanders, and Faster Deterministic Algorithms for Graph Cut Problems. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, pages 2122–2213. Society for Industrial and Applied Mathematics, January 2023.
  • [22] Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. A deterministic algorithm for balanced cut with applications to dynamic connectivity, flows, and beyond. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1158–1167. IEEE, 2020.
  • [23] Julia Chuzhoy and Sanjeev Khanna. A new algorithm for decremental single-source shortest paths with applications to vertex-capacitated flow and cut problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 389–400, 2019.
  • [24] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
  • [25] Karen D Devine, Erik G Boman, Robert T Heaphy, Rob H Bisseling, and Umit V Catalyurek. Parallel hypergraph partitioning for scientific computing. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pages 10–pp. IEEE, 2006.
  • [26] Alina Ene and Huy Nguyen. Random coordinate descent methods for minimizing decomposable submodular functions. In International Conference on Machine Learning, pages 787–795. PMLR, 2015.
  • [27] Alina Ene, Huy Nguyen, and László A Végh. Decomposable submodular function minimization: discrete and continuous. Advances in neural information processing systems, 30, 2017.
  • [28] Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D Mitchell, Brenda Praggastis, Amie J Eisfeld, Amy C Sims, Larissa B Thackray, et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC bioinformatics, 22(1):1–21, 2021.
  • [29] Elad Hazan, Satyen Kale, and Shai Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In Conference on Learning Theory, pages 38–1. JMLR Workshop and Conference Proceedings, 2012.
  • [30] Shohei Hidaka and Masafumi Oizumi. Fast and exact search for the partition with minimal information loss. PLoS One, 13(9):e0201126, 2018.
  • [31] Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, and Kevin Tian. Positive semidefinite programming: mixed, parallel, and width-independent. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 789–802, 2020.
  • [32] Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, and Zhao Song. A faster interior point method for semidefinite programming. In 2020 IEEE 61st annual symposium on foundations of computer science (FOCS), pages 910–918. IEEE, 2020.
  • [33] William Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Conference in Modern Analysis and Probability, 26:189–206, 1982.
  • [34] Satyen Kale. Efficient algorithms using the multiplicative weights update method. Princeton University, 2007.
  • [35] Rohit Khandekar, Subhash Khot, Lorenzo Orecchia, and Nisheeth K Vishnoi. On a cut-matching game for the sparsest cut problem. Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2007-177, 6(7):12, 2007.
  • [36] Rohit Khandekar, Satish Rao, and Umesh Vazirani. Graph partitioning using single commodity flows. Journal of the ACM (JACM), 56(4):1–15, 2009.
  • [37] Steffen Klamt, Utz-Uwe Haus, and Fabian Theis. Hypergraphs and cellular networks. PLoS computational biology, 5(5):e1000385, 2009.
  • [38] Tsz Chiu Kwok, Lap Chi Lau, and Kam Chuen Tung. Cheeger inequalities for vertex expansion and reweighted eigenvalues. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 366–377. IEEE, 2022.
  • [39] Lap Chi Lau, Kam Chuen Tung, and Robert Wang. Cheeger inequalities for directed graphs and hypergraphs using reweighted eigenvalues. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1834–1847, 2023.
  • [40] Lap Chi Lau, Kam Chuen Tung, and Robert Wang. Fast algorithms for directed graph partitioning using flows and reweighted eigenvalues. arXiv preprint arXiv:2306.09128, 2023.
  • [41] Eugene L Lawler and Charles U Martel. Computing maximal “polymatroidal” network flows. Mathematics of Operations Research, 7(3):334–347, 1982.
  • [42] Tom Leighton and Satish Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM), 46(6):787–832, 1999.
  • [43] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. Advances in neural information processing systems, 30, 2017.
  • [44] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-laplacians, cheeger inequalities and spectral clustering. In International Conference on Machine Learning, pages 3014–3023. PMLR, 2018.
  • [45] Yaowei Long and Thatchaphol Saranurak. Near-optimal deterministic vertex-failure connectivity oracles. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 1002–1010. IEEE, 2022.
  • [46] Anand Louis. Cut-matching games on directed graphs. arXiv preprint arXiv:1010.1047, 2010.
  • [47] Anand Louis. Hypergraph markov operators, eigenvalues and approximation algorithms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 713–722, 2015.
  • [48] Anand Louis and Yury Makarychev. Approximation algorithms for hypergraph small-set expansion and small-set vertex expansion. Theory of Computing, 12(1):1–25, 2016.
  • [49] Danupon Nanongkai and Thatchaphol Saranurak. Dynamic spanning forest with worst-case update time: adaptive, las vegas, and 𝒪(n1/2ε)\mathcal{O}(n^{1/2-\varepsilon})-time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1122–1129, 2017.
  • [50] Mukund Narasimhan, Nebojsa Jojic, and Jeff A Bilmes. Q-clustering. Advances in Neural Information Processing Systems, 18, 2005.
  • [51] Lorenzo Orecchia. Fast Approximation Algorithms for Graph Partitioning using Spectral and Semidefinite-Programming Techniques. PhD thesis, EECS Department, University of California, Berkeley, May 2011.
  • [52] Lorenzo Orecchia, Konstantinos Ameranis, Charalampos Tsourakakis, and Kunal Talwar. Practical almost-linear-time approximation algorithms for hybrid and overlapping graph clustering. In International Conference on Machine Learning, pages 17071–17093. PMLR, 2022.
  • [53] Lorenzo Orecchia, Sushant Sachdeva, and Nisheeth K Vishnoi. Approximating the exponential, the Lanczos method and an O~(m)\tilde{O}(m)-time spectral algorithm for balanced separator. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1141–1160, 2012.
  • [54] Lorenzo Orecchia, Leonard J Schulman, Umesh V Vazirani, and Nisheeth K Vishnoi. On partitioning graphs via single commodity flows. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 461–470, 2008.
  • [55] Jonah Sherman. Breaking the multicommodity flow barrier for 𝒪(logn)\mathcal{O}(\sqrt{\log n})-approximations to sparsest cut. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 363–372. IEEE, 2009.
  • [56] Zoya Svitkina and Lisa Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. SIAM Journal on Computing, 40(6):1715–1737, 2011.
  • [57] Charalampos E Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web, pages 1451–1460, 2017.
  • [58] Nate Veldt. Cut-matching games for generalized hypergraph ratio cuts. In Proceedings of the ACM Web Conference 2023, WWW ’23, page 694–704, New York, NY, USA, 2023. Association for Computing Machinery.
  • [59] Nate Veldt, Austin R Benson, and Jon Kleinberg. Approximate decomposable submodular function minimization for cardinality-based components. Advances in Neural Information Processing Systems, 34:3744–3756, 2021.
  • [60] Nate Veldt, Austin R Benson, and Jon Kleinberg. Hypergraph cuts with general splitting functions. SIAM Review, 64(3):650–685, 2022.
  • [61] Wenyin Yang, Guojun Wang, Md Zakirul Alam Bhuiyan, and Kim-Kwang Raymond Choo. Hypergraph partitioning for social networks based on information entropy modularity. Journal of Network and Computer Applications, 86:59–71, 2017.
  • [62] Yuichi Yoshida. Cheeger inequalities for submodular transformations. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2582–2601. SIAM, 2019.

Appendix A Omitted Proofs

In this section, we prove some technical lemmata used in the body of the paper.

A.1 Our Results

See 1

Proof of Lemma 1.

Observe that the denominator of the ratio-cut objective is itself a submodular cut function applied over the entire vertex set:

δ𝝁(S)=min{𝝁(S),𝝁(S¯)}.\delta_{\bm{\mu}}(S)=\min\{\bm{\mu}(S),\bm{\mu}(\bar{S})\}.

Consequently, for every SVS\subseteq V:

ΨG(S)=hEwhδh(Sh)min{𝝁(S),𝝁(S¯)}=δG(S)δ𝝁(S)=δ¯G(𝟏S)δ¯𝝁(𝟏S),\Psi_{G}(S)=\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\min\{\bm{\mu}(S),\bm{\mu}(\bar{S})\}}=\frac{\delta_{G}(S)}{\delta_{\bm{\mu}}(S)}={\overline{\delta}_{G}(\boldsymbol{1}^{S})\over\overline{\delta}_{\bm{\mu}}(\boldsymbol{1}^{S})},

which gives:

ΨGmin𝐱nδ¯G(𝐱)δ¯𝝁(𝐱).\Psi_{G}^{*}\geq\min_{\mathbf{x}\in\mathbb{R}^{n}}{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}.

We now prove the opposite inequality by showing the second part of the lemma. Notice that the value of the ratio in the right-hand side of (RC-NonCvx) is invariant under shifting all the coordinates by a fixed constant. Furthermore, both the numerator and the denominator of the right-hand side of (RC-NonCvx) are positively homogeneous, and hence their ratio is invariant under scaling 𝐱\mathbf{x} by a positive constant, i.e., for any u>0u>0:

δ¯G(u𝐱)δ¯𝝁(u𝐱)=uδ¯G(𝐱)uδ¯𝝁(𝐱)=δ¯G(𝐱)δ¯𝝁(𝐱).{\overline{\delta}_{G}(u\cdot\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(u\cdot\mathbf{x})}={u\cdot\overline{\delta}_{G}(\mathbf{x})\over u\cdot\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}.

Hence, for any 𝐱\mathbf{x}, by letting 𝐱^\hat{\mathbf{x}} be defined as:

xi^=ximinjxjmaxj,k(xjxk),\hat{x_{i}}={x_{i}-\min_{j}x_{j}\over\max_{j,k}(x_{j}-x_{k})},

we have maxix^i=1\max_{i}\hat{x}_{i}=1, minix^i=0\min_{i}\hat{x}_{i}=0, and:

δ¯G(𝐱)δ¯𝝁(𝐱)=δ¯G(𝐱^)δ¯𝝁(𝐱^).{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\hat{\mathbf{x}})\over\overline{\delta}_{\bm{\mu}}(\hat{\mathbf{x}})}.

We then have, as per Proposition 3.1 in [11]:

δ¯G(𝐱)δ¯𝝁(𝐱)=δ¯G(𝐱^)δ¯𝝁(𝐱^)=𝔼t[0,1][δG(St(𝐱^))]𝔼t[0,1][δ𝝁(St(𝐱^))].{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}={\overline{\delta}_{G}(\hat{\mathbf{x}})\over\overline{\delta}_{\bm{\mu}}(\hat{\mathbf{x}})}={\operatorname*{\mathbb{E}}_{t\sim[0,1]}[\delta_{G}(S_{t}(\hat{\mathbf{x}}))]\over\operatorname*{\mathbb{E}}_{t\sim[0,1]}[\delta_{\bm{\mu}}(S_{t}(\hat{\mathbf{x}}))]}.

Hence, there must exist some t[0,1]t\in[0,1] such that:

δ¯G(𝐱)δ¯𝝁(𝐱)δG(St(𝐱^))δ𝝁(St(𝐱^)).{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}\geq{\delta_{G}(S_{t}(\hat{\mathbf{x}}))\over\delta_{\bm{\mu}}(S_{t}(\hat{\mathbf{x}}))}.

This gives:

min𝐱nδ¯G(𝐱)δ¯𝝁(𝐱)=ΨG,\min_{\mathbf{x}\in\mathbb{R}^{n}}{\overline{\delta}_{G}(\mathbf{x})\over\overline{\delta}_{\bm{\mu}}(\mathbf{x})}=\Psi_{G}^{*},

as needed for the first part of the lemma. Moreover, computing and checking all of the sets {St(𝐱)}t[0,1]\{S_{t}(\mathbf{x})\}_{t\in[0,1]} only requires time O(nlogn)O(n\log n) for sorting the entries of 𝐱\mathbf{x} and time O(𝗌𝗉(G)+|V|)O(\mathsf{sp}(G)+|V|) for evaluating the numerator and denominator. ∎
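The sweep-cut rounding used in this proof can be sketched with generic evaluation oracles standing in for δG\delta_{G} and δ𝝁\delta_{\bm{\mu}} (a hypothetical Python sketch; `sweep_round`, `cut_fn`, and `denom_fn` are our names):

```python
import numpy as np

def sweep_round(x, cut_fn, denom_fn):
    """Round a fractional vector x to its best threshold set
    S_t(x) = {i : x_i >= t}: sweep the entries of x in decreasing order
    and return the threshold set minimizing cut_fn(S) / denom_fn(S).
    The proof of Lemma 1 guarantees some threshold set achieves a ratio
    at most the fractional ratio of x."""
    order = np.argsort(-x)                   # sweep from largest entries
    best, best_S, S = np.inf, None, []
    for i in order[:-1]:                     # skip the full set S = V
        S.append(i)
        num, den = cut_fn(S), denom_fn(S)
        if den > 0 and num / den < best:
            best, best_S = num / den, list(S)
    return best_S, best
```

For instance, on the path graph 0–1–2–3 with unit measure and the ordinary edge-cut function, sweeping x = (1, 0.9, 0.1, 0) recovers the cut {0, 1} with ratio 1/2.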

A.2 Polymatroidal Cut Functions and Hypergraph Flows

See 2

Proof.

Let S,TVS,T\subseteq V and denote with S¯\overline{S} and T¯\overline{T} their complements in VV. We consider four possible cases, corresponding to four ways of determining δ(S)\delta(S) and δ(T)\delta(T). In the first two cases, we exploit the submodularity of FF and GG.

Case 1:

F(S)G(S¯)F(S)\leq G(\overline{S}) and F(T)G(T¯)F(T)\leq G(\overline{T}), which implies δ(S)=F(S)\delta(S)=F(S) and δ(T)=F(T).\delta(T)=F(T). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
F(ST)+F(ST)\displaystyle\leq F(S\cup T)+F(S\cap T)
F(S)+F(T)=δ(S)+δ(T).\displaystyle\leq F(S)+F(T)=\delta(S)+\delta(T).
Case 2:

G(S¯)F(S)G(\overline{S})\leq F(S) and G(T¯)F(T)G(\overline{T})\leq F(T), which implies δ(S)=G(S¯)\delta(S)=G(\overline{S}) and δ(T)=G(T¯).\delta(T)=G(\overline{T}). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+G(S¯T¯)\displaystyle\leq G(\overline{S}\cap\overline{T})+G(\overline{S}\cup\overline{T})
G(S¯)+G(T¯)=δ(S)+δ(T).\displaystyle\leq G(\overline{S})+G(\overline{T})=\delta(S)+\delta(T).

In the last two cases, we rely on the monotonicity of FF and GG.

Case 3:

F(S)G(S¯)F(S)\leq G(\overline{S}) and G(T¯)F(T)G(\overline{T})\leq F(T), which implies δ(S)=F(S)\delta(S)=F(S) and δ(T)=G(T¯)\delta(T)=G(\overline{T}). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+F(ST)\displaystyle\leq G(\overline{S}\cap\overline{T})+F(S\cap T)
G(T¯)+F(S)=δ(S)+δ(T).\displaystyle\leq G(\overline{T})+F(S)=\delta(S)+\delta(T).
Case 4:

G(S¯)F(S)G(\overline{S})\leq F(S) and F(T)G(T¯)F(T)\leq G(\overline{T}), which implies δ(S)=G(S¯)\delta(S)=G(\overline{S}) and δ(T)=F(T)\delta(T)=F(T). We have:

δ(ST)+δ(ST)\displaystyle\delta(S\cup T)+\delta(S\cap T) =min{F(ST),G(S¯T¯)}+min{F(ST),G(S¯T¯)}\displaystyle=\min\{F(S\cup T),\;G(\overline{S}\cap\overline{T})\}+\min\{F(S\cap T),G(\overline{S}\cup\overline{T})\}
G(S¯T¯)+F(ST)\displaystyle\leq G(\overline{S}\cap\overline{T})+F(S\cap T)
G(S¯)+F(T)=δ(S)+δ(T).\displaystyle\leq G(\overline{S})+F(T)=\delta(S)+\delta(T).

It follows that δ\delta is submodular, as we have shown that for every S,TVS,T\subseteq V:

δ(ST)+δ(ST)δ(S)+δ(T).\delta(S\cup T)+\delta(S\cap T)\leq\delta(S)+\delta(T).
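The case analysis above can also be checked mechanically on a small ground set. The following Python sketch is purely illustrative (the particular monotone submodular functions F and G below are arbitrary choices of ours, not taken from the paper): it enumerates all pairs of subsets and verifies the submodular inequality for δ(S) = min{F(S), G(S̄)}.

```python
import itertools

# Illustrative ground set and two monotone submodular functions with
# F(empty) = G(empty) = 0; any monotone submodular pair would do.
V = frozenset(range(5))

def F(S):
    # square root of a positive modular weight: monotone and submodular
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def G(S):
    # concave function of cardinality: monotone and submodular
    return float(len(S)) ** 0.7

def delta(S):
    S = frozenset(S)
    return min(F(S), G(V - S))

subsets = [frozenset(c) for r in range(len(V) + 1)
           for c in itertools.combinations(sorted(V), r)]
violations = sum(1 for S in subsets for T in subsets
                 if delta(S | T) + delta(S & T) > delta(S) + delta(T) + 1e-9)
print(violations)
```

No pair of subsets violates the inequality, matching the four cases above.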

See 3

Proof.

Since FhF^{-}_{h} and Fh+F^{+}_{h} are monotone, for any 𝐱n\mathbf{x}\in\mathbb{R}^{n} there exists a value z=z(𝐱)z^{*}=z^{*}(\mathbf{x}) such that:

Fh({iV𝐱(i)z})Fh+({iV𝐱(i)<z}),F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})\leq F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\}),

for all zzz\geq z^{*}, and:

Fh({iV𝐱(i)z})Fh+({iV𝐱(i)<z}),F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})\geq F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\}),

for all z<zz<z^{*}. We then have, by Proposition 3.1(c) in [11]:

δ¯h(𝐱)\displaystyle\overline{\delta}_{h}(\mathbf{x}) =δ({iV𝐱(i)z})𝑑z\displaystyle=\int_{-\infty}^{\infty}\delta(\{i\in V\mid\mathbf{x}(i)\geq z\})dz
=z(𝐱)Fh+({iV𝐱(i)<z})𝑑z+z(𝐱)Fh({iV𝐱(i)z})𝑑z\displaystyle=\int_{-\infty}^{z^{*}(\mathbf{x})}F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)<z\})dz+\int_{z^{*}(\mathbf{x})}^{\infty}F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)\geq z\})dz
=minν0Fh+({iV𝐱(i)ν<z})𝑑z+0Fh({iV𝐱(i)νz})𝑑z\displaystyle=\min_{\nu\in\mathbb{R}}\int_{-\infty}^{0}F^{+}_{h}(\{i\in V\mid\mathbf{x}(i)-\nu<z\})dz+\int_{0}^{\infty}F^{-}_{h}(\{i\in V\mid\mathbf{x}(i)-\nu\geq z\})dz
=minνF¯h((𝐱ν𝟏)+)+F¯h+((𝐱ν𝟏)),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}^{-}_{h}((\mathbf{x}-\nu{\bf 1})_{+})+\bar{F}^{+}_{h}((\mathbf{x}-\nu{\bf 1})_{-}),

as needed. ∎
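The first step of the proof, rewriting the Lovász extension as an integral of level sets (Proposition 3.1(c) of [11]), can be sanity-checked numerically. The sketch below is illustrative only: it uses a symmetric cut function δ(S) = min{F(S), F(S̄)}, so that δ(∅) = δ(V) = 0 and the integral reduces to a finite sum over the breakpoints of x, with an arbitrary monotone submodular F of our choosing.

```python
import random

V = frozenset(range(5))

def F(S):
    # arbitrary monotone submodular function with F(empty) = 0
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def delta(S):
    # symmetric cut function: delta(empty) = delta(V) = 0
    S = frozenset(S)
    return min(F(S), F(V - S))

def lovasz_greedy(f, x):
    # Edmonds' greedy formula: sort coordinates in decreasing order
    # and charge each coordinate its marginal gain.
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, prefix, prev = 0.0, set(), 0.0
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        total += x[i] * (cur - prev)
        prev = cur
    return total

def lovasz_levelsets(f, x):
    # integral of f({i : x(i) >= z}) dz, a finite sum over sorted breakpoints
    vals = sorted(set(x))
    return sum((hi - lo) * f(frozenset(i for i in range(len(x)) if x[i] >= hi))
               for lo, hi in zip(vals, vals[1:]))

random.seed(0)
max_gap = max(
    abs(lovasz_greedy(delta, x) - lovasz_levelsets(delta, x))
    for x in ([random.uniform(-2, 2) for _ in range(5)] for _ in range(25))
)
print(max_gap < 1e-9)
```

The two formulas agree to machine precision on random inputs, as the integral representation predicts.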

See 4

Proof.

Since FhF_{h} is submodular, its Lovász extension F¯h\bar{F}_{h} is convex, and hence, by Fact 3:

δ¯h(𝐱)\displaystyle\bar{\delta}_{h}(\mathbf{x}) =minνF¯h((𝐱ν𝟏h)+)+F¯h((𝐱ν𝟏h))=minν2(12F¯h((𝐱ν𝟏h)+)+12F¯h((𝐱ν𝟏h)))\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})=\min_{\nu\in\mathbb{R}}2\cdot\left({1\over 2}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+{1\over 2}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})\right)
minν2F¯h(12(𝐱ν𝟏h)++12(𝐱ν𝟏h))\displaystyle\geq\min_{\nu\in\mathbb{R}}2\cdot\bar{F}_{h}\left({1\over 2}(\mathbf{x}-\nu{\bf 1}_{h})_{+}+{1\over 2}(\mathbf{x}-\nu{\bf 1}_{h})_{-}\right)
=minν2F¯h(12|𝐱ν𝟏h|)\displaystyle=\min_{\nu\in\mathbb{R}}2\cdot\bar{F}_{h}\left({1\over 2}|\mathbf{x}-\nu{\bf 1}_{h}|\right)
=minνF¯h(|𝐱ν𝟏h|),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}\left(|\mathbf{x}-\nu{\bf 1}_{h}|\right),

where in the above we applied Jensen’s Inequality and the fact that F¯h\bar{F}_{h} is positively homogeneous (Proposition 3.1 (e) from [11]). On the other hand, since FhF_{h} is monotone, we have:

δ¯h(𝐱)\displaystyle\bar{\delta}_{h}(\mathbf{x}) =minνF¯h((𝐱ν𝟏h)+)+F¯h((𝐱ν𝟏h))minνF¯h(|𝐱ν𝟏h|)+F¯h(|𝐱ν𝟏h|)=2minνF¯h(|𝐱ν𝟏h|),\displaystyle=\min_{\nu\in\mathbb{R}}\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{+})+\bar{F}_{h}((\mathbf{x}-\nu{\bf 1}_{h})_{-})\leq\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)+\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|)=2\cdot\min_{\nu\in\mathbb{R}}\bar{F}_{h}(|\mathbf{x}-\nu{\bf 1}_{h}|),

where the inequality follows from Fact 1. This completes the proof. ∎
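The two-sided bound just proved can be observed numerically. The following sketch is illustrative (the function F below is an arbitrary monotone submodular function, not from the paper); it evaluates both sides of the sandwich min_ν F̄(|x − ν1|) ≤ δ̄(x) ≤ 2·min_ν F̄(|x − ν1|), computing the minimum over ν exactly by enumerating the breakpoints of the convex piecewise-linear objective.

```python
import itertools, random

V = frozenset(range(5))

def F(S):
    # arbitrary monotone submodular function with F(empty) = 0
    return sum(1.0 + 0.3 * i for i in S) ** 0.5

def delta(S):
    # symmetric polymatroidal cut function
    S = frozenset(S)
    return min(F(S), F(V - S))

def lovasz(f, x):
    # Edmonds' greedy formula for the Lovász extension (requires f(empty) = 0)
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, prefix, prev = 0.0, set(), 0.0
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        total += x[i] * (cur - prev)
        prev = cur
    return total

random.seed(1)
ok = True
for _ in range(25):
    x = [random.uniform(-2, 2) for _ in range(5)]
    # nu -> F_bar(|x - nu*1|) is convex piecewise-linear; its breakpoints lie
    # among the coordinates of x and their pairwise midpoints.
    cands = set(x) | {(a + b) / 2 for a, b in itertools.combinations(x, 2)}
    m = min(lovasz(F, [abs(xi - nu) for xi in x]) for nu in cands)
    d = lovasz(delta, x)
    ok = ok and (m <= d + 1e-9) and (d <= 2 * m + 1e-9)
print(ok)
```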

See 5

Proof.

Consider the hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E^{H}} associated with the embedding HρGH\preceq_{\rho}G and let 𝐮EG\mathbf{u}\in\mathbb{R}^{E_{G}} be an arbitrary vector over the hyperedges. As each 𝐘e\mathbf{Y}^{e} routes demand weHw^{H}_{e} between the endpoints of e={u,v}EHe=\{u,v\}\in E_{H}, we have

weH(𝐱u𝐱v)+\displaystyle w^{H}_{e}\cdot(\mathbf{x}_{u}-\mathbf{x}_{v})_{+} =(dem(𝐘e),𝐱)+=(hEG𝐲he,𝐱h)+=(hEG𝐲he,𝐱huh𝟏h)+\displaystyle=(\langle\operatorname{dem}(\mathbf{Y}^{e}),\mathbf{x}\rangle)_{+}=\left(\sum_{h\in E_{G}}\langle\mathbf{y}^{e}_{h},\mathbf{x}_{h}\rangle\right)_{+}=\left(\sum_{h\in E_{G}}\langle\mathbf{y}^{e}_{h},\mathbf{x}_{h}-u_{h}{\bf 1}_{h}\rangle\right)_{+}
hEG(𝐲he)+,(𝐱huh𝟏h)++(𝐲he),(𝐱huh𝟏h).\displaystyle\leq\sum_{h\in E_{G}}\langle(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\rangle+\langle(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\rangle.

where the last equality relies on the fact that 𝐲heh𝟏h.\mathbf{y}^{e}_{h}\in\mathbb{R}^{h}\perp{\bf 1}_{h}. Summing over all edges in EHE_{H} yields:

e={i,j}EHwijH(𝐱i𝐱j)+\displaystyle\sum_{e=\{i,j\}\in E_{H}}w^{H}_{ij}\cdot(\mathbf{x}_{i}-\mathbf{x}_{j})_{+}\leq eEHhEG(𝐲he)+,(𝐱huh𝟏h)++(𝐲he),(𝐱huh𝟏h),\displaystyle\sum_{e\in E_{H}}\sum_{h\in E_{G}}\langle(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\rangle+\langle(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\rangle,
=\displaystyle= hEGeEH(𝐲he)+,(𝐱huh𝟏h)++eEH(𝐲he),(𝐱huh𝟏h)\displaystyle\sum_{h\in E_{G}}\left\langle\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{+},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+}\right\rangle+\left\langle\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{-},(\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-}\right\rangle
\displaystyle\leq ρhEGwhG(F¯h((𝐱huh𝟏h)+)+F¯h+((𝐱huh𝟏h)))\displaystyle\rho\cdot\sum_{h\in E_{G}}w^{G}_{h}\cdot\left(\bar{F}_{h}^{-}((\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{+})+\bar{F}_{h}^{+}((\mathbf{x}_{h}-u_{h}{\bf 1}_{h})_{-})\right)

The second inequality follows from Definition 9 and the characterization of Lovász extensions by the positive submodular polytope (see Proposition 3.4 in [11]). Finally, the first statement in the theorem follows by setting the arbitrary 𝐮\mathbf{u} to be the required minimizer. The specialization to cuts follows by plugging in 𝐱=𝟏S\mathbf{x}={\bf 1}^{S}.

In the symmetric case, the capacity constraints in the definition become simply, for all hEGh\in E_{G}:

eEh(𝐲he)+,eEh(𝐲he)ρwh𝒫+(Fh).\sum_{e\in E_{h}}(\mathbf{y}^{e}_{h})_{+}\,,\sum_{e\in E_{h}}(\mathbf{y}^{e}_{h})_{-}\in\rho\cdot w_{h}\cdot\mathcal{P}_{+}(F_{h}).

As a result, the orientation of each arc e=(i,j)EHe=(i,j)\in E_{H} can be discarded in this case, as the flow 𝐘e\mathbf{Y}^{e} can be oriented arbitrarily while satisfying the capacity constraint. Therefore, we can repeat the same analysis for the flow embedding given by {𝐘e}eEH\{-\mathbf{Y}^{e}\}_{e\in E_{H}}. Summing the positive and negative parts, we obtain the required statement. ∎

See 6

Proof of Theorem 6.

Consider the hypergraph flow as a flow over the factor graph G^.\hat{G}. By a standard application of dynamic trees (see for example Lemma 3.8 in [55]) to the computation of flow-path decompositions [4] over G^\hat{G}, we can compute, in time O~(|E^|)=O~(𝗌𝗉(G))\tilde{O}(|\hat{E}|)=\tilde{O}(\mathsf{sp}(G)), a graph H=(V,EH,𝐰H)H=(V,E_{H},\mathbf{w}^{H}) such that |EH|=O~(𝗌𝗉(G))|E_{H}|=\tilde{O}(\mathsf{sp}(G)) and there exist hypergraph flows {𝐘e}eEH\{\mathbf{Y}^{e}\}_{e\in E_{H}} in GG with the following properties:

  • eEH𝐘e=𝐘,\sum_{e\in E_{H}}\mathbf{Y}^{e}=\mathbf{Y},

  • For any eEHe\in E_{H}, for any factor graph edge {i,h}\{i,h\}, the flow (𝐘he)i(\mathbf{Y}^{e}_{h})_{i} has the same sign as (𝐘h)i(\mathbf{Y}_{h})_{i}. In particular, for all hEGh\in E_{G}:

    eEH(𝐲he)+=(𝐲h)+ and eEH(𝐲he)=(𝐲h)\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{+}=(\mathbf{y}_{h})_{+}\;\textrm{ and }\;\sum_{e\in E_{H}}(\mathbf{y}^{e}_{h})_{-}=(\mathbf{y}_{h})_{-} (22)
  • For all e=(i,j)EHe=(i,j)\in E_{H}, we have demi(𝐘)0\operatorname{dem}_{i}(\mathbf{Y})\geq 0 and demj(𝐘)<0.\operatorname{dem}_{j}(\mathbf{Y})<0. The flow 𝐘e\mathbf{Y}^{e} routes weHw^{H}_{e} units of flow between vertex ii and vertex jj, i.e.,

    dem(𝐘e)=weH(𝟏i𝟏j).\operatorname{dem}(\mathbf{Y}^{e})=w^{H}_{e}({\bf 1}_{i}-{\bf 1}_{j}).

The last point proves the required statements about the bipartiteness of HH and its degrees. To complete the proof, we verify that HcongG(Y)GH\preceq_{\operatorname{cong}_{G}(Y)}G. By the definition of congestion in Equation 9, we have that, for all hEG:h\in E_{G}:

𝐲hcongG(𝐘)wh(δh).\mathbf{y}_{h}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{B}(\delta_{h}).

By the characterization of the base polytope in Equation 4.2, this implies that:

(𝐲h)+congG(𝐘)wh𝒫(Fh) and (𝐲h)congG(𝐘)wh𝒫(Fh+).(\mathbf{y}_{h})_{+}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{P}(F^{-}_{h})\;\textrm{ and }\;(\mathbf{y}_{h})_{-}\in\operatorname{cong}_{G}(\mathbf{Y})\cdot w_{h}\cdot\mathcal{P}(F^{+}_{h}).

Together with Equation 22 and Definition 9, this completes the proof. ∎
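The decomposition underlying this proof can be illustrated with a much simpler (and slower) routine than the dynamic-tree implementation cited from [55]: repeatedly peel off a source-to-sink path carrying the bottleneck weight. The helper below is a hypothetical stand-in of our own, quadratic-time rather than near-linear, shown only to make the peeling argument concrete on an ordinary directed flow.

```python
def path_decomposition(flow, eps=1e-12):
    """Greedily peel source-to-sink paths off a nonnegative flow on a DAG.

    A naive O(|V|*|E|) stand-in for the dynamic-tree decomposition cited
    in the proof; it still yields at most |E| weighted paths, since each
    peel zeroes out at least one bottleneck edge.
    """
    flow = dict(flow)
    paths = []
    while True:
        net = {}
        for (a, b), wt in flow.items():
            net[a] = net.get(a, 0.0) + wt
            net[b] = net.get(b, 0.0) - wt
        sources = [v for v, d in net.items() if d > eps]
        if not sources:
            break
        v, path = sources[0], []
        while True:
            out = [(a, b) for (a, b), wt in flow.items() if a == v and wt > eps]
            if not out:
                break
            path.append(out[0])
            v = out[0][1]
        if not path:
            break
        bottleneck = min(flow[e] for e in path)
        for e in path:
            flow[e] -= bottleneck
        paths.append((bottleneck, path))
    return paths

# demand 3 routed from s to t, split across two paths
f = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
paths = path_decomposition(f)
print(len(paths), sum(w for w, _ in paths))
```

On the toy flow above the routine recovers two paths whose weights sum to the routed demand.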

A.3 An O(logn)O(\sqrt{\log n})-Approximation via 22\ell_{2}^{2}-Metric Embeddings

See 2

Proof.

Given any cut (S,S¯)(S,\overline{S}) with μ(S)μ(S¯)\mu(S)\leq\mu(\overline{S}), for each iVi\in V, set the vector embedding of the vertices to

𝐯i={(μ(V)μ(S)μ(S¯)00)if iS𝟎otherwise.\mathbf{v}_{i}=\begin{cases}\begin{pmatrix}\sqrt{\frac{\mu(V)}{\mu(S)\mu(\bar{S})}}&0&\ldots&0\end{pmatrix}^{\top}&\mbox{if }i\in S\\ {\bf 0}&\mbox{otherwise.}\end{cases}

And for each hyperedge hEh\in E:

𝐯h={𝟎if δh(S)=Fh(S),(μ(V)μ(S)μ(S¯),0,,0)if δh(S)=Fh(S¯).\mathbf{v}_{h}=\begin{cases}\bm{0}&\text{if }\delta_{h}(S)=F_{h}(S),\\ \left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }\delta_{h}(S)=F_{h}(\overline{S}).\end{cases}

We then have:

𝐝ih=𝐯i𝐯h2.\mathbf{d}_{i}^{h}=\lVert\mathbf{v}_{i}-\mathbf{v}_{h}\rVert^{2}.

It is easy to see that this vector embedding satisfies the triangle inequality constraints. For the variance constraint, we have:

{i,j}Vμiμjμ(V)𝐯i𝐯j2=iSjS¯μiμjμ(V)μ(V)μ(S)μ(S¯)=1.\sum_{\{i,j\}\subseteq V}{\mu_{i}\mu_{j}\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}=\sum_{\begin{subarray}{c}i\in S\\ j\in\overline{S}\end{subarray}}{\mu_{i}\mu_{j}\over\mu(V)}\cdot{\mu(V)\over\mu(S)\cdot\mu(\overline{S})}=1.

Furthermore, this solution has objective value:

hEwhF¯h(𝐝h)=μ(V)μ(S)μ(S¯)hEwhδh(Sh)2hEwhδh(Sh)min{μ(S),μ(S¯)}=2ΨG(S).\sum_{h\in E}w_{h}\bar{F}_{h}(\mathbf{d}^{h})=\frac{\mu(V)}{\mu(S)\mu(\bar{S})}\cdot\sum_{h\in E}w_{h}\delta_{h}(S\cap h)\leq 2\cdot\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\min\{\mu(S),\mu(\bar{S})\}}=2\Psi_{G}(S).

This shows that the combinatorial solution (S,S¯)(S,\overline{S}) can be mapped to a feasible solution to the RC-Metric while preserving the objective to within a factor of two. ∎
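As a quick numerical illustration of the feasibility argument, the sketch below (with an arbitrary toy measure μ and cut S of our choosing) builds the one-dimensional embedding from the proof and confirms that the variance constraint evaluates to exactly 1.

```python
# Toy measure mu and cut (S, S-bar); build the one-coordinate embedding
# from the proof and evaluate the variance constraint.
mu = {0: 1.0, 1: 2.0, 2: 1.5, 3: 0.5}
S = {0, 1}
mu_V = sum(mu.values())
mu_S = sum(mu[i] for i in S)
mu_Sbar = mu_V - mu_S
scale = (mu_V / (mu_S * mu_Sbar)) ** 0.5
v = {i: (scale if i in S else 0.0) for i in mu}  # first coordinate only

variance = sum(mu[i] * mu[j] / mu_V * (v[i] - v[j]) ** 2
               for i in mu for j in mu if i < j)
print(round(variance, 9))
```

Only pairs with one endpoint on each side of the cut contribute, and the scaling cancels their total measure exactly.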

See 4

Proof.

Let (S,S¯)(S,\overline{S}) be a candidate solution to the minimum hypergraph ratio-cut problem satisfying μ(S)μ(S¯)\mu(S)\leq\mu(\overline{S}), with objective value κ\kappa, i.e.:

ΨG(S)=hEwhδh(Sh)μ(S)=κ.\Psi_{G}(S)=\frac{\sum_{h\in E}w_{h}\delta_{h}(S\cap h)}{\mu(S)}=\kappa.

We can construct a solution to the RC-Directed-Semimetric program with objective value at most 4κ4\kappa as follows. Partition EE into E+EE^{+}\cup E^{-} where:

E+=def{hEδh(S)=Fh+(S¯)}andE=def{hEδh(S)=Fh(S)}.E^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{h\in E\mid\delta_{h}(S)=F_{h}^{+}(\overline{S})\}\hskip 14.22636pt\text{and}\hskip 14.22636ptE^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{h\in E\mid\delta_{h}(S)=F_{h}^{-}(S)\}.

We can consider:

𝐯i={(μ(V)μ(S)μ(S¯),0,,0)if iS,𝟎if iS¯,and𝐯h={𝟎if hE,(μ(V)μ(S)μ(S¯),0,,0)if hE+.\mathbf{v}_{i}=\begin{cases}\left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }i\in S,\\ \bm{0}&\text{if }i\in\overline{S},\end{cases}\hskip 28.45274pt\text{and}\hskip 28.45274pt\mathbf{v}_{h}=\begin{cases}\bm{0}&\text{if }h\in E^{-},\\ \left(\sqrt{\mu(V)\over\mu(S)\mu(\overline{S})},0,\cdots,0\right)^{\top}&\text{if }h\in E^{+}.\end{cases}

and for any hEh\in E let 𝐝h+\mathbf{d}_{h}^{+} and 𝐝h\mathbf{d}_{h}^{-} be defined in terms of {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} and {𝐯h}hE\{\mathbf{v}_{h}\}_{h\in E} so as to satisfy the equality constraints in the RC-Directed-Semimetric program. One can check that:

𝐝h+(i)={2μ(V)μ(S)μ(S¯)if iS¯ and hE+,0otherwise, and 𝐝h(i)={2μ(V)μ(S)μ(S¯)if iS and hE,0otherwise.\mathbf{d}_{h}^{+}(i)=\begin{cases}2\cdot{\mu(V)\over\mu(S)\mu(\overline{S})}&\text{if }i\in\overline{S}\text{ and }h\in E^{+},\\ 0&\text{otherwise,}\end{cases}\hskip 14.22636pt\text{ and }\hskip 14.22636pt\mathbf{d}_{h}^{-}(i)=\begin{cases}2\cdot{\mu(V)\over\mu(S)\mu(\overline{S})}&\text{if }i\in S\text{ and }h\in E^{-},\\ 0&\text{otherwise.}\end{cases}

It is easy to see that all the triangle inequality constraints are satisfied. We can also verify that the variance constraint is satisfied:

{i,j}Vμ(i)μ(j)μ(V)𝐯i𝐯j22\displaystyle\sum_{\{i,j\}\subseteq V}{\mu(i)\mu(j)\over\mu(V)}\cdot\|\mathbf{v}_{i}-\mathbf{v}_{j}\|_{2}^{2} =iSjS¯μ(i)μ(j)μ(V)μ(V)μ(S)μ(S¯)=1.\displaystyle=\sum_{\begin{subarray}{c}i\in S\\ j\in\overline{S}\end{subarray}}{\mu(i)\mu(j)\over\mu(V)}\cdot{\mu(V)\over\mu(S)\cdot\mu(\overline{S})}=1.

Hence, the vectors defined above form a feasible solution to the RC-Directed-Semimetric program. The objective value of this solution is:

hEwhFh¯(𝐝h)+hE+whFh+¯(𝐝h+)\displaystyle\sum_{h\in E^{-}}w_{h}\overline{F_{h}^{-}}(\mathbf{d}_{h}^{-})+\sum_{h\in E^{+}}w_{h}\overline{F_{h}^{+}}(\mathbf{d}_{h}^{+}) =2(hEwhμ(V)μ(S)μ(S¯)Fh¯(𝟏S)+hE+whμ(V)μ(S)μ(S¯)Fh+¯(𝟏S¯))\displaystyle=2\cdot\left(\sum_{h\in E^{-}}w_{h}{\mu(V)\over{\mu(S)\mu(\overline{S})}}\overline{F_{h}^{-}}({\bf 1}_{S})+\sum_{h\in E^{+}}w_{h}{\mu(V)\over{\mu(S)\mu(\overline{S})}}\overline{F_{h}^{+}}({\bf 1}_{\overline{S}})\right)
=2μ(V)μ(S)μ(S¯)(hEwhFh(Sh)+hE+whFh+(S¯h))\displaystyle={2\mu(V)\over{\mu(S)\mu(\overline{S})}}\cdot\left(\sum_{h\in E^{-}}w_{h}{F_{h}^{-}}(S\cap h)+\sum_{h\in E^{+}}w_{h}{F_{h}^{+}}(\overline{{S}}\cap h)\right)
=2μ(V)μ(S)μ(S¯)(hEwhδh(Sh))\displaystyle={2\mu(V)\over{\mu(S)\mu(\overline{S})}}\cdot\left(\sum_{h\in E}w_{h}{\delta_{h}}(S\cap h)\right)
4hEwhδh(Sh)μ(S)\displaystyle\leq 4\cdot{\sum_{h\in E}w_{h}{\delta_{h}}(S\cap h)\over{\mu(S)}}
=4κ.\displaystyle=4\kappa.

completing the proof. ∎

See 8

Proof.

Since the relaxation is invariant to translations, we will assume, without loss of generality, that:

iVμ(i)𝐯i=0,\sum_{i\in V}\mu(i)\mathbf{v}_{i}=0, (23)

and that:

i,jVμ(i)μ(j)μ(V)𝐯i𝐯j22=1.\sum_{i,j\in V}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}=1. (24)

We divide the proof into two cases, corresponding to whether the embedding is balanced or unbalanced. In Case 1, the embedding {𝐯i}iV\{\mathbf{v}_{i}\}_{i\in V} satisfies:

{i,j}(R32)μ(i)μ(j)μ(V)𝐯i𝐯j21/10,\sum_{\{i,j\}\in\binom{R_{3}}{2}}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}\geq 1/10,

where:

Rt=def{iV𝐯i2tμ(V)}.R_{t}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\left\{i\in V\mid\lVert\mathbf{v}_{i}\rVert^{2}\leq{t\over\mu(V)}\right\}.

In this case, it is implicit in the results of Arora, Rao, and Vazirani [10] that there exist polynomial-time-computable sets S,TVS,T\subseteq V satisfying μ(S),μ(T)=Ω(μ(V))\mu(S),\mu(T)=\Omega(\mu(V)), and:

miniSjT𝐯i𝐯j22Ω(1μ(V)logn).\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\Omega\left({1\over\mu(V)\sqrt{\log n}}\right). (25)

We construct the map ϕ\phi explicitly as follows. Let SS and TT be the sets satisfying (25). Let rr\in\mathbb{R} be a value such that:

μ({iS𝐯i2r})=μ({iS𝐯i2r}),\mu(\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\})=\mu(\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\}),

and define:

S=def{iS𝐯i2r},S+=def{iS𝐯i2r},S^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\},\hskip 28.45274ptS^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in S\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\},

and:

T=def{iT𝐯i2r},T+=def{iT𝐯i2r}.T^{-}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in T\mid\lVert\mathbf{v}_{i}\rVert_{2}\geq r\},\hskip 28.45274ptT^{+}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\{i\in T\mid\lVert\mathbf{v}_{i}\rVert_{2}\leq r\}.

Suppose, without loss of generality, that μ(T+)μ(T)\mu(T^{+})\geq\mu(T^{-})444If this condition does not hold, the rest of the argument still holds by exchanging the roles of T+T^{+} with TT^{-} and S+S^{+} with SS^{-}.. Note, in particular, that this implies that μ(S)μ(S)/2\mu(S^{-})\geq\mu(S)/2 and μ(T+)μ(T)/2\mu(T^{+})\geq\mu(T)/2. Let:

ϕ(𝐯i)=minjT+𝐯i𝐯j22+𝐯i22𝐯j22.\phi(\mathbf{v}_{i})=\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}.

As a simple consequence of the triangle inequality constraints, we have, for any i,kVEi,k\in V\cup E:

ϕ(𝐯i)ϕ(𝐯k)𝐯i𝐯k22+𝐯i22𝐯k22.\phi(\mathbf{v}_{i})-\phi(\mathbf{v}_{k})\leq\lVert\mathbf{v}_{i}-\mathbf{v}_{k}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{k}\rVert_{2}^{2}.

Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} be the vector given by 𝐱(i)=ϕ(𝐯i)\mathbf{x}(i)=\phi(\mathbf{v}_{i}). Then:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|\displaystyle\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)| {i,j}Vμ(i)μ(j)(𝐱(i)𝐱(j))+iSjT+μ(i)μ(j)(𝐱(i)𝐱(j))+\displaystyle\geq\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)(\mathbf{x}(i)-\mathbf{x}(j))_{+}\geq\sum_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\mu(i)\mu(j)(\mathbf{x}(i)-\mathbf{x}(j))_{+}
=iSjT+μ(i)μ(j)(𝐱(i))+=μ(T+)iSμ(i)(𝐱(i))+\displaystyle=\sum_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\mu(i)\mu(j)(\mathbf{x}(i))_{+}=\mu(T^{+})\sum_{i\in S^{-}}\mu(i)(\mathbf{x}(i))_{+}
=μ(T+)iSμ(i)(minjT+𝐯i𝐯j22+𝐯i22𝐯j22)+\displaystyle=\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+\lVert\mathbf{v}_{i}\rVert_{2}^{2}-\lVert\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
μ(T+)iSμ(i)(minjT+𝐯i𝐯j22+r2r2)+\displaystyle\geq\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}+r^{2}-r^{2}\right)_{+}
μ(T+)iSμ(i)(minjT+𝐯i𝐯j22)+\displaystyle\geq\mu(T^{+})\sum_{i\in S^{-}}\mu(i)\left(\min_{j\in T^{+}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
μ(T+)μ(S)miniSjT+(𝐯i𝐯j22)+\displaystyle\geq\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\left(\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\right)_{+}
=μ(T+)μ(S)miniSjT+𝐯i𝐯j22\displaystyle=\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S^{-}\\ j\in T^{+}\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
μ(T+)μ(S)miniSjT𝐯i𝐯j22\displaystyle\geq\mu(T^{+})\mu(S^{-})\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
14μ(T)μ(S)miniSjT𝐯i𝐯j22\displaystyle\geq{1\over 4}\mu(T)\mu(S)\cdot\min_{\begin{subarray}{c}i\in S\\ j\in T\end{subarray}}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}
Ω(μ(V)2)Ω(1μ(V)logn)\displaystyle\geq\Omega(\mu(V)^{2})\cdot\Omega\left({1\over\mu(V)\sqrt{\log n}}\right)
Ω(μ(V)logn).\displaystyle\geq\Omega\left(\mu(V)\over\sqrt{\log n}\right).

It follows that:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|{i,j}Vμ(i)μ(j)𝐯i𝐯j22=1μ(V){i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|Ω(1logn).{\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\over\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}}={1\over\mu(V)}\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\geq\Omega\left({1\over\sqrt{\log n}}\right).

Hence, the map ϕ\phi satisfies conditions 1 and 2 in the statement of the lemma. Moreover, it is easy to see that this map can be computed in polynomial time. This completes the proof for Case 1.
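The Lipschitz-type property of ϕ used above depends on the triangle-inequality constraints of the negative-type relaxation; it can fail for arbitrary Euclidean point sets. The sketch below illustrates the property on hypercube vertices, whose squared Euclidean distances do form a metric (they equal Hamming distances); the set `Tplus` is an arbitrary stand-in of ours for the set that ϕ minimizes over.

```python
import random

random.seed(4)
d = 8
# 0/1 vectors: squared Euclidean distance = Hamming distance, a true metric,
# so these points satisfy the negative-type triangle inequalities.
pts = [[random.randint(0, 1) for _ in range(d)] for _ in range(12)]
Tplus = pts[:4]  # hypothetical stand-in for the set minimized over

def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def norm_sq(a):
    return sum(ai * ai for ai in a)

def phi(vpt):
    return min(sqdist(vpt, u) + norm_sq(vpt) - norm_sq(u) for u in Tplus)

# check: phi(a) - phi(b) <= ||a - b||^2 + ||a||^2 - ||b||^2 for all pairs
ok = all(phi(a) - phi(b) <= sqdist(a, b) + norm_sq(a) - norm_sq(b) + 1e-9
         for a in pts for b in pts)
print(ok)
```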

For Case 2, we consider what happens when the condition of Case 1 does not hold, i.e.:

{i,j}(R32)μ(i)μ(j)μ(V)𝐯i𝐯j2<1/10.\sum_{\{i,j\}\in\binom{R_{3}}{2}}{\mu(i)\mu(j)\over\mu(V)}\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}<1/10.

Here, we have:

μ(R3/2)μ(V)13andiVR3μi𝐯i235.{\mu(R_{3/2})\over\mu(V)}\geq{1\over 3}\hskip 28.45274pt\text{and}\hskip 28.45274pt\sum_{i\in V\setminus R_{3}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert^{2}\geq{3\over 5}.

The first inequality follows from (23), (24), and Markov’s inequality. Let T=R3/2T=R_{3/2} and S=VR3S=V\setminus R_{3}. Consider the embedding:

ϕ(𝐯)=min𝐮T𝐯𝐮22+𝐯22𝐮22.\phi(\mathbf{v})=\min_{\mathbf{u}\in T}\lVert\mathbf{v}-\mathbf{u}\rVert^{2}_{2}+\lVert\mathbf{v}\rVert_{2}^{2}-\lVert\mathbf{u}\rVert_{2}^{2}.

Again, it is easy to see that the map ϕ\phi satisfies condition 1 of the lemma. Moreover, note that, for any iSi\in S:

|ϕ(𝐯i)||𝐯i22𝐮22|,|\phi(\mathbf{v}_{i})|\geq|\lVert\mathbf{v}_{i}\rVert^{2}_{2}-\lVert\mathbf{u}\rVert_{2}^{2}|,

for some 𝐮T\mathbf{u}\in T and hence:

|ϕ(𝐯i)||𝐯i22𝐮22|12𝐯i22.|\phi(\mathbf{v}_{i})|\geq|\lVert\mathbf{v}_{i}\rVert^{2}_{2}-\lVert\mathbf{u}\rVert_{2}^{2}|\geq{1\over 2}\lVert\mathbf{v}_{i}\rVert^{2}_{2}.

Let 𝐱V\mathbf{x}\in\mathbb{R}^{V} be the vector given by 𝐱(i)=ϕ(𝐯i)\mathbf{x}(i)=\phi(\mathbf{v}_{i}). We then have:

{i,j}Vμ(i)μ(j)|𝐱(i)𝐱(j)|{i,j}Vμ(i)μ(j)𝐯i𝐯j22\displaystyle{\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)|\mathbf{x}(i)-\mathbf{x}(j)|\over\sum_{\{i,j\}\subseteq V}\mu(i)\mu(j)\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert^{2}_{2}} =1μ(V){i,j}(V2)μ(i)μ(j)|𝐱(i)𝐱(j)|μ(T)μ(V)iSμ(i)|𝐱(i)|\displaystyle={1\over\mu(V)}\sum_{\{i,j\}\in\binom{V}{2}}{\mu(i)\mu(j)}|\mathbf{x}(i)-\mathbf{x}(j)|\geq{\mu(T)\over\mu(V)}\sum_{i\in S}\mu(i)|\mathbf{x}(i)|
μ(T)2μ(V)iSμi𝐯i22\displaystyle\geq{\mu(T)\over 2\mu(V)}\sum_{i\in S}\mu_{i}\lVert\mathbf{v}_{i}\rVert_{2}^{2}
121335=Ω(1).\displaystyle\geq{1\over 2}\cdot{1\over 3}\cdot{3\over 5}=\Omega(1).

Note that the bound obtained in this case is tighter, i.e. Ω(1/logn)\Omega(\nicefrac{{1}}{{\sqrt{\log n}}}) has been replaced by a constant. ∎

A.4 Localized Convex Formulations and Ratio-Cut Improvement

See 8

Proof of Lemma 8.

Let us proceed with taking the conjugate dual. Starting with (CI-Primal):

min𝐱V𝐬,𝐱μ=1hEwhδ¯h(𝐱)\displaystyle\min_{\begin{subarray}{c}\mathbf{x}\in\mathbb{R}^{V}\\ \langle\mathbf{s},\mathbf{x}\rangle_{\mu}=1\end{subarray}}\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x}) =min𝐱VhEwhδ¯h(𝐱)+maxαα(1𝐬,𝐱μ)\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})+\max_{\alpha\in\mathbb{R}}\alpha\Big{(}1-\langle\mathbf{s},\mathbf{x}\rangle_{\mu}\Big{)}
=min𝐱Vmaxα{α+hEwhδ¯h(𝐱)α(μ𝐬),𝐱}.\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}\,.

Recall that the Lovász extension is given by

δ¯h(𝐱)=max𝐟(δh)𝐟,𝐱,\bar{\delta}_{h}(\mathbf{x})=\max_{\mathbf{f}\in\mathcal{B}(\delta_{h})}\langle\mathbf{f},\mathbf{x}\rangle\,,

and hence:

min𝐱Vmaxα{α+hEwhδ¯h(𝐱)α(μ𝐬),𝐱}\displaystyle\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}w_{h}\cdot\bar{\delta}_{h}(\mathbf{x})-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}} =min𝐱Vmaxα{α+hEmax𝐟hwh(δh)𝐟h,𝐱α(μ𝐬),𝐱}\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\alpha\in\mathbb{R}}\bigg{\{}\alpha+\sum_{h\in E}\max_{\mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})}\langle\mathbf{f}_{h},\mathbf{x}\rangle-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}
=min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟h,𝐱α(μ𝐬),𝐱}\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\sum_{h\in E}\langle\mathbf{f}_{h},\mathbf{x}\rangle-\big{\langle}\alpha\big{(}\mu\circ\mathbf{s}\big{)},\mathbf{x}\big{\rangle}\bigg{\}}
=min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟hα(μ𝐬),𝐱}.\displaystyle=\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle}\bigg{\}}\,.

Strong duality always holds when the feasible region is a polyhedron. Hence, we have:

min𝐱Vmaxα,𝐟hV𝐟hwh(δh){α+hE𝐟hα(μ𝐬),𝐱}=maxα,𝐟hV𝐟hwh(δh)α+min𝐱VhE𝐟hα(μ𝐬),𝐱,\min_{\mathbf{x}\in\mathbb{R}^{V}}\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\bigg{\{}\alpha+\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle}\bigg{\}}=\max_{\begin{subarray}{c}\alpha\in\mathbb{R},\mathbf{f}_{h}\in\mathbb{R}^{V}\\ \mathbf{f}_{h}\in w_{h}\cdot\mathcal{B}(\delta_{h})\end{subarray}}\alpha+\min_{\mathbf{x}\in\mathbb{R}^{V}}\bigg{\langle}\sum_{h\in E}\mathbf{f}_{h}-\alpha\big{(}\mu\circ\mathbf{s}\big{)},\,\mathbf{x}\bigg{\rangle},

which is equivalent to the required linear program by the definition of congestion (Equation 9) and demand (Equation 10) for hypergraph flows. ∎

See 9

Proof.

Let D=1αH.D=\frac{1}{\alpha}\cdot H. The sparsity of DD follows directly from Theorem 6. By the same theorem, we have H1GH\preceq_{1}G, as congG(𝐘)1\operatorname{cong}_{G}(\mathbf{Y})\leq 1 by Property 1 of approximate dual solutions. Hence, the scaled graph DD embeds into GG with congestion 1α,\frac{1}{\alpha}, as required. Taking into account the same scaling by 1α\frac{1}{\alpha}, the statements on the bipartiteness and degree bounds of DD follow directly from Properties 2 and 3 of approximate dual solutions via Theorem 6. Similarly, the largeness of DD follows from Property 2. ∎

A.5 An O(logn)O(\log n)-Approximation via Generalized Cut-Matching Games

See 1

See 2

Proof.

By Definition 12, each dual graph certificate DtD_{t} played by the matching player has O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)) edges. Hence, HtH_{t} has at most O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)) edges for all iterations tg(n).t\leq g(n). It follows that the running time of the cut strategy at every round is bounded by O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)). On the matching side, the running time of the approximate primal-dual oracle is given by Theorems 9 and 10. Both dominate, or are asymptotically equivalent to, O~(𝗌𝗉(G))\tilde{O}(\mathsf{sp}(G)). ∎

See 15

Proof.

Fix any 𝐱V\mathbf{x}\in\mathbb{R}^{V} such that 𝐱,𝐌𝐱=1\big{\langle}\mathbf{x},\mathbf{M}\mathbf{x}\big{\rangle}=1. We can write the quadratic form 𝐱,𝐋+(D)𝐱\big{\langle}\mathbf{x},\mathbf{L}_{+}(D)\mathbf{x}\big{\rangle} as

𝐱,𝐋+(D)𝐱=(i,j)EDwijD((xixj)2+xi2xj2)\big{\langle}\mathbf{x},\mathbf{L}_{+}(D)\mathbf{x}\big{\rangle}=\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}

For the lower bound, note that

(i,j)EDwijD((xixj)2+xi2xj2)(i,j)EDwijDxj2=jBxj2i(j)wijD=jBxj2degD(j)\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}\geq-\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot x_{j}^{2}=-\sum_{j\in B}x_{j}^{2}\sum_{i\in\partial^{-}(j)}w_{ij}^{D}=-\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)

Using the fact that the degree of each iVi\in V in DD is bounded above by μi\mu_{i}, we have

jBxj2degD(j)jBxj2μj𝐱,𝐌𝐱=1-\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)\geq-\sum_{j\in B}x_{j}^{2}\cdot\mu_{j}\geq-\big{\langle}\mathbf{x},\mathbf{M}\mathbf{x}\big{\rangle}=-1

For the upper bound, note that

(i,j)EDwijD((xixj)2+xi2xj2)(i,j)EDwijD((xixj)2+xi2)(i,j)EDwijD(2(xi2+xj2)+xi2)\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}-x_{j}^{2}\big{)}\leq\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}(x_{i}-x_{j})^{2}+x_{i}^{2}\big{)}\leq\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}2\cdot(x_{i}^{2}+x_{j}^{2})+x_{i}^{2}\big{)}

Let us then extract the dependence on the degree of each vertex.

(i,j)EDwijD(2(xi2+xj2)+xi2)\displaystyle\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot\big{(}2\cdot(x_{i}^{2}+x_{j}^{2})+x_{i}^{2}\big{)} =2(i,j)EDwijD(xi2+xj2)+(i,j)EDwijDxi2\displaystyle=2\cdot\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot(x_{i}^{2}+x_{j}^{2})+\sum_{(i,j)\in E_{D}}w_{ij}^{D}\cdot x_{i}^{2}
=2(iAxi2j+(i)wijD+jBxj2i(j)wijD)+iAxi2j+(i)wijD\displaystyle=2\cdot\bigg{(}\sum_{i\in A}x_{i}^{2}\sum_{j\in\partial^{+}(i)}w_{ij}^{D}+\sum_{j\in B}x_{j}^{2}\sum_{i\in\partial^{-}(j)}w_{ij}^{D}\bigg{)}+\sum_{i\in A}x_{i}^{2}\sum_{j\in\partial^{+}(i)}w_{ij}^{D}
=2(iAxi2degD(i)+jBxj2degD(j))+iAxi2degD(i)\displaystyle=2\cdot\bigg{(}\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{j\in B}x_{j}^{2}\cdot\deg_{D}(j)\bigg{)}+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)
=2iVxi2degD(i)+iAxi2degD(i)\displaystyle=2\sum_{i\in V}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)

Finally, using the fact that the degrees are bounded by 𝝁\bm{\mu}, we have

2iVxi2degD(i)+iAxi2degD(i)2iVxi2μi+iAxi2μi3iVxi2μi=32\sum_{i\in V}x_{i}^{2}\cdot\deg_{D}(i)+\sum_{i\in A}x_{i}^{2}\cdot\deg_{D}(i)\leq 2\sum_{i\in V}x_{i}^{2}\cdot\mu_{i}+\sum_{i\in A}x_{i}^{2}\cdot\mu_{i}\leq 3\sum_{i\in V}x_{i}^{2}\cdot\mu_{i}=3

thus establishing the required bound. ∎
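The two bounds can be checked numerically on random instances. The sketch below (our own illustration) samples a bipartite directed graph whose weighted degrees are capped by μ, normalizes x so that ⟨x, Mx⟩ = 1, and verifies that −1 ≤ ⟨x, L₊(D)x⟩ ≤ 3.

```python
import random

random.seed(2)
n = 8
A, B = list(range(4)), list(range(4, 8))
mu = [random.uniform(0.5, 2.0) for _ in range(n)]

# random A -> B edge weights, then rescale so every weighted degree is <= mu_i
w = {(i, j): random.uniform(0.0, 0.3) for i in A for j in B}
for v in range(n):
    deg = sum(x for e, x in w.items() if v in e)
    if deg > mu[v]:
        for e in list(w):
            if v in e:
                w[e] *= mu[v] / deg  # only ever decreases other degrees

ok = True
for _ in range(100):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = sum(mu[i] * x[i] ** 2 for i in range(n)) ** 0.5
    x = [xi / norm for xi in x]  # now <x, Mx> = 1
    q = sum(wij * ((x[i] - x[j]) ** 2 + x[i] ** 2 - x[j] ** 2)
            for (i, j), wij in w.items())
    ok = ok and (-1.0 - 1e-9 <= q <= 3.0 + 1e-9)
print(ok)
```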

See 20

Proof.

Because the embedding is normalized, the expected squared length of a vector under the probability measure μ/μ(V)\mu/\mu(V) is 1/μ(V)1/\mu(V). Markov’s inequality then yields:

μ(R3/2)(113/2)μ(V)=13μ(V) and μ(R3)(113)μ(V)=23μ(V).\mu(R_{3/2})\geq\bigg{(}1-\frac{1}{3/2}\bigg{)}\cdot\mu(V)=\frac{1}{3}\cdot\mu(V)\,\textrm{ and }\,\mu(R_{3})\geq\bigg{(}1-\frac{1}{3}\bigg{)}\cdot\mu(V)=\frac{2}{3}\cdot\mu(V)\,.

For the second part, by Lemma 19 and the definition of balanced embedding, we have that:

iR3¯μi𝐯i22𝝁(R3)(11(𝝁(V))2{i,j}(R32)μiμj𝐯i𝐯j22)23μ(V)910=35μ(V).\sum_{i\in\overline{R_{3}}}\mu_{i}\cdot\lVert\mathbf{v}_{i}\rVert_{2}^{2}\geq\bm{\mu}(R_{3})\cdot\bigg{(}1-\frac{1}{(\bm{\mu}(V))^{2}}\cdot\sum_{\{i,j\}\in\binom{R_{3}}{2}}\mu_{i}\mu_{j}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\bigg{)}\geq\frac{2}{3}\cdot\mu(V)\cdot\frac{9}{10}=\frac{3}{5}\cdot\mu(V).\;∎

A.6 Constructing Separated Sets: Proof of Theorem 3

We now complete the argument that round-cut outputs separated sets when the given vector embedding is balanced. We begin by recalling a fact about uniformly sampled vectors on the sphere.

Lemma 22.

Given 𝐯d\mathbf{v}\in\mathbb{R}^{d}, and 𝐠\mathbf{g} a uniformly sampled random vector in 𝒮d1\mathcal{S}^{d-1}, the following hold.

  1. 1.

    𝔼𝐠𝐠,𝐯2=𝐯2d\operatorname*{\mathbb{E}}_{\mathbf{g}}\langle\mathbf{g},\mathbf{v}\rangle^{2}=\frac{\lVert\mathbf{v}\rVert^{2}}{d}

  2. 2.

    For any δ>d16\delta>\frac{d}{16}, we have Pr𝐠(𝐠,𝐯2δ𝐯2d)eδ/4\Pr_{\mathbf{g}}\Big{(}\langle\mathbf{g},\mathbf{v}\rangle^{2}\geq\delta\cdot\frac{\lVert\mathbf{v}\rVert^{2}}{d}\Big{)}\leq e^{-\delta/4}.
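Item 1 can be verified empirically by Monte Carlo sampling; the sketch below (illustrative only, with an arbitrary test vector of our choosing) estimates E⟨g, v⟩² for g uniform on the sphere and compares it against ‖v‖²/d.

```python
import math, random

random.seed(3)
d = 6
v = [1.0, -2.0, 0.5, 0.0, 3.0, -1.0]  # arbitrary test vector
norm_sq = sum(c * c for c in v)

def unit_sphere(dim):
    # normalizing a Gaussian vector gives a uniform point on the sphere
    g = [random.gauss(0.0, 1.0) for _ in range(dim)]
    s = math.sqrt(sum(c * c for c in g))
    return [c / s for c in g]

N = 50000
est = sum(sum(gi * vi for gi, vi in zip(unit_sphere(d), v)) ** 2
          for _ in range(N)) / N
# the estimate should be close to ||v||^2 / d
print(abs(est - norm_sq / d) / (norm_sq / d) < 0.05)
```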

We also require a lemma proved in [51], which states that, with constant probability over a random direction, one can extract two balanced sets whose projections onto that direction are well separated.

Lemma 23 (Lemma 5.5.6 item (3) of [51]).

Given a set VV such that |V|=n\lvert V\rvert=n, and a (3,110)\big{(}3,\frac{1}{10}\big{)}-balanced embedding {𝐯i}iVd\{\mathbf{v}_{i}\}_{i\in V}\subseteq\mathbb{R}^{d}, let ri=d𝐠,𝐯ir_{i}=\sqrt{d}\cdot\langle\mathbf{g},\mathbf{v}_{i}\rangle for 𝐠𝒮d1\mathbf{g}\in\mathcal{S}^{d-1} and each iVi\in V. Assume, without loss of generality, that r1rnr_{1}\geq\ldots\geq r_{n}. Then, there exists an absolute constant r>0r>0 such that, with Ω(1)\Omega(1) probability over a uniformly random choice of 𝐠𝒮d1\mathbf{g}\in\mathcal{S}^{d-1}, there exist 1a<bn1\leq a<b\leq n satisfying the following:

  1. 1.

    μ({1,,a})13μ(V)\mu(\{1,\ldots,a\})\geq\frac{1}{3}\cdot\mu(V),

  2. 2.

    μ({b,,n})13μ(V)\mu(\{b,\ldots,n\})\geq\frac{1}{3}\cdot\mu(V),

  3. 3.

    (rarb)2rVarμ(𝐯i)\big{(}r_{a}-r_{b}\big{)}^{2}\geq r\cdot\operatorname*{Var}_{\mu}(\mathbf{v}_{i}).

Using the above two lemmata, we can now prove that round-cut produces a robustly separated set in the balanced case.

See 21

Proof.

Define 1\mathcal{E}_{1} to be the event that there exist 1a<bn1\leq a<b\leq n such that the following hold simultaneously: (1) μ({1,,a})13μ(V)\mu(\{1,\ldots,a\})\geq\frac{1}{3}\cdot\mu(V), (2) μ({b,,n})13μ(V)\mu(\{b,\ldots,n\})\geq\frac{1}{3}\cdot\mu(V), and (3) (rarb)2rμ(V)(r_{a}-r_{b})^{2}\geq\frac{r}{\mu(V)}. Next, for a constant c1>0c_{1}>0 to be chosen subsequently, and any i,jVi,j\in V, define the event 2(i,j)\mathcal{E}_{2}(i,j) as

2(i,j)=def{(rirj)2<(4c1logn)𝐯i𝐯j22}and2=defi,jV2(i,j).\mathcal{E}_{2}(i,j)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\Big{\{}(r_{i}-r_{j})^{2}<\big{(}4c_{1}\cdot\log n\big{)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\Big{\}}\qquad\textup{and}\qquad\mathcal{E}_{2}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\bigcap_{i,j\in V}\mathcal{E}_{2}(i,j).

To begin, we observe that if 1\mathcal{E}_{1} and 2\mathcal{E}_{2} occur jointly, then the cuts S={1,,a}S=\{1,\ldots,a\} and T={b,,n}T=\{b,\ldots,n\} are Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated and, consequently, round-cut outputs the required separated sets in the balanced case. Indeed, event 2\mathcal{E}_{2} implies that

𝐯i𝐯j22>(rirj)24c1logn\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}>\frac{(r_{i}-r_{j})^{2}}{4c_{1}\cdot\log n}

holds for all iSi\in S and jTj\in T. Since rirar_{i}\geq r_{a} and rjrbr_{j}\leq r_{b}, item (3) defining event 1\mathcal{E}_{1} then gives:

(rirj)24c1lognrVarμ(𝐯i)14c1logn=r4c1logn.\frac{(r_{i}-r_{j})^{2}}{4c_{1}\cdot\log n}\geq r\cdot\operatorname*{Var}_{\mu}(\mathbf{v}_{i})\cdot\frac{1}{4c_{1}\cdot\log n}=\frac{r}{4c_{1}\cdot\log n}\,.

Finally, item (1) in event 1\mathcal{E}_{1} implies that μ(S),μ(T)μ(V)3\mu(S),\mu(T)\geq\frac{\mu(V)}{3} and hence, for all iS,jTi\in S,j\in T:

μ(S)μ(T)(μ(V))2𝐯i𝐯j22r36c1logn=Ω(1logn),\frac{\mu(S)\mu(T)}{(\mu(V))^{2}}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\geq\frac{r}{36\cdot c_{1}\cdot\log n}=\Omega\bigg{(}\frac{1}{\log n}\bigg{)}\,,

which establishes the separation guarantee.

Let us next bound the probability that 1\mathcal{E}_{1} and 2\mathcal{E}_{2} occur jointly. By Lemma 23, there exists a constant p>0p>0 such that Pr𝐠(1)p\Pr_{\mathbf{g}}(\mathcal{E}_{1})\geq p and hence Pr𝐠(1¯)1p\Pr_{\mathbf{g}}\big{(}\overline{\mathcal{E}_{1}}\big{)}\leq 1-p. By Lemma 22, we have

Pr𝐠(2(i,j)¯)\displaystyle\Pr_{\mathbf{g}}\big{(}\overline{\mathcal{E}_{2}(i,j)}\big{)} =Pr𝐠((rirj)2(4c1logn)𝐯i𝐯j22)\displaystyle=\Pr_{\mathbf{g}}\Big{(}(r_{i}-r_{j})^{2}\geq\big{(}4c_{1}\cdot\log n\big{)}\cdot\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}\Big{)}
=Pr𝐠(𝐠,𝐯i𝐯j2(4c1logn)𝐯i𝐯j22d)1nc1\displaystyle=\Pr_{\mathbf{g}}\bigg{(}\langle\mathbf{g},\mathbf{v}_{i}-\mathbf{v}_{j}\rangle^{2}\geq\big{(}4c_{1}\cdot\log n\big{)}\cdot\frac{\lVert\mathbf{v}_{i}-\mathbf{v}_{j}\rVert_{2}^{2}}{d}\bigg{)}\leq\frac{1}{n^{c_{1}}}

Performing a union bound over 1¯\overline{\mathcal{E}_{1}} and 2(i,j)¯\overline{\mathcal{E}_{2}(i,j)} for each i,jVi,j\in V yields

Pr𝐠(12)p(n2)1nc1p1nc12=Ω(1)\Pr_{\mathbf{g}}\big{(}\mathcal{E}_{1}\cap\mathcal{E}_{2}\big{)}\geq p-\binom{n}{2}\frac{1}{n^{c_{1}}}\geq p-\frac{1}{n^{c_{1}-2}}=\Omega(1)

for c1c_{1} large enough. We conclude that with constant probability, a single execution of step (1) in round-cut outputs Ω(1logn)\Omega\big{(}\frac{1}{\log n}\big{)}-separated sets.

Note that upon computing candidate sets, it is possible to check whether these are in fact Ω(1/logn)\Omega(1/\log n)-separated. One can therefore boost the probability of success and obtain a high probability guarantee by repeating this procedure up to O(logn)O(\log n) many times. Note that this clearly affects the runtime of this procedure by at most a logarithmic factor. ∎
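The boosting step above follows the standard repeat-and-verify pattern: a checkable randomized procedure that succeeds with constant probability is repeated until its output passes verification. A minimal sketch, with a seeded stand-in trial and verifier in place of round-cut and the separation check:

```python
import random

# Hypothetical sketch: amplify a constant success probability p by repeating
# the procedure up to `max_rounds` times and keeping the first verified output.
def amplify(trial, verify, max_rounds):
    for _ in range(max_rounds):
        out = trial()
        if verify(out):
            return out
    return None  # all rounds failed; probability (1 - p)^max_rounds

rng = random.Random(42)
p = 0.3                                   # constant per-trial success probability
trial = lambda: rng.random() < p          # stand-in for one run of round-cut
verify = lambda ok: ok                    # stand-in for the separation check
result = amplify(trial, verify, max_rounds=64)
assert result is True
```

With `max_rounds` set to `O(log n)`, the failure probability drops polynomially in `n`, at a logarithmic cost in runtime, as stated above.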

Appendix B Comparison to the Cut-Improvement Algorithm of Andersen and Lang

In this section, we briefly show that our formulation of the ratio-cut improvement problem in Definition 10 generalizes the definition of the modified quotient cut score, introduced by Andersen and Lang [7], to potentially non-integral seed vectors (rather than cut seeds) and to arbitrary hypergraphs (rather than graphs). Consider the following seed vector. Let AVA\subseteq V be a cut, and define the vector 𝐬A,A¯V\mathbf{s}^{A,\bar{A}}\in\mathbb{R}^{V} to be

𝐬A,A¯=def𝟏Aμ(A)/μ(A¯)𝟏A¯.\mathbf{s}^{A,\bar{A}}\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\boldsymbol{1}^{A}-\nicefrac{{\mu(A)}}{{\mu(\bar{A})}}\cdot\boldsymbol{1}^{\bar{A}}\,. (26)

Assuming μ(A)μ(A¯)\mu(A)\leq\mu(\bar{A}), note that 𝐬A,A¯,𝟏μ=0\langle\mathbf{s}^{A,\bar{A}},\boldsymbol{1}\rangle_{\mu}=0 and 𝐬A,A¯1\lVert\mathbf{s}^{A,\bar{A}}\rVert_{\infty}\leq 1, so that 𝐬A,A¯\mathbf{s}^{A,\bar{A}} is a valid seed vector. With this choice of seed vector, the ratio-cut improvement objective on a graph GG gives:

min{ΨG,𝐬A,A¯(S),ΨG,𝐬A,A¯(S¯)}=hEwhδh(S)|μ(AS)μ(A)μ(A¯)μ(A¯S)|,\min\left\{\Psi_{G,\mathbf{s}^{A,\bar{A}}}(S),\Psi_{G,\mathbf{s}^{A,\bar{A}}}(\bar{S})\right\}=\frac{\sum_{h\in E}w_{h}\cdot\delta_{h}(S)}{\lvert\mu(A\cap S)-\frac{\mu(A)}{\mu(\bar{A})}\cdot\mu(\bar{A}\cap S)\rvert},

which is exactly equal to the modified quotient cut score of cut (S,S¯)(S,\bar{S}) as considered by Andersen and Lang.
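The two properties that make 𝐬A,A¯\mathbf{s}^{A,\bar{A}} a valid seed vector can be checked numerically; the measure μ\mu and cut AA below are illustrative choices.

```python
import numpy as np

# Check the seed-vector properties from Equation (26):
# <s^{A,Abar}, 1>_mu = 0 and ||s^{A,Abar}||_inf <= 1, assuming mu(A) <= mu(Abar).
# The vertex measure and the cut A are illustrative.
mu = np.array([1.0, 2.0, 1.0, 3.0, 2.0])
A = np.array([True, True, False, False, False])   # mu(A) = 3 <= mu(Abar) = 6

# s = 1_A - (mu(A)/mu(Abar)) * 1_Abar
s = np.where(A, 1.0, -mu[A].sum() / mu[~A].sum())

assert abs(np.dot(mu, s)) < 1e-12     # <s, 1>_mu = sum_i mu_i * s_i = 0
assert np.max(np.abs(s)) <= 1.0       # ||s||_inf <= 1 since mu(A) <= mu(Abar)
```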

Appendix C Proofs of Theorems 9 and 10

Suppose we have an algorithm 𝒜\mathcal{A} that, on input α,\alpha, either outputs a cut SVS\subseteq V with ΨG,𝟏A,B(S)2α\Psi_{G,{\bf 1}^{A,B}}(S)\leq 2\alpha or an approximate dual solution of value α.\alpha. Then, 𝒜\mathcal{A} can be used to construct an approximate primal-dual oracle by performing binary search on α\alpha and converting the final approximate dual solution to an approximate dual graph certificate via the flow decomposition of Theorem 6. Since the range of possible values for α\alpha is [1/μ(V),hEwh][1/\mu(V),\sum_{h\in E}w_{h}], which we assume is polynomial in |V|,|V|, it suffices to run 𝒜\mathcal{A} for O(log|V|)O(\log|V|) iterations. Hence, in the rest of the section, we focus on implementing 𝒜\mathcal{A} for both cases in the theorems.
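The binary-search driver around 𝒜\mathcal{A} can be sketched as follows. The `oracle` here is a hypothetical monotone stand-in for 𝒜\mathcal{A} (reporting only whether it found a cheap cut), not an implementation of it.

```python
# Sketch of the binary-search driver around the primitive A described above.
# `oracle(alpha)` returns True when it finds a cut of ratio <= 2*alpha, and
# False when it instead produces a dual certificate of value alpha.
def binary_search_alpha(oracle, lo, hi, rounds):
    # Invariant: a dual certificate exists at `lo`, a cheap cut exists at `hi`.
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if oracle(mid):          # found a cut with ratio <= 2*mid
            hi = mid
        else:                    # obtained a dual certificate of value mid
            lo = mid
    return lo, hi

# Toy monotone oracle with (hypothetical) true threshold 0.37.
lo, hi = binary_search_alpha(lambda a: a >= 0.37, 0.0, 1.0, rounds=30)
assert lo <= 0.37 <= hi and hi - lo < 1e-8
```

After `rounds = O(log |V|)` iterations the interval pins down the optimal value up to the desired precision, matching the iteration count claimed above.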

General Polymatroidal Cut Functions

For a fixed α0,\alpha\geq 0, we consider the integral version of the feasibility problem for (CI-Primal):

SV:hEwhδh(S)α𝟏S,𝟏A,B𝝁0.\exists S\subseteq V:\sum_{h\in E}w_{h}\delta_{h}(S)-\alpha\cdot\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\bm{\mu}}\leq 0.

The left-hand side of this equation can be minimized exactly by the proximal algorithms for decomposable submodular minimization (DSM) of Ene et al. [26, 27] when run with ϵ=1n\epsilon=\frac{1}{n} and rounded to integral solutions. If the resulting minimum is less than 0, the minimizer SS must have ΨG,𝟏A,B(S)α,\Psi_{G,{\bf 1}^{A,B}}(S)\leq\alpha, as required. Otherwise, the DSM algorithm produces a matching dual solution, which takes exactly the form of a feasible solution to (CI-Dual) with value α\alpha. Applying the best result of Ene et al., achieved by the accelerated coordinate descent method under the assumption that a quadratic optimization oracle is given for each cut function δh,\delta_{h}, the running time to implement an approximate primal-dual oracle becomes O~(|V|hEΘh),\tilde{O}\big{(}|V|\cdot\sum_{h\in E}\Theta_{h}\big{)}, where Θh\Theta_{h} is the running time of a quadratic optimization oracle for δh\delta_{h}.
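For intuition, the feasibility test can be carried out by brute force on a toy instance; this enumeration is exponential in |V||V| and purely illustrative (the DSM algorithms of Ene et al. are what make the step efficient). The hypergraph, weights, seed entries, and α\alpha below are hypothetical.

```python
from itertools import combinations

# Brute-force sketch of the feasibility test: minimize
#   sum_h w_h * delta_h(S) - alpha * <1_S, 1^{A,B}>_mu
# over all S, on a tiny illustrative instance.
V = {0, 1, 2, 3}
hyperedges = [(frozenset({0, 1, 2}), 1.0), (frozenset({2, 3}), 2.0)]  # (h, w_h)
seed = {0: 1.0, 1: 1.0, 2: -1.0, 3: -1.0}   # entries of the seed vector 1^{A,B}
mu = {i: 1.0 for i in V}
alpha = 1.0

def delta(h, S):
    # Standard hypergraph cut function: 1 if hyperedge h is cut by S, else 0.
    return int(0 < len(h & S) < len(h))

def objective(S):
    return (sum(w * delta(h, S) for h, w in hyperedges)
            - alpha * sum(mu[i] * seed[i] for i in S))

subsets = [set(c) for r in range(len(V) + 1) for c in combinations(sorted(V), r)]
best_S = min(subsets, key=objective)
# A strictly negative minimum certifies a cut S with ratio at most alpha.
assert objective(best_S) < 0
```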

Graph-Reducible Polymatroidal Cut Functions

As we have shown in Section 4, for directed and standard hypergraph cut functions, the congestion constraint in the hyperflow problem CI-Dual corresponds to vertex capacities on the hyperedge nodes of the factor graph G~.\tilde{G}. To implement the primitive 𝒜\mathcal{A} described above for a fixed α\alpha, we construct a network flow problem from this vertex-capacitated factor graph by adding an auxiliary source vertex ss and an auxiliary sink vertex tt. The vertex ss is connected to every vertex iAi\in A with an arc of capacity αμi\alpha\mu_{i}, while every vertex jBj\in B is connected to tt with an arc of capacity αμjμ(A)/μ(B).\alpha\cdot\mu_{j}\cdot\nicefrac{{\mu(A)}}{{\mu(B)}}. It is easy to see that, for every SVS\subseteq V, the capacity of the cut {s}S\{s\}\cup S in this new network is:

cap({s}S)=defhEwhδh(S)α𝟏S,𝟏A,Bμ+αμ(A)\operatorname{cap}(\{s\}\cup S)\stackrel{{\scriptstyle\mathrm{\scriptscriptstyle def}}}{{=}}\sum_{h\in E}w_{h}\delta_{h}(S)-\alpha\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\mu}+\alpha\cdot\mu(A) (27)

Now, apply a 12\frac{1}{2}-approximate maximum flow solver to this network flow problem to obtain a flow of value FF and an ss-tt cut CC of capacity at most 2F.2F. We have two cases:

  1.

    If Fα2μ(A),F\geq\frac{\alpha}{2}\cdot\mu(A), then the flow routed is an approximate dual solution of value α\alpha, as required.

  2.

    If F<α2μ(A),F<\frac{\alpha}{2}\cdot\mu(A), the ss-tt cut C={s}S,C=\{s\}\cup S, for some SV,S\subseteq V, has capacity strictly less than αμ(A).\alpha\cdot\mu(A). As a result, we have by Equation 27:

    hEwhδh(S)<α𝟏S,𝟏A,Bμ.\sum_{h\in E}w_{h}\delta_{h}(S)<\alpha\langle{\bf 1}_{S},{\bf 1}^{A,B}\rangle_{\mu}.

    This implies that ΨG,𝟏A,B(S)<α2α,\Psi_{G,{\bf 1}^{A,B}}(S)<\alpha\leq 2\alpha, as required for 𝒜.\mathcal{A}.

The almost-linear time result follows by applying the almost-linear time algorithm of Chen et al. [20] as our approximate maximum flow solver. For the standard hypergraph cut function, as the edges in the factor graph are undirected, it suffices to use the almost-linear time solver of Bernstein et al. [13] for undirected vertex-capacitated graphs.
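Equation 27 can be sanity-checked numerically on a toy *graph* instance, where each hyperedge is a plain edge, so the factor-graph construction reduces to the source and sink arcs described above. All weights, measures, and the sets AA, BB below are illustrative.

```python
from itertools import combinations

# Numerical check of Equation (27) on a toy graph: for every S, the capacity
# of ({s} union S) equals cut(S) - alpha * <1_S, 1^{A,B}>_mu + alpha * mu(A).
V = [0, 1, 2, 3]
edges = [((0, 1), 1.0), ((1, 2), 2.0), ((2, 3), 1.0)]   # ((u, v), weight)
mu = {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0}
A, B = {0, 1}, {3}
alpha = 0.7
muA = sum(mu[i] for i in A)
muB = sum(mu[j] for j in B)

def cut_weight(S):
    return sum(w for (u, v), w in edges if (u in S) != (v in S))

# Seed vector 1^{A,B} = 1_A - (mu(A)/mu(B)) * 1_B, zero elsewhere.
seed = {i: (1.0 if i in A else (-muA / muB if i in B else 0.0)) for i in V}

for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        # LHS: capacity of ({s} union S): graph edges crossing S, plus the
        # s -> A arcs cut (i in A \ S), plus the B -> t arcs cut (j in B & S).
        lhs = (cut_weight(S)
               + alpha * sum(mu[i] for i in A - S)
               + alpha * (muA / muB) * sum(mu[j] for j in B & S))
        # RHS: right-hand side of Equation (27).
        rhs = cut_weight(S) - alpha * sum(mu[i] * seed[i] for i in S) + alpha * muA
        assert abs(lhs - rhs) < 1e-9
```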

Appendix D Proof of Theorem 11: the cut-matching game reduction

See 11

Proof of Theorem 11.

We first check that the dual graph certificates DtD_{t} output by 𝒜ci\mathcal{A}_{ci} conform to the definition of an approximate matching response. The bipartiteness of DtD_{t} across (At,Bt)(A_{t},B_{t}) follows from requirement 3 in Definition 12, while the degree and largeness properties follow from requirements 4 and 5 in the same definition. In the symmetric case, we may assume each DtD_{t} is undirected.

In the following, let αt\alpha_{t} be the value of α\alpha achieved by the tt-th invocation of 𝒜ci\mathcal{A}_{ci} and

α=min{αt}t=1g(n).\alpha^{*}=\min\{\alpha_{t}\}_{t=1}^{g(n)}.

By Lemma 6 and the definition of approximate primal-dual oracle, we have that

min{ΨG(Ct)}t=1g(n)min{ΨG,𝟏At,Bt(Ct)}t=1g(n)3min{αt}t=1g(n)=3α.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq\min\bigg{\{}\Psi_{G,{\bf 1}^{A_{t},B_{t}}}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq 3\cdot\min\bigg{\{}\alpha_{t}\bigg{\}}_{t=1}^{g(n)}=3\cdot\alpha^{*}.

By property 1 in Definition 12, we have that αtDt\alpha_{t}D_{t} embeds in GG with congestion 11 for all t.t. Averaging this guarantee over all tt, we have:

αg(n)Hg(n)11g(n)t=1g(n)αtDt1G\frac{\alpha^{*}}{g(n)}\cdot H_{g(n)}\preceq_{1}\frac{1}{g(n)}\cdot\sum_{t=1}^{g(n)}\alpha_{t}D_{t}\preceq_{1}G

Applying Theorem 5 and the definition of good strategy, we obtain the following lower bound on cuts of GG, for all SV:S\subseteq V:

αf(n)g(n)αg(n)ΨHg(n)(S)2ΨG(S)\alpha^{*}\cdot\frac{f(n)}{g(n)}\leq\frac{\alpha^{*}}{g(n)}\cdot\Psi_{H_{g(n)}}(S)\leq 2\cdot\Psi_{G}(S)

In particular, this applies to the minimum-ratio cut in GG, so that we have:

min{ΨG(Ct)}t=1g(n)3α6g(n)f(n)ΨG.\min\bigg{\{}\Psi_{G}(C_{t})\bigg{\}}_{t=1}^{g(n)}\leq 3\cdot\alpha^{*}\leq 6\cdot\frac{g(n)}{f(n)}\cdot\Psi_{G}^{*}.

This completes the proof. ∎