This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\newtheoremrep

lemmaLemma \newtheoremreptheorem[lemma]Theorem \newtheoremrepproposition[lemma]Lemma

A Framework to Design Approximation Algorithms for Finding Diverse Solutions in Combinatorial Problems

Tesshu Hanaka1    Masashi Kiyomi2    Yasuaki Kobayashi3    Yusuke Kobayashi3    Kazuhiro Kurita4&Yota Otachi1 1Nagoya University, Nagoya, Japan
2Seikei University, Tokyo, Japan
3Kyoto University, Kyoto, Japan
4National Institute of Informatics, Tokyo, Japan
Abstract

Finding a single best solution is the most common objective in combinatorial optimization problems. However, such a single solution may not be applicable to real-world problems as objective functions and constraints are only “approximately” formulated for original real-world problems. To solve this issue, finding multiple solutions is a natural direction, and diversity of solutions is an important concept in this context. Unfortunately, finding diverse solutions is much harder than finding a single solution. To cope with difficulty, we investigate the approximability of finding diverse solutions. As a main result, we propose a framework to design approximation algorithms for finding diverse solutions, which yields several outcomes including constant-factor approximation algorithms for finding diverse matchings in graphs and diverse common bases in two matroids and PTASes for finding diverse minimum cuts and interval schedulings.

1 Introduction

One way to solve a real-world problem is to formulate the problem as a mathematical optimization problem and find a solution with an optimization algorithm. However, it is not always easy to formulate an appropriate optimization problem as real-world problems often include intricate constraints and implicit preferences, which are usually simplified in order to solve optimization problems. Hence, an optimal solution obtained in this way is not guaranteed to be a “good solution” to the original real-world problem. To cope with this underlying inconsistency, the following two-stage approach would be promising: algorithms find multiple solutions and then users find what they like from these solutions. One may think that top-kk enumeration algorithms (see (Eppstein, 2008) for a survey) can be used for this purpose. However, this is not always the case since top-kk enumeration algorithms may output solutions similar to one another. (See (Wang et al., 2013; Yuan et al., 2016; Hao et al., 2020), for example). Such a set of solutions are not useful as a “catalog” of solutions provided to users.

As a way to solve this issue, algorithms are expected to find “diverse” solutions, and algorithms for finding “diverse” solutions have received considerable attention in many fields such as artificial intelligence (Hanaka et al., 2021b, a; Baste et al., 2022), machine learning (Gillenwater et al., 2015), and data mining (Wang et al., 2013; Yuan et al., 2016). There are many directions in the research on finding diverse solutions, depending on definitions of solutions and diversity measures. Given these rich applications, the diverse X paradigm was proposed by Fellows and Rosamond in Dagstuhl Seminar 18421 (Fernau et al., 2019). In this paradigm, “X” is a placeholder that represents solutions we are looking for, and they asked for theoretical investigations of finding diverse solutions. Since the problem of finding diverse solutions is much harder than that of finding a single solution for some “X”, it would be reasonable to consider the problem from the perspective of fixed-parameter tractability. From this proposition, several fixed-parameter tractable (FPT) algorithms are developed. Baste et al. gave algorithms for finding diverse solutions related to hitting sets (Baste et al., 2019) and those on bounded-treewidth graphs (Baste et al., 2022). Hanaka et al. (Hanaka et al., 2021b) proposed a framework to obtain FPT algorithms for finding diverse solutions in various combinatorial problems. Fomin et al. (Fomin et al., 2020, 2021) investigated the fixed-parameter tractability of finding diverse solutions related to matchings and matroids. In these results, the running time bounds of these FPT algorithms exponentially depend on the number of solutions we are looking for, which would be prohibitive for computing moderate numbers of solutions.

In this paper, we aim to develop theoretically efficient algorithms for finding moderate numbers of diverse solutions. As we mentioned, the problem of finding diverse solutions is harder than that of finding a single solution. For example, the problem of computing a maximum matching in a graph is known to be solvable in polynomial time, whereas that of computing two maximum matchings M1M_{1} and M2M_{2} maximizing |M1M2||M_{1}\mathbin{\triangle}M_{2}| is known to be NP-hard (Fomin et al., 2020). In this paper, we aim to develop theoretically efficient algorithms for finding moderate numbers of diverse solutions.

Our main result is a framework for designing efficient approximation algorithms with constant approximation factors for finding diverse solutions in combinatorial problems. We employ the sum of pairwise weighted Hamming distances among solutions as our diversity measure (see Section 2 for its definition), while some previous work (Fomin et al., 2021) employs the minimum of weighted Hamming distances. Roughly speaking, our approximation framework says that if we can enumerate top-kk weighted solutions in polynomial time, then we can obtain in polynomial time unweighted solutions maximizing our diversity measure with constant approximation factors. Moreover, suppose that we can exactly maximize our diversity of solutions in polynomial time when the number of solutions we are looking for is bounded by a constant. Then, our framework yields a polynomial-time approximation scheme (PTAS), meaning that factor-(1ε)(1-\varepsilon) approximation in polynomial time for every constant ε>0\varepsilon>0. By applying our framework, we obtain efficient constant-factor approximation algorithms for finding diverse solutions of matchings in a graph and of common bases of two matroids, while PTASes for finding diverse solutions of minimum cuts and of interval schedulings. Let us note that these diversity maximization problems are unlikely to be solvable in polynomial time, which will be discussed later.

2 Preliminaries

We denote the set of real numbers, the set of non-negative real numbers, and the set of positive real numbers as \mathbb{R}, 0\mathbb{R}_{\geq 0}, and >0\mathbb{R}_{>0}, respectively. Let EE be a set. We denote the set of all subsets of EE as 2E2^{E}.

A function d:E×E0d\colon E\times E\to\mathbb{R}_{\geq 0} is called a metric (on EE) if it satisfies the following conditions: for x,y,zEx,y,z\in E, (1) d(x,y)=0d(x,y)=0 if and only if x=yx=y; (2) d(x,y)=d(y,x)d(x,y)=d(y,x); (3) d(x,z)d(x,y)+d(y,z)d(x,z)\leq d(x,y)+d(y,z). Suppose that EmE\subseteq\mathbb{R}^{m} for some integer mm. For xEx\in E, we denote by xix_{i} the iith component of xx. If d(x,y)=1im|xiyi|d(x,y)=\sum_{1\leq i\leq m}|x_{i}-y_{i}| holds for x,yEx,y\in E, then dd is called an 1\ell_{1}-metric.

Let EE be a finite set. For X,YEX,Y\subseteq E, the symmetric difference between XX and YY is denoted by XYX\mathbin{\triangle}Y (i.e., XY=(XY)(YX)X\mathbin{\triangle}Y=(X\setminus Y)\cup(Y\setminus X)). Let w:E>0w:E\to\mathbb{R}_{>0}. A weighted Hamming distance is a function d:2E×2E0d:2^{E}\times 2^{E}\to\mathbb{R}_{\geq 0} such that for X,YEX,Y\subseteq E, dw(X,Y)=w(XY)d_{w}(X,Y)=w(X\mathbin{\triangle}Y), where w(Z)=xZw(x)w(Z)=\sum_{x\in Z}w(x) for ZEZ\subseteq E. Suppose that E={1,2,,m}E=\{1,2,\ldots,m\}. We can regard each subset XEX\subseteq E as an mm-dimensional vector x=(x1,,xm)x=(x_{1},\ldots,x_{m}) defined by xi=w(i)x_{i}=w(i) if iXi\in X and xi=0x_{i}=0 otherwise, for 1im1\leq i\leq m. It is easy to observe that for X,YEX,Y\subseteq E, dw(X,Y)=1im|xiyi|d_{w}(X,Y)=\sum_{1\leq i\leq m}|x_{i}-y_{i}|, where xx and yy are the vectors corresponding to XX and YY, respectively. Thus, the weighted Hamming distance dwd_{w} can be considered as an 1\ell_{1}-metric.

In this paper, we focus on the following diversity measure dsum()d_{\rm sum}(\cdot), called the sum diversity. Let 𝒴={Y1,,Yk}{\mathcal{Y}}=\{Y_{1},\dots,Y_{k}\} be a collection of subsets of EE and w:E0w\colon E\to\mathbb{R}_{\geq 0} be a weight function. We define

dsum(𝒴)=1i<jkdw(Yi,Yj).d_{\rm sum}(\mathcal{Y})=\displaystyle\sum_{1\leq i<j\leq k}d_{w}(Y_{i},Y_{j}).

Our problem Max-Sum Diverse Solutions is defined as follows.

Definition 1 (Max-Sum Diverse Solutions).

Given a finite set EE, an integer kk, a weight function w:E0w\colon E\to\mathbb{R}_{\geq 0}, and a membership oracle for 𝒳2E\mathcal{X}\subseteq 2^{E}, the task of Max-Sum Diverse Solutions is to find a set 𝒴={Y1,Y2,,Yk}\mathcal{Y}=\{Y_{1},Y_{2},\ldots,Y_{k}\} of kk distinct subsets Y1,Y2,,Yk𝒳Y_{1},Y_{2},\ldots,Y_{k}\in\mathcal{X} that maximizes the sum diversity dsum(𝒴)d_{\rm sum}(\mathcal{Y}).

Each set in 𝒳\mathcal{X} is called a feasible solution. In Max-Sum Diverse Solutions, the set 𝒳\mathcal{X} of feasible solutions is not given explicitly, while we can test whether a set XEX\subseteq E belongs to 𝒳\mathcal{X}. Our problem Max-Sum Diverse Solutions is highly related to the problem of packing disjoint feasible solutions.

Observation 1.

Suppose that all sets in 𝒳\mathcal{X} have the same cardinality rr and w(e)=1w(e)=1 for eEe\in E. Let Y1,Y2,,Yk𝒳Y_{1},Y_{2},\ldots,Y_{k}\in\mathcal{X} be kk distinct subsets. Then, dsum({Y1,,Yk})kr(k1)d_{\rm sum}(\{Y_{1},\ldots,Y_{k}\})\geq kr(k-1) if and only if YiYj=Y_{i}\cap Y_{j}=\emptyset for 1i<jk1\leq i<j\leq k.

This observation implies several hardness results of Max-Sum Diverse Solutions, which will be discussed in Section 4.

We particularly focus on the approximability of Max-Sum Diverse Solutions for specific sets of feasible solutions. For a maximization problem, we say that an approximation algorithm has factor 0<α10<\alpha\leq 1 if given an instance II, the algorithm outputs a solution with objective value ALG(I){\rm ALG}(I) such that ALG(I)/OPT(I)α{\rm ALG}(I)/{\rm OPT}(I)\geq\alpha, where OPT(I){\rm OPT}(I) is the optimal value for II. A polynomial-time approximation scheme is an approximation algorithm that takes an instance II and a constant ε>0\varepsilon>0, the algorithm outputs a solution with ALG(I)/OPT(I)1ε{\rm ALG(I)/{\rm OPT}(I)}\geq 1-\varepsilon in polynomial time.

2.1 A technique for Max-Sum Diversification

Our framework is based on approximation algorithms for a similar problem Max-Sum Diversification. Let XX be a set and let d:X×X0d\colon X\times X\to\mathbb{R}_{\geq 0} be a metric. In what follows, for YXY\subseteq X, we denote x,yYd(x,y)\sum_{x,y\in Y}d(x,y) as d(Y)d(Y).

Definition 2 (Max-Sum Diversification).

Given a metric d:X×X0d\colon X\times X\to\mathbb{R}_{\geq 0} on a finite set XX and an integer kk, the task of Max-Sum Diversification is to find a subset YXY\subseteq X with |Y|=k|Y|=k that maximizes d(Y)d(Y).

Max-Sum Diversification is studied under various names such as MAX-AVG Facility Dispersion and Remote-clique (Cevallos et al., 2019; Ravi et al., 1994). Max-Sum Diversification is known to be \NP-hard (Ravi et al., 1994). Cevallos et al. (Cevallos et al., 2019) devised a PTAS for Max-Sum Diversification. Their algorithm is based on a rather simple local search technique, but their analysis of the approximation factor and the iteration bound are highly nontrivial. Our framework for Max-Sum Diverse Solutions is based on their algorithm, which is briefly sketched below.

1 Procedure LocalSearch(X,d,kX,d,k)
2       YY\leftarrow arbitrary kk elements in XX
3       for i=1,,k(k1)(k+1)ln((k+2)(k1)24)i=1,\ldots,\lceil\frac{k(k-1)}{(k+1)}\ln(\frac{(k+2)(k-1)^{2}}{4})\rceil do
4             if \exists pair (x,y)(XY)×Y(x,y)\in(X\setminus Y)\times Y such that d(Yy+x)>d(Y)d(Y-y+x)>d(Y) then
5                   (x,y)argmax(x,y)(XY)×Yd(Yy+x)(x,y)\leftarrow\underset{(x,y)\in(X\setminus Y)\times Y}{\arg\max}d(Y-y+x)
6                   YYy+xY\leftarrow Y-y+x
7                  
8      Output YY
9      
Algorithm 1 A (12/k)(1-2/k)-approximation algorithm for Max-Sum Diversification (Cevallos et al., 2019).

A pseudocode of the algorithm due to (Cevallos et al., 2019) is given in Algorithm 1. In this algorithm, we first pick an arbitrary set of kk elements in XX, which is denoted by YXY\subseteq X. Then, we find a pair of elements xXYx\in X\setminus Y and yYy\in Y that maximizes d(Yy+x)d(Y-y+x) and update YY by Yy+xY-y+x if d(Yy+x)>d(Y)d(Y-y+x)>d(Y). We repeat this update procedure k(k1)(k+1)ln((k+2)(k1)24)=O(klogk)\lceil\frac{k(k-1)}{(k+1)}\ln(\frac{(k+2)(k-1)^{2}}{4})\rceil=O(k\log k) times. Since we can find a pair (x,y)(x,y) in O(|X|kτ)O(\left|X\right|k\tau) time, where τ\tau is the running time to evaluate the distance function d(x,y)d(x,y) for x,yXx,y\in X, the following lemma holds.

Lemma 3.

Algorithm 1 runs in time O(|X|k2τlogk)O(\left|X\right|k^{2}\tau\log k).

They showed that if the metric dd is negative type, then the approximation ratio of Algorithm 1 is at least 12/k1-2/k (Cevallos et al., 2019). Here, we do not give the precise definition of a negative type metric but mention that every 1\ell_{1}-metric is negative type (Deza and Laurent, 1997; Cevallos et al., 2016).

Theorem 4 ((Cevallos et al., 2019)).

If d:X×X0d\colon X\times X\to\mathbb{R}_{\geq 0} is a negative type metric, then the approximation ratio of Algorithm 1 is 12/k1-2/k.

They further observed that the above theorem implies that Max-Sum Diversification admits a PTAS as follows. Let ϵ\epsilon be a positive constant. When ϵ<2/k\epsilon<2/k, that is, k<2/ϵk<2/\epsilon, then kk is constant. Thus, we can solve Max-Sum Diversification in time |X|O(1/ϵ)\left|X\right|^{O(1/\epsilon)} by using a brute-force search. Otherwise, the above (12/k)(1-2/k)-approximation algorithm achieves factor 1ϵ1-\epsilon. Thus, Max-Sum Diversification admits a PTAS, provided that dd is a negative type metric.

Corollary 5 ((Cevallos et al., 2019)).

If d:X×X0d\colon X\times X\to\mathbb{R}_{\geq 0} is a negative type metric, then Max-Sum Diversification admits a PTAS.

3 A framework for finding diverse solutions

In this section, we propose a framework for designing approximation algorithms for Max-Sum Diverse Solutions. The basic strategy to our framework is the local search algorithm described in the previous section. Let EE be a finite set and let 𝒳2E\mathcal{X}\subseteq 2^{E} be a set of feasible solutions. We set X𝒳X\coloneqq\mathcal{X} and apply the local search algorithm for Max-Sum Diversification to (X,dw,k)(X,d_{w},k). Recall that our diversity measure dsumd_{\rm sum} is the sum of weighted Hamming distances dwd_{w}. Moreover, dwd_{w} is an 1\ell_{1}-metric, as observed in the previous section. By Theorem 4, the local search algorithm for Max-Sum Diversification has approximation factor 12/k1-2/k. However, the running time of a straightforward application of Lemma 3 is O(|𝒳||E|k2logk)O(\left|\mathcal{X}\right|\cdot\left|E\right|k^{2}\log k) even if the feasible solutions in 𝒳\mathcal{X} can be enumerated in O(|𝒳||E|)O(\left|\mathcal{X}\right|\cdot\left|E\right|) total time, which may be exponential in the input size |E|\left|E\right|.

A main obstacle to applying the local search algorithm is that from a current set 𝒴={Y1,,Yk}\mathcal{Y}=\{Y_{1},\ldots,Y_{k}\} of feasible solutions, we need to find a pair of feasible solutions (X,Y)(𝒳𝒴)×𝒴(X,Y)\in(\mathcal{X}\setminus\mathcal{Y})\times\mathcal{Y} maximizing dsum(𝒴Y+X)d_{\rm sum}(\mathcal{Y}-Y+X). To overcome this obstacle, we exploit top-kk enumeration algorithms. Let w:Ew^{\prime}\colon E\to\mathbb{R} be a weight function. An algorithm 𝒜\mathcal{A} is called a top-kk enumeration algorithm for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k) if for a positive integer kk, 𝒜\mathcal{A} finds kk feasible solutions Y1,,Yk𝒳Y_{1},\ldots,Y_{k}\in\mathcal{X} such that for any Y{Y1,,Yk}Y\in\{Y_{1},\ldots,Y_{k}\} and X𝒳{Y1,,Yk}X\in\mathcal{X}\setminus\{Y_{1},\ldots,Y_{k}\}, w(X)w(Y)w^{\prime}(X)\leq w^{\prime}(Y) holds. By using 𝒜\mathcal{A}, we can compute the pair (X,Y)(X,Y) as follows.

We first guess Y𝒴Y\in\mathcal{Y} in the pair (X,Y)(X,Y) and let 𝒴=𝒴{Y}\mathcal{Y}^{\prime}=\mathcal{Y}\setminus\{Y\}. To find the pair (X,Y)(X,Y), it suffices to find X𝒳𝒴X\in\mathcal{X}\setminus\mathcal{Y}^{\prime} that maximizes Y𝒴w(XY)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\triangle Y^{\prime}). For an element eEe\in E, we define a new weight w(e)w(e)(𝐸𝑥(e,𝒴)𝐼𝑛(e,𝒴))w^{\prime}(e)\coloneqq w(e)(\mathit{Ex}(e,\mathcal{Y}^{\prime})-\mathit{In}(e,\mathcal{Y}^{\prime})), where 𝐼𝑛(e,𝒴)\mathit{In}(e,\mathcal{Y}^{\prime}) (resp. 𝐸𝑥(e,𝒴)\mathit{Ex}(e,\mathcal{Y}^{\prime})) is the number of feasible solutions in 𝒴\mathcal{Y}^{\prime} that contain ee (resp. do not contain ee). For notational convenience, we fix 𝒴\mathcal{Y}^{\prime} and write 𝐼𝑛(e)\mathit{In}(e) and 𝐸𝑥(e)\mathit{Ex}(e) to denote 𝐼𝑛(e,𝒴)\mathit{In}(e,\mathcal{Y}^{\prime}) and 𝐸𝑥(e,𝒴)\mathit{Ex}(e,\mathcal{Y}^{\prime}), respectively. The following lemma shows that a feasible solution XX that maximizes w(X)w^{\prime}(X) also maximizes Y𝒴w(XY)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime}).

Lemma 6.

For any feasible solution X𝒳X\in\mathcal{X}, Y𝒴w(XY)=w(X)+eEw(e)𝐼𝑛(e)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e).

Proof.

The contribution of eXe\in X to w(XY)w(X\mathbin{\triangle}Y^{\prime}) is w(e)w(e) if eYe\not\in Y^{\prime}, and 0 otherwise. Thus, eXe\in X contributes w(e)𝐸𝑥(e)w(e)\cdot\mathit{Ex}(e) to Y𝒴w(XY)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime}). Similarly, eEXe\in E\setminus X contributes w(e)𝐼𝑛(e)w(e)\cdot\mathit{In}(e) to Y𝒴w(XY)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime}). This gives us Y𝒴w(XY)=w(X)+eEw(e)𝐼𝑛(e)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e) as follows.

Y𝒴w(XY)\displaystyle\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})
=eXw(e)𝐸𝑥(e)+eEXw(e)𝐼𝑛(e)\displaystyle=\sum_{e\in X}w(e)\cdot\mathit{Ex}(e)+\sum_{e\in E\setminus X}w(e)\cdot\mathit{In}(e)
=eXw(e)𝐸𝑥(e)+eEw(e)𝐼𝑛(e)eXw(e)𝐼𝑛(e)\displaystyle=\sum_{e\in X}w(e)\cdot\mathit{Ex}(e)+\sum_{e\in E}w(e)\cdot\mathit{In}(e)-\sum_{e\in X}w(e)\cdot\mathit{In}(e)
=eXw(e)(𝐸𝑥(e)𝐼𝑛(e))+eEw(e)𝐼𝑛(e)\displaystyle=\sum_{e\in X}w(e)(\mathit{Ex}(e)-\mathit{In}(e))+\sum_{e\in E}w(e)\cdot\mathit{In}(e)
=w(X)+eEw(e)𝐼𝑛(e).\displaystyle=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e).\qed

From the above lemma, we can find the pair (X,Y)(X,Y) with a top-kk enumeration algorithm 𝒜\mathcal{A} for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k) as follows. By Lemma 6, for any feasible solution X𝒳X\in\mathcal{X}, Y𝒴w(XY)=w(X)+eEw(e)𝐼𝑛(e)\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e). Since the second term does not depend on XX, to find a feasible solution XX maximizing the left-hand side, it suffices to maximize w(X)w^{\prime}(X) subject to X𝒳𝒴X\in\mathcal{X}\setminus\mathcal{Y}^{\prime}. The algorithm 𝒜\mathcal{A} allows us to find kk feasible solution Z1,,ZkZ_{1},\ldots,Z_{k} such that w(Z1)w(Zk)w(Z)w^{\prime}(Z_{1})\geq\cdots\geq w^{\prime}(Z_{k})\geq w^{\prime}(Z) for any feasible solution ZZ other than Z1,,ZkZ_{1},\ldots,Z_{k}. As |𝒴|<k|\mathcal{Y}^{\prime}|<k, at least one of these solutions provides such a solution XX.

The entire algorithm is as follows. We first find a set of kk distinct feasible solutions in 𝒳\mathcal{X} using the enumeration algorithm 𝒜\mathcal{A}. Then, we repeat the local update procedure described above O(klogk)O(k\log k) times. Suppose that 𝒜\mathcal{A} enumerates kk feasible solutions in time O((|E|+k)c)O((|E|+k)^{c}) for some constant cc. Then, the entire algorithm runs in time O((|E|+k)c|E|k2logk)O((|E|+k)^{c}|E|k^{2}\log k) as we can compute the pair (X,Y)(X,Y) in time O((|E|+k)c|E|k)O((|E|+k)^{c}|E|k) by simply guessing Y𝒴Y\in\mathcal{Y}.

Note that the approximation factor 12/k1-2/k does not give a reasonable bound for k=2k=2. In this case, however, we still have an approximation factor 1/21/2 with a greedy algorithm for Max-Sum Diversification (Birnbaum and Goldman, 2009), which is described as follows. Initially, we set 𝒴={Y1}\mathcal{Y}=\{Y_{1}\} with arbitrary Y1𝒳Y_{1}\in\mathcal{X}. Then, we compute a feasible solution Y2𝒳𝒴Y_{2}\in\mathcal{X}\setminus\mathcal{Y} maximizing Y𝒴w(Y2Y)\sum_{Y\in\mathcal{Y}}w(Y_{2}\triangle Y). By Lemma 6 and the above discussion, we can find such a solution Y2Y_{2} with a top-kk enumeration algorithm for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k), where w(e)w(e)(𝐸𝑥(e,𝒴)𝐼𝑛(e,𝒴))w^{\prime}(e)\coloneqq w(e)\cdot(\mathit{Ex}(e,\mathcal{Y})-\mathit{In}(e,\mathcal{Y})) for eEe\in E. We repeat this k1k-1 times so that 𝒴\mathcal{Y} contains kk feasible solutions. As discussed in this section, the approximation factor of this algorithm is 1/21/2 as in (Birnbaum and Goldman, 2009). Thus, the following theorem holds.

Theorem 7.

Let EE be a finite set, 𝒳2E\mathcal{X}\subseteq 2^{E}, and w:E>0w\colon E\to\mathbb{R}_{>0}. Suppose that there is a top-kk enumeration algorithm for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k) that runs in O((|E|+k)c)O((\left|E\right|+k)^{c}) time for a constant, where w:Ew^{\prime}\colon E\to\mathbb{R} is an arbitrary weight function. Then, there is an α\alpha-approximation algorithm for Max-Sum Diverse Solutions that runs in O((|E|+k)c|E|k2logk)O((\left|E\right|+k)^{c}|E|k^{2}\log k) time, where α=max(12/k,1/2)\alpha=\max(1-2/k,1/2). Moreover, if there is a polynomial-time exact algorithm for Max-Sum Diverse Solutions for constant kk, then it admits a PTAS.

4 Applications of the framework

To complete the description of approximation algorithms based on our framework, we need to develop top-kk enumeration algorithms for specific problems. In what follows, we design top-kk enumeration algorithms for matchings, common bases of two matroids, and interval schedulings.

Our top-kk enumeration algorithms are based on a well-known technique used in (Lawler, 1972) (also discussed in (Eppstein, 2008)). The key to enumeration algorithms is the following Weighted Extension.

Definition 8 (Weighted Extension).

Given a finite set EE, a set of feasible solutions 𝒳2E\mathcal{X}\subseteq 2^{E} as a membership oracle, a weight function w:Ew^{\prime}\colon E\to\mathbb{R}, and a pair of disjoint subsets 𝐼𝑛\mathit{In} and 𝐸𝑥\mathit{Ex} of EE, the task is to find a feasible solution X𝒳X\in\mathcal{X} that satisfies 𝐼𝑛X\mathit{In}\subseteq X and X𝐸𝑥=X\cap\mathit{Ex}=\emptyset maximizing w(X)w^{\prime}(X) subject to these conditions.

If we can solve the above problem in O(|E|c)O(\left|E\right|^{c}) time, then we can obtain a top-kk enumeration algorithm for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k) that runs in O(k|E|c+1)O(k\left|E\right|^{c+1}) time.

Lemma 9 ((Lawler, 1972)).

Suppose that Weighted Extension for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k) can be solved in O(|E|c)O(\left|E\right|^{c}) time. Then, there is an O(k|E|c+1)O(k\left|E\right|^{c+1})-time top-kk enumeration algorithm for (E,𝒳,w,k)(E,\mathcal{X},w^{\prime},k).

4.1 Matchings

Matching is one of the most fundamental combinatorial objects in graphs, and the polynomial-time algorithm for computing a maximum weight matching due to (Edmonds, 1965) is a cornerstone result in this context. Finding diverse matchings has also been studied so far (Hanaka et al., 2021b, a; Fomin et al., 2020, 2021). Let G=(V,E)G=(V,E) be a graph. A set of edges MM is a matching of GG if MM has no pair of edges that share a common endpoint. A matching MM is called a perfect matching of GG if every vertex in GG is incident to an edge in MM. By using our framework, we design an approximation algorithm for finding diverse matchings. The formal definition of the problem is as follows.

Definition 10 (Diverse Matchings).

Given a graph G=(V,E)G=(V,E), a weight function w:E>0w\colon E\to\mathbb{R}_{>0}, and integers kk and rr, the task of Diverse Matchings is to find kk distinct matchings M1,,MkM_{1},\ldots,M_{k} of size rr that maximize dsum({M1,,Mk})d_{\rm sum}(\{M_{1},\ldots,M_{k}\}).

To apply our framework, it suffices to show that Weighted Extension for matchings can be solved in polynomial time. Our method is similar to a reduction from the maximum weight perfect matching problem to the maximum weight matching problem (Duan and Pettie, 2014). Let 𝐼𝑛,𝐸𝑥E\mathit{In},\mathit{Ex}\subseteq E be disjoint subsets of edges and let w:Ew^{\prime}\colon E\to\mathbb{R}. Then, our goal is to find a matching MM of GG with |M|=r|M|=r such that InM\textit{In}\subseteq M and ExM=\textit{Ex}\cap M=\emptyset, and MM maximizes w(M)w^{\prime}(M) subject to these constraints. This problem can be reduced to that of finding a maximum weight perfect matching as follows. We assume that In is a matching of GG as otherwise there is no matching containing it. Let G=(V,E)G^{\prime}=(V^{\prime},E^{\prime}) be the graph obtained from GG by removing (1) all edges in Ex and (2) all end vertices of edges in In. Then, it is easy to see that MM is a matching of GG with InM\textit{In}\subseteq M and ExM=\textit{Ex}\cap M=\emptyset if and only if MInM\setminus\textit{In} is a matching of GG^{\prime}. Thus, it suffices to find a maximum weight matching of size exactly r=r|In|r^{\prime}=r-\left|\textit{In}\right| in GG^{\prime}. To this end, we add |V|2r|V^{\prime}|-2r^{\prime} vertices UU to GG^{\prime} and add all possible edges between vertices vVv\in V^{\prime} and uUu\in U. The graph obtained in this way is denoted by H=(VU,EF)H=(V^{\prime}\cup U,E\cup F), where F={{u,v}:uU,vV}F=\{\{u,v\}:u\in U,v\in V^{\prime}\}. We extend the weight function ww^{\prime} by setting w(f)=0w^{\prime}(f)=0 for fFf\in F. Then, the following lemma holds.

Lemma 11.

Let MM^{*} be a maximum weight perfect matching in HH. Then, MFM^{*}\setminus F is a matching of size rr^{\prime} in GG^{\prime} such that for every size-rr^{\prime} matching MM^{\prime} in GG^{\prime}, it holds that w(M)w(MF)w^{\prime}(M^{\prime})\leq w^{\prime}(M^{*}\setminus F).

Proof.

Since MM^{*} is a perfect matching and any edge incident to UU is contained in FF, MM^{*} must contain exactly |U||U| edges of FF. This implies that the perfect matching MM^{*} contains exactly rr^{\prime} edges of GG^{\prime}. Suppose that there is a size-rr^{\prime} matching MM^{\prime} in GG^{\prime} such that w(M)>w(MF)w^{\prime}(M^{\prime})>w^{\prime}(M^{*}\setminus F). As every vertex in UU is adjacent to VV^{\prime}, we can choose exactly a set NFN\subseteq F of |U||U| edges between UU and VV^{\prime} so that MNM^{\prime}\cup N forms a perfect matching in HH. Then, we have w(MN)=w(M)+w(N)>w(MF)+w(MF)=w(M)w^{\prime}(M^{\prime}\cup N)=w^{\prime}(M^{\prime})+w^{\prime}(N)>w^{\prime}(M^{*}\setminus F)+w^{\prime}(M^{*}\cap F)=w^{\prime}(M^{*}) as w(N)=w(MF)=0w^{\prime}(N)=w^{\prime}(M^{*}\cap F)=0, contradicting the fact that MM^{*} is a maximum weight perfect matching of HH. ∎

Thus, we can solve Weighted Extension for a size-rr matching in polynomial time (Edmonds, 1965). By Theorem 7 and Lemma 9, we have the following theorem.

Theorem 12.

There is a polynomial-time approximation algorithm for Diverse Matchings with approximation factor max(12/k,1/2)\max(1-2/k,1/2).

4.2 Common bases of two matroids

Let EE be a finite set and let a non-empty family of subsets \mathcal{I} of EE. The pair =(E,)\mathcal{M}=(E,\mathcal{I}) is a matroid if (1) for each XX\in\mathcal{I}, every subset of XX is included in \mathcal{I} and (2) if X,YX,Y\in\mathcal{I} and |X|<|Y|\left|X\right|<\left|Y\right|, then there exists an element eYXe\in Y\setminus X such that X{e}X\cup\{e\}\in\mathcal{I}. Each set in \mathcal{I} is called an independent set of \mathcal{M}. An inclusion-wise maximal independent set II of \mathcal{M} is a base of \mathcal{M}. Because of condition (2), all bases in \mathcal{M} have the same cardinality. For two matroids 1=(E,1)\mathcal{M}_{1}=(E,\mathcal{I}_{1}) and 2=(E,2)\mathcal{M}_{2}=(E,\mathcal{I}_{2}), a subset XEX\subseteq E is a common base of 1\mathcal{M}_{1} and 2\mathcal{M}_{2} if XX is a base of both 1\mathcal{M}_{1} and 2\mathcal{M}_{2}. In this subsection, we give an approximation algorithm for diverse common bases of two matroids.

Definition 13 (Diverse Matroid Common Bases).

Given two matroids 1=(E,1)\mathcal{M}_{1}=(E,\mathcal{I}_{1}) and 2=(E,2)\mathcal{M}_{2}=(E,\mathcal{I}_{2}) as membership oracles, a weight function w:E>0w\colon E\to\mathbb{R}_{>0}, and an integer kk, the task of Diverse Matroid Common Bases is to find kk distinct common bases B1,,BkB_{1},\ldots,B_{k} of 1\mathcal{M}_{1} and 2\mathcal{M}_{2} that maximize dsum(B1,,Bk)d_{\rm sum}(B_{1},\ldots,B_{k}).

Given two matroids 1=(E,1)\mathcal{M}_{1}=(E,\mathcal{I}_{1}) and 2=(E,2)\mathcal{M}_{2}=(E,\mathcal{I}_{2}) as membership oracles, the problem of partitioning EE into kk common bases of 1\mathcal{M}_{1} and 2\mathcal{M}_{2} is a notoriously hard problem, which requires an exponential number of membership queries (Bérczi and Schwarcz, 2021). This fact together with 1 implies that Diverse Matroid Common Bases cannot be solved with polynomial number of membership queries in our problem setting. Given this fact, we develop a constant-factor approximation algorithm for Diverse Matroid Common Bases. To this end, we show that Weighted Extension for common bases of two matroids can be solved in polynomial time.

Similarly to the case of matchings, we can find a maximum weight common base B12B\in\mathcal{I}_{1}\cap\mathcal{I}_{2} subject to 𝐼𝑛B\mathit{In}\subseteq B and 𝐸𝑥B=\mathit{Ex}\cap B=\emptyset for given disjoint 𝐼𝑛,𝐸𝑥E\mathit{In},\mathit{Ex}\subseteq E, which is as follows. Let =(E,)\mathcal{M}=(E,\mathcal{I}) be a matroid. For XEX\subseteq E, we let X=(EX,𝒥)\mathcal{M}\setminus X=(E\setminus X,\mathcal{J}), where 𝒥={JX:J}\mathcal{J}=\{J\setminus X:J\in\mathcal{I}\}. Then, X\mathcal{M}\setminus X is a matroid (see (Oxley, 2006)). Similarly, for XEX\subseteq E, we let /X=(EX,𝒥)\mathcal{M}/X=(E\setminus X,\mathcal{J}^{\prime}), where 𝒥={J:JX,JEX}\mathcal{J}^{\prime}=\{J:J\cup X\in\mathcal{I},J\subseteq E\setminus X\}. Then (E,𝒥)(E,\mathcal{J}) is also a matroid (see (Oxley, 2006)). For two matroids 1\mathcal{M}_{1} and 2\mathcal{M}_{2}, we consider two matroids 1=(1𝐸𝑥)/𝐼𝑛\mathcal{M}^{\prime}_{1}=(\mathcal{M}_{1}\setminus\mathit{Ex})/\mathit{In} and 2=(2𝐸𝑥)/𝐼𝑛\mathcal{M}^{\prime}_{2}=(\mathcal{M}_{2}\setminus\mathit{Ex})/\mathit{In}. For every independent set XX in 1\mathcal{M}^{\prime}_{1} and 2\mathcal{M}^{\prime}_{2}, XX does not contain any element in 𝐸𝑥\mathit{Ex} and X𝐼𝑛X\cup\mathit{In} is an independent set in both 1\mathcal{M}_{1} and 2\mathcal{M}_{2}. Thus, Weighted Extension can be solved by computing a maximum weight common base in 1\mathcal{M}^{\prime}_{1} and 2\mathcal{M}^{\prime}_{2}, which can be solved in polynomial time (see Theorem 41.7 in (Schrijver, 2003)). By Theorem 7 and Lemma 9, the following theorem holds.

Theorem 14.

There is a polynomial-time approximation algorithm for Diverse Matroid Common Bases with approximation factor max(12/k,1/2)\max(1-2/k,1/2), provided that the membership oracles for 1\mathcal{M}_{1} and 2\mathcal{M}_{2} can be evaluated in polynomial time.

4.3 Minimum cuts

Let G=(V,E)G=(V,E) be a graph. A partition of VV into two non-empty sets AA and BB is called a cut of GG. For a cut (A,B)(A,B) of GG, the set of edges having one end in AA and the other end in BB is denoted by E(A,B)E(A,B). When no confusion arises, we may refer to E(A,B)E(A,B) as a cut of GG. The size of a cut C=E(A,B)C=E(A,B) is defined by |E(A,B)||E(A,B)|. A cut CC is called a minimum cut of GG if there is no cut CC^{\prime} of GG with |C|<|C||C^{\prime}|<|C|. In this section, we consider the following problem.

Definition 15 (Diverse Minimum Cuts).

Given a graph G=(V,E)G=(V,E) with an edge-weight function w:E0w\colon E\to\mathbb{R}_{\geq 0} and an integer kk, the task of Diverse Minimum Cuts is to find kk distinct minimum cuts C1,,CkEC_{1},\ldots,C_{k}\subseteq E of GG that maximize dsum({C1,,Ck})d_{\rm sum}(\{C_{1},\ldots,C_{k}\}).

An important observation for this problem is that the number of minimum cuts of any graph GG is O(|V|2)O(|V|^{2}) (Karger, 2000). Moreover, we can enumerate all minimum cuts in a graph in polynomial time (Yeh et al., 2010; Vazirani and Yannakakis, 1992). Thus, we can solve both Weighted Extension for minimum cuts and Diverse Minimum Cuts for constant kk in polynomial time, yielding a PTAS for Diverse Minimum Cuts.

Theorem 16.

Diverse Minimum Cuts admits a PTAS.

Given this, it is natural to ask whether Diverse Minimum Cuts admits a polynomial-time algorithm. However, we show that Diverse Minimum Cuts is \NP-hard even if GG has a cut of size 33. Let λ(G)\lambda(G) be the size of a minimum cut of GG.

Theorem 17.

Diverse Minimum Cuts is \NP-hard even if λ(G)=3\lambda(G)=3.

The \NP-hardness is shown by performing a polynomial-time reduction from the maximum independent set problem on cubic graphs, which is known to be \NP-complete (Garey et al., 1974). For a graph HH, we denote by α(H)\alpha(H) the maximum size of an independent set of HH. Let HH be a graph in which every vertex has degree exactly 33. Let HH^{\prime} be the graph obtained from HH by subdividing each edge twice, that is, each edge is replaced by a path of three edges. The set of vertices in HH^{\prime} that do not appear in HH is denoted by DD. The following folklore lemma ensures that the value of α\alpha increases exactly by mm.

Lemma 18 (folklore).

Let mm be the number of edges in HH. Then, α(H)=α(H)+m\alpha(H^{\prime})=\alpha(H)+m.

We construct a graph G=(V,E)G=(V,E) from HH^{\prime} by adding a new vertex vv^{*} and adding an edge between vv^{*} and each vertex in DD. Note that the degree of vv^{*} in HH^{\prime} is more than 33.

Lemma 19.

GG has kk edge-disjoint cuts of size 33 if and only if HH^{\prime} has an independent set of size kk.

Proof.

Suppose first that HH^{\prime} has an independent set SS of size kk. Since every vertex in SS appears also in GG, we can construct a cut of the form Ci=EG({vi},V{vi})C_{i}=E_{G}(\{v_{i}\},V\setminus\{v_{i}\}) for each viSv_{i}\in S. As SS is an independent set of GG, these kk cuts are edge-disjoint. Moreover, these cuts have exactly three edges since every vertex in SS has degree 33 in GG. Thus, GG has kk edge-disjoint cuts of size 33.

Conversely, suppose GG has kk edge-disjoint cuts C1,C2,,CkEC_{1},C_{2},\ldots,C_{k}\subseteq E with |Ci|=3|C_{i}|=3 for 1ik1\leq i\leq k. It suffices to prove that each of these cuts forms Ci=EG({v},V{v})C_{i}=E_{G}(\{v\},V\setminus\{v\}) for some vV{v}v\in V\setminus\{v^{*}\}. Let Ci=EG(X,VX)C_{i}=E_{G}(X,V\setminus X) for some XVX\subseteq V. Without loss of generality, we assume that vVXv^{*}\in V\setminus X. In the following, we show that XX contains exactly one vertex. Since every vertex in DD is adjacent to vv^{*}, XX contains at most three vertices of DD. Suppose first that |XD|=3|X\cap D|=3. Since every vertex of DD has a neighbor in DD, VXV\setminus X has a vertex in DD that has a neighbor in XDX\cap D. However, as every vertex of XDX\cap D is adjacent to vv^{*}, there are at least four edges between XX and VXV\setminus X, contradicting to the fact that |Ci|=3|C_{i}|=3.

Suppose next that |XD|=2|X\cap D|=2. Let u,vXDu,v\in X\cap D be distinct. If uu is not adjacent to vv, there are two vertices uu^{\prime} and vv^{\prime} in (VX)D(V\setminus X)\cap D that are adjacent to uu and vv, respectively. This implies CiC_{i} contains four edges {u,u},{v,v},{u,v},{v,v}\{u,u^{\prime}\},\{v,v^{\prime}\},\{u,v^{*}\},\{v,v^{*}\}, yielding a contradiction. Thus, uu is adjacent to vv. Let uu^{\prime} and vv^{\prime} be the vertices in V(D{v})V\setminus(D\cup\{v^{*}\}) that are adjacent to uu and vv, respectively. Observe that at least one of uu^{\prime} and vv^{\prime}, say uu^{\prime}, belongs to XX as otherwise there are four edges ({u,u},{v,v},{u,v},{v,v}\{u,u^{\prime}\},\{v,v^{\prime}\},\{u,v^{*}\},\{v,v^{*}\}) between XX and VXV\setminus X. Since |XD|=2|X\cap D|=2 and uu^{\prime} has three neighbors in DD, the other two neighbors of uu^{\prime} belongs to VDV\setminus D, which ensures at least four edges between XX and VXV\setminus X.

Suppose that |XD|=1|X\cap D|=1. Let uXDu\in X\cap D. In this case, we show that X={u}X=\{u\}. To see this, consider the neighbors of uu. Since vVXv^{*}\in V\setminus X and |XD|=1|X\cap D|=1, at least two neighbors of uu, which are vv^{*} and a vertex in DD, belong to VXV\setminus X. If the other neighbor vv is in XX, then by the assumption that |XD|=1|X\cap D|=1, the two neighbors of vv other than uu belong to VXV\setminus X, which implies there are at least four edges between XX and VXV\setminus X. Thus, all the neighbors of uu belong to VXV\setminus X. Since GG is connected, all the vertices except for uu belong to VXV\setminus X as well. Thus, we have Ci=EG({u},V{u})C_{i}=E_{G}(\{u\},V\setminus\{u\}).

Finally, suppose that XD=X\cap D=\emptyset. In this case, at least one vertex of V(D{v})V\setminus(D\cup\{v^{*}\}) is included in XX. Let uX(D{v})u\in X\setminus(D\cup\{v^{*}\}). Since XD=X\cap D=\emptyset, every neighbor of uu belongs to VDV\setminus D. Similarly to the previous case, we have X={u}X=\{u\}, which completes the proof. ∎

Note that the proof of Theorem 17 also shows that the graph constructed in the reduction has no cut of size at most two. Therefore, by Lemmas 18 and 19, Theorem 17 follows.

When λ(G)=1\lambda(G)=1, then Diverse Minimum Cuts is trivially solvable in linear time as the problem can be reduced to finding all bridges in GG. If λ(G)=2\lambda(G)=2, the problem is slightly nontrivial, which in fact is solvable in polynomial time as well.

Theorem 20.

Diverse Minimum Cuts can be solved in |V|O(1)\left|V\right|^{O(1)} time, provided that λ(G)2\lambda(G)\leq 2.

We reduce the problem to that of finding a subgraph of prescribed size with maximizing the sum of convex functions on their degrees of vertices.

Theorem 21 ((Apollonio and Sebö, 2009)).

Given an undirected graph HH, an integer kk, and convex functions fv:0f_{v}:\mathbb{N}_{\geq 0}\to\mathbb{R} for vV(H)v\in V(H), the problem of finding kk-edge subgraph HH^{\prime} of HH maximizing vV(H)fv(dH(v))\sum_{v\in V(H)}f_{v}(d_{H^{\prime}}(v)) is solvable in polynomial time, where dH(v)d_{H^{\prime}}(v) is the degree of vv in HH^{\prime}.

We first enumerate all minimum cuts of GG in polynomial time. If GG has no kk minimum cuts, then the instance is trivially infeasible. Suppose otherwise. We construct a graph HH whose vertex set corresponds to EE, and the edge set of HH is defined as follows. For each pair e,fEe,f\in E, we add an edge between ee and ff to HH if {e,f}\{e,f\} is a cut of GG. Obviously, the graph HH can be constructed in polynomial time. For each eEe\in E, we let fe(i):=w(e)i(ki)f_{e}(i):=w(e)\cdot i\cdot(k-i) for 0ik0\leq i\leq k and fe(i)=f_{e}(i)=\infty for i>ki>k. Clearly, the function fef_{e} is convex. Let C1,,CkEC_{1},\ldots,C_{k}\subseteq E be kk minimum cuts of GG. For each ee, we denote by m(e)m(e) the number of occurrences of ee among C1,,CkC_{1},\ldots,C_{k}. Since each edge in EE contributes w(e)m(e)(km(e))w(e)\cdot m(e)\cdot(k-m(e)) to dsum({C1,,Ck})d_{\rm sum}(\{C_{1},\ldots,C_{k}\}), we immediately have the following lemma.

Lemma 22.

HH has a subgraph HH^{\prime} of kk edges such that vV(H)fe(dH(e))t\sum_{v\in V(H)}f_{e}(d_{H^{\prime}}(e))\geq t if and only if there are kk edge cuts C1,,CkEC_{1},\ldots,C_{k}\subseteq E of GG with |Ci|=2|C_{i}|=2 for 1ik1\leq i\leq k such that dsum({C1,,Ck})td_{\rm sum}(\{C_{1},\ldots,C_{k}\})\geq t.

By Lemma 22 and Theorem 21, Diverse Minimum Cuts can be solved in |V|O(1)\left|V\right|^{O(1)} time, proving Theorem 20.

4.4 Interval schedulings

For a pair of integers aa and bb with aba\leq b, the set of all numbers between aa and bb is denoted by [a,b][a,b]. We call I=[a,b]I=[a,b] an interval. For a pair of intervals I=[a,b]I=[a,b] and J=[c,d]J=[c,d], we say that II overlaps JJ if IJI\cap J\neq\emptyset. For a set of intervals 𝒮={I1,,Ir}\mathcal{S}=\{I_{1},\ldots,I_{r}\}, we say that 𝒮\mathcal{S} is a valid scheduling (or simply a scheduling) if for any pair of intervals Ii,Ij𝒮I_{i},I_{j}\in\mathcal{S}, IiI_{i} does not overlap IjI_{j}. In particular, we call 𝒮\mathcal{S} an rr-scheduling if |𝒮|=r\left|\mathcal{S}\right|=r for rr\in\mathbb{N}. In this section, we deal with the following problem.

Definition 23 (Diverse Interval Schedulings).

Given a set of intervals ={I1,,In}\mathcal{I}=\{I_{1},\ldots,I_{n}\}, a weight function w:>0w\colon\mathcal{I}\to\mathbb{R}_{>0}, and integers kk and rr, the task of Diverse Interval Schedulings is to find kk distinct rr-schedulings 𝒮1,,𝒮k\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\subseteq\mathcal{I} that maximize dsum({𝒮1,,𝒮k})d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\}).

Since the problem of partitioning a set of intervals \mathcal{I} into kk scheduling 𝒮1,,𝒮k\mathcal{S}_{1},\ldots,\mathcal{S}_{k} such that each 𝒮i\mathcal{S}_{i} has exactly rr intervals is known to be NP-hard (Bodlaender and Jansen, 1995; Gardi, 2009)111Note that the NP-hardness is proven for the case that each 𝒮i\mathcal{S}_{i} has at most rr intervals, but a simple reduction proves the NP-hardness of this variant., by 1, we have the following theorem.

Theorem 24.

Diverse Interval Schedulings is \NP-hard.

To apply Theorem 7 to Diverse Interval Schedulings, it suffices to give a polynomial-time algorithm for Weighted Extension for interval schedulings. Observe that if 𝐼𝑛\mathit{In} is not a scheduling, then there is no scheduling containing 𝐼𝑛\mathit{In}. Observe also that we can remove all intervals included in 𝐸𝑥\mathit{Ex} or overlapping some interval in 𝐼𝑛\mathit{In}. Thus, the problem can be reduced to the one for finding a maximum weight scheduling with cardinality r=r|𝐼𝑛|r^{\prime}=r-\left|\mathit{In}\right|. This problem can be solved in polynomial time by using a simple dynamic programming approach.

Lemma 25.

Given a set \mathcal{I} and w:w^{\prime}\colon\mathcal{I}\to\mathbb{R} and rr^{\prime}\in\mathbb{N}, there is a polynomial-time algorithm finding a maximum weight rr^{\prime}-scheduling in O(||2r)O(\left|\mathcal{I}\right|^{2}r^{\prime}) time.

Proof.

The algorithm is analogous to that to find a maximum weight independent set on interval graphs, which is roughly sketched as follows. We assume that ={I1,I2,,In}\mathcal{I}=\{I_{1},I_{2},\ldots,I_{n}\} is sorted with respect to their right end points. We define opt(p,q){\rm opt}(p,q) as the maximum total weight of a qq-scheduling 𝒮\mathcal{S} in {I1,I2,,Ip}\{I_{1},I_{2},\ldots,I_{p}\} such that Ip𝒮I_{p}\in\mathcal{S} for 0pn0\leq p\leq n and 0qr0\leq q\leq r^{\prime}. Then, the values of opt(p,q){\rm opt}(p,q) for all pp and qq can be computed by a standard dynamic programming algorithm in time O(||2r)O(\left|\mathcal{I}\right|^{2}r^{\prime}). ∎

By Theorem 7 and Lemma 9, we obtain a polynomial-time approximation algorithm for Diverse Interval Schedulings with factor max(12/k,1/2)\max(1-2/k,1/2).

Finally, we show that Diverse Interval Schedulings can be solved in polynomial time for fixed kk using a dynamic programming approach, which implies a PTAS for Diverse Interval Schedulings.

Similarly to the proof of Lemma 25, assume that ={I1,I2,,In}\mathcal{I}=\{I_{1},I_{2},\ldots,I_{n}\} is sorted with respect to their right end points. Let [k]={1,2,,k}[k]=\{1,2,\ldots,k\}. For each 0p||0\leq p\leq\left|\mathcal{I}\right|, we consider a tuple T=(p,L,R,Γ)T=(p,L,R,\Gamma), where LL and RR are vectors in ([n]{0})k([n]\cup\{0\})^{k} and ([r]{0})k([r]\cup\{0\})^{k}, respectively, and Γ\Gamma is a subset of ([k]2)\binom{[k]}{2}. Clearly, the number of tuples is O(n(n+1)k(r+1)k2(k2))O(n(n+1)^{k}(r+1)^{k}2^{\binom{k}{2}}), which is polynomial when kk is a constant. We denote by i\ell_{i} and rir_{i} the iith component of LL and RR, respectively. For a tuple T=(p,L,R,Γ)T=(p,L,R,\Gamma), the value opt(T){\rm opt}(T) is the maximum value of dsum({𝒮1,,𝒮k})d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\}) for kk schedulings under the following four conditions: (1) the maximum index of an interval in 1ik𝒮i\bigcup_{1\leq i\leq k}\mathcal{S}_{i} is pp (p=0p=0 if 1ik𝒮i=\bigcup_{1\leq i\leq k}\mathcal{S}_{i}=\emptyset); (2) for 1ik1\leq i\leq k, the maximum index of an interval in 𝒮i\mathcal{S}_{i} is i\ell_{i} (i=0\ell_{i}=0 if 𝒮i=\mathcal{S}_{i}=\emptyset); (3) for 1ik1\leq i\leq k, |𝒮i|=ri\left|\mathcal{S}_{i}\right|=r_{i}; and (4) for 1i<jk1\leq i<j\leq k, {i,j}Γ\{i,j\}\in\Gamma if and only if 𝒮i\mathcal{S}_{i} and 𝒮j\mathcal{S}_{j} are distinct.

We define opt(T)={\rm opt}(T)=-\infty if no such a set of scheduings exists. When R=(r,r,,r)R=(r,r,\ldots,r) and Γ=([k]2)\Gamma=\binom{[k]}{2}, there is a set of kk distinct rr-schedulings that have the sum diversity opt(T){\rm opt}(T) unless opt(T)={\rm opt}(T)=-\infty. For a tuple TT, we say that a set of kk schedulings is valid for TT if it satisfies the above four conditions. Hence, among the tuples of the form (p,L,R,Γ)(p,L,R,\Gamma) with R=(r,,r)R=(r,\ldots,r) and Γ=([k]2)\Gamma=\binom{[k]}{2}, opt(T){\rm opt}(T) is the optimal value for Diverse Interval Schedulings. We next explain the outline of our dynamic programming algorithm to compute opt(T){\rm opt}(T) for any TT.

As a base case, p=0p=0, L=(0,,0)L=(0,\ldots,0), R=(0,,0)R=(0,\ldots,0), and Γ=\Gamma=\emptyset if and only if opt(T)=0{\rm opt}(T)=0. Let TT^{\prime} be a tuple (p,L,R,Γ)(p^{\prime},L^{\prime},R^{\prime},\Gamma^{\prime}) that satisfies the following conditions: (1)p<pp^{\prime}<p; (2) for any 1ik1\leq i\leq k, ii\ell^{\prime}_{i}\leq\ell_{i} and ririr^{\prime}_{i}\leq r_{i}; and (3) ΓΓ\Gamma^{\prime}\subseteq\Gamma. We say that a tuple TT^{\prime} satisfying the above conditions is dominated by TT. We denote the set of tuples dominated by TT as D(T)D(T). Let C(T)={i:i=p}C(T)=\{i:\ell_{i}=p\}. A tuple TT^{\prime} is valid for TT if TT^{\prime} satisfies the following conditions: (1) TD(T)T^{\prime}\in D(T); (2) if iC(T)i\in C(T) and i>0\ell_{i}>0, then interval IiI_{\ell_{i}} does not overlap with IpI_{p}; (3) if iC(T)i\in C(T), ri=ri1r^{\prime}_{i}=r_{i}-1, otherwise, ri=rir^{\prime}_{i}=r_{i}; and (4) Γ=ΓP(T)\Gamma=\Gamma^{\prime}\cup P(T) with P(T){{i,j}([k]2):|{i,j}C(T)|=1}P(T)\coloneqq\{\{i,j\}\in\binom{[k]}{2}:\left|\{i,j\}\cap C(T)\right|=1\}. We denote the set of valid tuples for TT as V(T)V(T). We compute opt(T){\rm opt}(T) using the following lemma.

Lemma 26.

For a tuple TT,

opt(T)=maxTV(T)(opt(T)+w(Ip)|C(T)|(k|C(T)|)).\displaystyle{\rm opt}(T)=\underset{T^{\prime}\in V(T)}{\max}({\rm opt}(T^{\prime})+w(I_{p})\cdot\left|C(T)\right|\cdot(k-\left|C(T)\right|)).
Proof.

Let T=(p,L,R,Γ)T=(p,L,R,\Gamma). Let 𝒮={𝒮1,,𝒮k}\mathcal{S}=\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\} be a valid set of schedulings with dsum({𝒮1,,𝒮j})=opt(T)d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{j}\})={\rm opt}(T). Then, 𝒮=(𝒮1{Ip},,𝒮k{Ip})\mathcal{S}^{\prime}=(\mathcal{S}_{1}\setminus\{I_{p}\},\ldots,\mathcal{S}_{k}\setminus\{I_{p}\}) is a valid set of scheduings for TV(T)T^{\prime}\in V(T). Moreover, dsum(𝒮)=dsum(𝒮)+w(Ip)|C(T)|(k|C(T)|)d_{\rm sum}(\mathcal{S})=d_{\rm sum}(\mathcal{S}^{\prime})+w(I_{p})\cdot|C(T)|\cdot(k-|C(T)|) as IpI_{p} contributes w(Ip)|C(T)|(k|C(T)|)w(I_{p})\cdot|C(T)|\cdot(k-|C(T)|) to the diversity. Thus, the left-hand side is at most the right-hand side.

Conversely, let TT^{\prime} be a tuple maximizing the left-hand side and let 𝒮={𝒮1,,𝒮k}\mathcal{S}^{\prime}=\{\mathcal{S}^{\prime}_{1},\ldots,\mathcal{S}^{\prime}_{k}\} be a valid set of schedulings for TT^{\prime}. For each 1ik1\leq i\leq k, we set 𝒮i=𝒮i{Ip}\mathcal{S}_{i}=\mathcal{S}^{\prime}_{i}\cup\{I_{p}\} if iC(T)i\in C(T) and 𝒮i=𝒮i\mathcal{S}_{i}=\mathcal{S}^{\prime}_{i} otherwise. By condition (2) in the definition of a valid tuple, each interval in 𝒮i\mathcal{S}^{\prime}_{i} does not overlap with IpI_{p}, meaning that 𝒮i\mathcal{S}_{i} is a scheduling. Thus, the right-hand side is at most the left-hand side. ∎

From the above lemma, we can compute opt(T){\rm opt}(T) for any TT in polynomial time when kk is a constant. Moreover, from opt(T){\rm opt}(T), we can find kk schedulings with the maximum sum diversity by a standard trace back technique. Combining the approximation algorithm and the above algorithm, we obtain a PTAS.

Theorem 27.

Diverse Interval Schedulings admits a PTAS.

5 Conclusion

In this paper, we give a framework for designing approximation algorithms for Max-Sum Diverse Solutions. This framework runs in poly(|E|+k){\rm poly}(\left|E\right|+k) time and versatile, which can be applied to the diverse version of several well-studied combinatorial problems, i.e., Diverse Matchings, Diverse Matroid Common Bases, Diverse Minimum Cuts, and Diverse Interval Schedulings. The key to applying our framework is the polynomial-time solvability of Weighted Extension, which yields constant-factor approximation algorithms for Diverse Matchings and Diverse Matroid Common Bases. Moreover, we obtain a PTAS for Max-Sum Diverse Solutions if we can solve the problem in polynomial time for fixed kk, yielding PTASes for Diverse Minimum Cuts and Diverse Interval Schedulings.

References

  • Apollonio and Sebö [2009] Nicola Apollonio and András Sebö. Minconvex factors of prescribed size in graphs. SIAM J. Discret. Math., 23(3):1297–1310, 2009.
  • Baste et al. [2019] Julien Baste, Lars Jaffke, Tomás Masarík, Geevarghese Philip, and Günter Rote. FPT algorithms for diverse collections of hitting sets. Algorithms, 12(12):254, 2019.
  • Baste et al. [2022] Julien Baste, Michael R. Fellows, Lars Jaffke, Tomáš Masařík, Mateus de Oliveira Oliveira, Geevarghese Philip, and Frances A. Rosamond. Diversity of solutions: An exploration through the lens of fixed-parameter tractability theory. Artificial Intelligence, 303:103644, 2022.
  • Bérczi and Schwarcz [2021] Kristóf Bérczi and Tamás Schwarcz. Complexity of packing common bases in matroids. Math. Program., 188(1):1–18, 2021.
  • Birnbaum and Goldman [2009] Benjamin E. Birnbaum and Kenneth J. Goldman. An improved analysis for a greedy remote-clique algorithm using factor-revealing LPs. Algorithmica, 55(1):42–59, 2009.
  • Bodlaender and Jansen [1995] Hans L. Bodlaender and Klaus Jansen. Restrictions of graph partition problems. part I. Theor. Comput. Sci., 148(1):93–109, 1995.
  • Cevallos et al. [2016] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Max-Sum Diversity Via Convex Programming. In 32nd International Symposium on Computational Geometry (SoCG 2016), volume 51 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1–26:14, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
  • Cevallos et al. [2019] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. An improved analysis of local search for max-sum diversification. Math. Oper. Res., 44(4):1494–1509, 2019.
  • Deza and Laurent [1997] Michel Marie Deza and Monique Laurent. Geometry of cuts and metrics, volume 2. Springer, 1997.
  • Duan and Pettie [2014] Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. J. ACM, 61(1), jan 2014.
  • Edmonds [1965] Jack Edmonds. Paths, trees, and flowers. Canadian J. Math., 17:449–467, 1965.
  • Eppstein [2008] David Eppstein. k-best enumeration. In Encyclopedia of Algorithms, pages 1–4. Springer US, Boston, MA, 2008.
  • Fernau et al. [2019] Henning Fernau, Petr Golovach, Marie-France Sagot, et al. Algorithmic enumeration: Output-sensitive, input-sensitive, parameterized, approximative (dagstuhl seminar 18421). In Dagstuhl Reports, volume 8. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  • Fomin et al. [2020] Fedor V. Fomin, Petr A. Golovach, Lars Jaffke, Geevarghese Philip, and Danil Sagunov. Diverse pairs of matchings. In 31st International Symposium on Algorithms and Computation, ISAAC 2020, December 14-18, 2020, Hong Kong, China (Virtual Conference), volume 181 of LIPIcs, pages 26:1–26:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
  • Fomin et al. [2021] Fedor V. Fomin, Petr A. Golovach, Fahad Panolan, Geevarghese Philip, and Saket Saurabh. Diverse collections in matroids and graphs. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), volume 187 of LIPIcs, pages 31:1–31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  • Gardi [2009] Frédéric Gardi. Mutual exclusion scheduling with interval graphs or related classes, part I. Discret. Appl. Math., 157(1):19–35, 2009.
  • Garey et al. [1974] M. R. Garey, David S. Johnson, and Larry J. Stockmeyer. Some simplified np-complete problems. In Robert L. Constable, Robert W. Ritchie, Jack W. Carlyle, and Michael A. Harrison, editors, Proceedings of the 6th Annual ACM Symposium on Theory of Computing, April 30 - May 2, 1974, Seattle, Washington, USA, pages 47–63. ACM, 1974.
  • Gillenwater et al. [2015] Jennifer Gillenwater, Rishabh K. Iyer, Bethany Lusch, Rahul Kidambi, and Jeff A. Bilmes. Submodular hamming metrics. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3141–3149, 2015.
  • Hanaka et al. [2021a] Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, See Woo Lee, and Yota Otachi. Computing diverse shortest paths efficiently: A theoretical and experimental study, 2021.
  • Hanaka et al. [2021b] Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, and Yota Otachi. Finding diverse trees, paths, and more. Proceedings of the AAAI Conference on Artificial Intelligence, 35(5):3778–3786, May 2021.
  • Hao et al. [2020] Fei Hao, Zheng Pei, and Laurence T. Yang. Diversified top-k maximal clique detection in social internet of things. Future Generation Computer Systems, 107:408–417, 2020.
  • Karger [2000] David R. Karger. Minimum cuts in near-linear time. J. ACM, 47(1):46–76, jan 2000.
  • Lawler [1972] Eugene L. Lawler. A procedure for computing the kk best solutions to discrete optimization problems and its application to the shortest path problem. Management Science, 18(7):401–405, 1972.
  • Oxley [2006] James G. Oxley. Matroid Theory (Oxford Graduate Texts in Mathematics). Oxford University Press, Inc., USA, 2006.
  • Ravi et al. [1994] S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299–310, 1994.
  • Schrijver [2003] A. Schrijver. Combinatorial Optimization - Polyhedra and Efficiency. Springer, 2003.
  • Vazirani and Yannakakis [1992] Vijay V. Vazirani and Mihalis Yannakakis. Suboptimal cuts: Their enumeration, weight and number. In W. Kuich, editor, Automata, Languages and Programming, pages 366–377, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.
  • Wang et al. [2013] Jia Wang, James Cheng, and Ada Wai-Chee Fu. Redundancy-aware maximal cliques. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 122–130. ACM, 2013.
  • Yeh et al. [2010] Li-Pu Yeh, Biing-Feng Wang, and Hsin-Hao Su. Efficient algorithms for the problems of enumerating cuts by non-decreasing weights. Algorithmica, 56(3):297–312, mar 2010.
  • Yuan et al. [2016] Long Yuan, Lu Qin, Xuemin Lin, Lijun Chang, and Wenjie Zhang. Diversified top-k clique search. VLDB J., 25(2):171–196, 2016.