
A Framework to Design Approximation Algorithms for Finding Diverse Solutions in Combinatorial Problems

Tesshu Hanaka¹, Masashi Kiyomi², Yasuaki Kobayashi³, Yusuke Kobayashi³, Kazuhiro Kurita⁴, and Yota Otachi¹
¹Nagoya University, Nagoya, Japan
²Seikei University, Tokyo, Japan
³Kyoto University, Kyoto, Japan
⁴National Institute of Informatics, Tokyo, Japan

Abstract

Finding a single best solution is the most common objective in combinatorial optimization problems. However, such a single solution may not be applicable to real-world problems as objective functions and constraints are only “approximately” formulated for original real-world problems. To solve this issue, finding multiple solutions is a natural direction, and diversity of solutions is an important concept in this context. Unfortunately, finding diverse solutions is much harder than finding a single solution. To cope with this difficulty, we investigate the approximability of finding diverse solutions. As our main result, we propose a framework to design approximation algorithms for finding diverse solutions, which yields several outcomes including constant-factor approximation algorithms for finding diverse matchings in graphs and diverse common bases in two matroids and PTASes for finding diverse minimum cuts and interval schedulings.

1 Introduction

One way to solve a real-world problem is to formulate the problem as a mathematical optimization problem and find a solution with an optimization algorithm. However, it is not always easy to formulate an appropriate optimization problem as real-world problems often include intricate constraints and implicit preferences, which are usually simplified in order to solve optimization problems. Hence, an optimal solution obtained in this way is not guaranteed to be a “good solution” to the original real-world problem. To cope with this underlying inconsistency, the following two-stage approach would be promising: algorithms find multiple solutions and then users find what they like from these solutions. One may think that top- $k$ enumeration algorithms (see (Eppstein, 2008) for a survey) can be used for this purpose. However, this is not always the case since top- $k$ enumeration algorithms may output solutions similar to one another (see (Wang et al., 2013; Yuan et al., 2016; Hao et al., 2020), for example). Such a set of solutions is not useful as a “catalog” of solutions provided to users.

As a way to solve this issue, algorithms are expected to find “diverse” solutions, and algorithms for finding “diverse” solutions have received considerable attention in many fields such as artificial intelligence (Hanaka et al., 2021b, a; Baste et al., 2022), machine learning (Gillenwater et al., 2015), and data mining (Wang et al., 2013; Yuan et al., 2016). There are many directions in the research on finding diverse solutions, depending on definitions of solutions and diversity measures. Given these rich applications, the diverse X paradigm was proposed by Fellows and Rosamond in Dagstuhl Seminar 18421 (Fernau et al., 2019). In this paradigm, “X” is a placeholder that represents solutions we are looking for, and they asked for theoretical investigations of finding diverse solutions. Since the problem of finding diverse solutions is much harder than that of finding a single solution for some “X”, it would be reasonable to consider the problem from the perspective of fixed-parameter tractability. Following this proposal, several fixed-parameter tractable (FPT) algorithms have been developed. Baste et al. gave algorithms for finding diverse solutions related to hitting sets (Baste et al., 2019) and those on bounded-treewidth graphs (Baste et al., 2022). Hanaka et al. (Hanaka et al., 2021b) proposed a framework to obtain FPT algorithms for finding diverse solutions in various combinatorial problems. Fomin et al. (Fomin et al., 2020, 2021) investigated the fixed-parameter tractability of finding diverse solutions related to matchings and matroids. In these results, the running time bounds of the FPT algorithms depend exponentially on the number of solutions we are looking for, which would be prohibitive even for computing moderate numbers of solutions.

In this paper, we aim to develop theoretically efficient algorithms for finding moderate numbers of diverse solutions. As we mentioned, the problem of finding diverse solutions is harder than that of finding a single solution. For example, the problem of computing a maximum matching in a graph is known to be solvable in polynomial time, whereas that of computing two maximum matchings $M_{1}$ and $M_{2}$ maximizing $|M_{1}\mathbin{\triangle}M_{2}|$ is known to be NP-hard (Fomin et al., 2020).

Our main result is a framework for designing efficient approximation algorithms with constant approximation factors for finding diverse solutions in combinatorial problems. We employ the sum of pairwise weighted Hamming distances among solutions as our diversity measure (see Section 2 for its definition), while some previous work (Fomin et al., 2021) employs the minimum of weighted Hamming distances. Roughly speaking, our approximation framework says that if we can enumerate top- $k$ weighted solutions in polynomial time, then we can obtain in polynomial time unweighted solutions maximizing our diversity measure with constant approximation factors. Moreover, suppose that we can exactly maximize our diversity of solutions in polynomial time when the number of solutions we are looking for is bounded by a constant. Then, our framework yields a polynomial-time approximation scheme (PTAS), that is, a factor- $(1-\varepsilon)$ approximation algorithm that runs in polynomial time for every constant $\varepsilon>0$ . By applying our framework, we obtain efficient constant-factor approximation algorithms for finding diverse solutions of matchings in a graph and of common bases of two matroids, as well as PTASes for finding diverse solutions of minimum cuts and of interval schedulings. Let us note that these diversity maximization problems are unlikely to be solvable in polynomial time, which will be discussed later.

2 Preliminaries

We denote the set of real numbers, the set of non-negative real numbers, and the set of positive real numbers as $\mathbb{R}$ , $\mathbb{R}_{\geq 0}$ , and $\mathbb{R}_{>0}$ , respectively. Let $E$ be a set. We denote the set of all subsets of $E$ as $2^{E}$ .

A function $d\colon E\times E\to\mathbb{R}_{\geq 0}$ is called a metric (on $E$ ) if it satisfies the following conditions: for $x,y,z\in E$ , (1) $d(x,y)=0$ if and only if $x=y$ ; (2) $d(x,y)=d(y,x)$ ; (3) $d(x,z)\leq d(x,y)+d(y,z)$ . Suppose that $E\subseteq\mathbb{R}^{m}$ for some integer $m$ . For $x\in E$ , we denote by $x_{i}$ the $i$ th component of $x$ . If $d(x,y)=\sum_{1\leq i\leq m}|x_{i}-y_{i}|$ holds for $x,y\in E$ , then $d$ is called an $\ell_{1}$ -metric.

Let $E$ be a finite set. For $X,Y\subseteq E$ , the symmetric difference between $X$ and $Y$ is denoted by $X\mathbin{\triangle}Y$ (i.e., $X\mathbin{\triangle}Y=(X\setminus Y)\cup(Y\setminus X)$ ). Let $w\colon E\to\mathbb{R}_{>0}$ . A weighted Hamming distance is a function $d_{w}\colon 2^{E}\times 2^{E}\to\mathbb{R}_{\geq 0}$ such that for $X,Y\subseteq E$ , $d_{w}(X,Y)=w(X\mathbin{\triangle}Y)$ , where $w(Z)=\sum_{x\in Z}w(x)$ for $Z\subseteq E$ . Suppose that $E=\{1,2,\ldots,m\}$ . We can regard each subset $X\subseteq E$ as an $m$ -dimensional vector $x=(x_{1},\ldots,x_{m})$ defined by $x_{i}=w(i)$ if $i\in X$ and $x_{i}=0$ otherwise, for $1\leq i\leq m$ . It is easy to observe that for $X,Y\subseteq E$ , $d_{w}(X,Y)=\sum_{1\leq i\leq m}|x_{i}-y_{i}|$ , where $x$ and $y$ are the vectors corresponding to $X$ and $Y$ , respectively. Thus, the weighted Hamming distance $d_{w}$ can be considered as an $\ell_{1}$ -metric.
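As a quick sanity check of this correspondence, the following Python sketch computes $d_{w}$ both directly and through the vector embedding. The function names `weighted_hamming`, `embed`, and `l1_distance`, as well as the sample weights and sets, are illustrative assumptions and do not come from the paper.

```python
def weighted_hamming(X, Y, w):
    """d_w(X, Y) = w(X symmetric-difference Y) for sets X, Y and weights w."""
    return sum(w[e] for e in X ^ Y)


def embed(X, w, m):
    """Vector x with x_i = w(i) if i is in X and 0 otherwise, for E = {1, ..., m}."""
    return [w[i] if i in X else 0 for i in range(1, m + 1)]


def l1_distance(x, y):
    """Sum of coordinate-wise absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Illustrative data (hypothetical, chosen only for this example).
m = 5
w = {1: 2.0, 2: 1.0, 3: 0.5, 4: 3.0, 5: 1.5}
X, Y = {1, 2, 4}, {2, 3, 4, 5}

direct = weighted_hamming(X, Y, w)
via_embedding = l1_distance(embed(X, w, m), embed(Y, w, m))
```

Both values agree on this instance because only the elements of $X\mathbin{\triangle}Y$ contribute a non-zero coordinate difference.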

In this paper, we focus on the following diversity measure $d_{\rm sum}(\cdot)$ , called the sum diversity. Let ${\mathcal{Y}}=\{Y_{1},\dots,Y_{k}\}$ be a collection of subsets of $E$ and $w\colon E\to\mathbb{R}_{\geq 0}$ be a weight function. We define

	$\displaystyle d_{\rm sum}(\mathcal{Y})=\sum_{1\leq i<j\leq k}d_{w}(Y_{i},Y_{j}).$

Our problem Max-Sum Diverse Solutions is defined as follows.

Definition 1 (Max-Sum Diverse Solutions).

Given a finite set $E$ , an integer $k$ , a weight function $w\colon E\to\mathbb{R}_{\geq 0}$ , and a membership oracle for $\mathcal{X}\subseteq 2^{E}$ , the task of Max-Sum Diverse Solutions is to find a set $\mathcal{Y}=\{Y_{1},Y_{2},\ldots,Y_{k}\}$ of $k$ distinct subsets $Y_{1},Y_{2},\ldots,Y_{k}\in\mathcal{X}$ that maximizes the sum diversity $d_{\rm sum}(\mathcal{Y})$ .

Each set in $\mathcal{X}$ is called a feasible solution. In Max-Sum Diverse Solutions, the set $\mathcal{X}$ of feasible solutions is not given explicitly, while we can test whether a set $X\subseteq E$ belongs to $\mathcal{X}$ . Our problem Max-Sum Diverse Solutions is highly related to the problem of packing disjoint feasible solutions.

Observation 1.

Suppose that all sets in $\mathcal{X}$ have the same cardinality $r$ and $w(e)=1$ for $e\in E$ . Let $Y_{1},Y_{2},\ldots,Y_{k}\in\mathcal{X}$ be $k$ distinct subsets. Then, $d_{\rm sum}(\{Y_{1},\ldots,Y_{k}\})\geq kr(k-1)$ if and only if $Y_{i}\cap Y_{j}=\emptyset$ for $1\leq i<j\leq k$ .

This observation implies several hardness results of Max-Sum Diverse Solutions, which will be discussed in Section 4.
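The reasoning behind Observation 1 is short: with unit weights and $|Y_{i}|=r$ for all $i$ , each pair contributes $|Y_{i}\mathbin{\triangle}Y_{j}|=2r-2|Y_{i}\cap Y_{j}|\leq 2r$ , and there are $k(k-1)/2$ pairs, so $d_{\rm sum}\leq kr(k-1)$ with equality exactly when all pairwise intersections are empty. The sketch below checks this on a toy instance; the helper `d_sum` and the two example families are assumptions made for illustration.

```python
from itertools import combinations


def d_sum(family, w):
    """Sum of pairwise weighted Hamming distances over a family of sets."""
    return sum(sum(w[e] for e in A ^ B) for A, B in combinations(family, 2))


E = range(1, 10)
w = {e: 1 for e in E}          # unit weights, as in Observation 1
k, r = 3, 3

disjoint = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
overlapping = [{1, 2, 3}, {3, 4, 5}, {7, 8, 9}]   # first two sets share element 3

bound = k * r * (k - 1)        # each of the k(k-1)/2 pairs contributes at most 2r

reaches_bound = d_sum(disjoint, w) == bound
falls_short = d_sum(overlapping, w) < bound
```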

We particularly focus on the approximability of Max-Sum Diverse Solutions for specific sets of feasible solutions. For a maximization problem, we say that an approximation algorithm has factor $0<\alpha\leq 1$ if given an instance $I$ , the algorithm outputs a solution with objective value ${\rm ALG}(I)$ such that ${\rm ALG}(I)/{\rm OPT}(I)\geq\alpha$ , where ${\rm OPT}(I)$ is the optimal value for $I$ . A polynomial-time approximation scheme (PTAS) is an approximation algorithm that, given an instance $I$ and a constant $\varepsilon>0$ , outputs a solution with ${\rm ALG}(I)/{\rm OPT}(I)\geq 1-\varepsilon$ in polynomial time.

2.1 A technique for Max-Sum Diversification

Our framework is based on approximation algorithms for a similar problem Max-Sum Diversification. Let $X$ be a set and let $d\colon X\times X\to\mathbb{R}_{\geq 0}$ be a metric. In what follows, for $Y\subseteq X$ , we denote $\sum_{x,y\in Y}d(x,y)$ as $d(Y)$ .

Definition 2 (Max-Sum Diversification).

Given a metric $d\colon X\times X\to\mathbb{R}_{\geq 0}$ on a finite set $X$ and an integer $k$ , the task of Max-Sum Diversification is to find a subset $Y\subseteq X$ with $|Y|=k$ that maximizes $d(Y)$ .

Max-Sum Diversification is studied under various names such as MAX-AVG Facility Dispersion and Remote-clique (Cevallos et al., 2019; Ravi et al., 1994). Max-Sum Diversification is known to be NP-hard (Ravi et al., 1994). Cevallos et al. (Cevallos et al., 2019) devised a PTAS for Max-Sum Diversification. Their algorithm is based on a rather simple local search technique, but their analyses of the approximation factor and the iteration bound are highly nontrivial. Our framework for Max-Sum Diverse Solutions is based on their algorithm, which is briefly sketched below.

Procedure LocalSearch( $X,d,k$ )
    $Y\leftarrow$ arbitrary $k$ elements in $X$
    for $i=1,\ldots,\lceil\frac{k(k-1)}{(k+1)}\ln(\frac{(k+2)(k-1)^{2}}{4})\rceil$ do
        if $\exists$ pair $(x,y)\in(X\setminus Y)\times Y$ such that $d(Y-y+x)>d(Y)$ then
            $(x,y)\leftarrow\underset{(x,y)\in(X\setminus Y)\times Y}{\arg\max}\,d(Y-y+x)$
            $Y\leftarrow Y-y+x$
    Output $Y$

Algorithm 1 A $(1-2/k)$ -approximation algorithm for Max-Sum Diversification (Cevallos et al., 2019).

Pseudocode of the algorithm due to (Cevallos et al., 2019) is given in Algorithm 1. In this algorithm, we first pick an arbitrary set of $k$ elements in $X$ , which is denoted by $Y\subseteq X$ . Then, we find a pair of elements $x\in X\setminus Y$ and $y\in Y$ that maximizes $d(Y-y+x)$ and update $Y$ to $Y-y+x$ if $d(Y-y+x)>d(Y)$ . We repeat this update procedure $\lceil\frac{k(k-1)}{(k+1)}\ln(\frac{(k+2)(k-1)^{2}}{4})\rceil=O(k\log k)$ times. Since we can find a pair $(x,y)$ in $O(\left|X\right|k\tau)$ time, where $\tau$ is the running time to evaluate the distance function $d(x,y)$ for $x,y\in X$ , the following lemma holds.

Lemma 3.

Algorithm 1 runs in time $O(\left|X\right|k^{2}\tau\log k)$ .
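The local search can be sketched in Python as follows. This is a minimal, unoptimized sketch under several assumptions: the names `diversity` and `local_search` are hypothetical, $d(Y)$ is summed over unordered pairs (which only rescales the objective), elements of $X$ are distinct, $k\geq 2$ , and each swap is evaluated from scratch, so the sketch does not attain the $O(\left|X\right|k\tau)$ bound per iteration stated above.

```python
import math
from itertools import combinations


def diversity(Y, d):
    """Sum of d over unordered pairs of Y."""
    return sum(d(a, b) for a, b in combinations(Y, 2))


def local_search(X, d, k):
    """Best-improvement swap local search in the spirit of Algorithm 1 (k >= 2)."""
    X = list(X)
    Y = X[:k]                                   # arbitrary initial k elements
    iterations = math.ceil(
        k * (k - 1) / (k + 1) * math.log((k + 2) * (k - 1) ** 2 / 4)
    )
    for _ in range(iterations):
        best_value, best_Y = diversity(Y, d), None
        for y in Y:                             # element to remove
            rest = [z for z in Y if z != y]
            for x in X:                         # element to insert
                if x in Y:
                    continue
                value = diversity(rest + [x], d)
                if value > best_value:
                    best_value, best_Y = value, rest + [x]
        if best_Y is None:                      # no improving swap exists
            break
        Y = best_Y
    return Y


# Points on a line with the absolute difference as an l1-metric (toy data).
points = [0, 1, 2, 10, 20]
chosen = local_search(points, lambda a, b: abs(a - b), 3)
```

On this toy instance the search moves from the initial set `[0, 1, 2]` to a set containing both extreme points, which is optimal for three points on a line.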

Cevallos et al. showed that if the metric $d$ is of negative type, then the approximation ratio of Algorithm 1 is at least $1-2/k$ (Cevallos et al., 2019). Here, we do not give the precise definition of a negative type metric but mention that every $\ell_{1}$ -metric is of negative type (Deza and Laurent, 1997; Cevallos et al., 2016).

Theorem 4 ((Cevallos et al., 2019)).

If $d\colon X\times X\to\mathbb{R}_{\geq 0}$ is a negative type metric, then the approximation ratio of Algorithm 1 is at least $1-2/k$ .

They further observed that the above theorem implies that Max-Sum Diversification admits a PTAS as follows. Let $\epsilon$ be a positive constant. If $\epsilon<2/k$ , that is, $k<2/\epsilon$ , then $k$ is bounded by a constant, and we can solve Max-Sum Diversification exactly in time $\left|X\right|^{O(1/\epsilon)}$ by brute-force search. Otherwise, $2/k\leq\epsilon$ , so the above $(1-2/k)$ -approximation algorithm achieves factor $1-\epsilon$ . Thus, Max-Sum Diversification admits a PTAS, provided that $d$ is a negative type metric.

Corollary 5 ((Cevallos et al., 2019)).

If $d\colon X\times X\to\mathbb{R}_{\geq 0}$ is a negative type metric, then Max-Sum Diversification admits a PTAS.

3 A framework for finding diverse solutions

In this section, we propose a framework for designing approximation algorithms for Max-Sum Diverse Solutions. The basic strategy of our framework is the local search algorithm described in the previous section. Let $E$ be a finite set and let $\mathcal{X}\subseteq 2^{E}$ be a set of feasible solutions. We set $X\coloneqq\mathcal{X}$ and apply the local search algorithm for Max-Sum Diversification to $(X,d_{w},k)$ . Recall that our diversity measure $d_{\rm sum}$ is the sum of weighted Hamming distances $d_{w}$ . Moreover, $d_{w}$ is an $\ell_{1}$ -metric, as observed in the previous section. By Theorem 4, the local search algorithm for Max-Sum Diversification has approximation factor $1-2/k$ . However, the running time of a straightforward application of Lemma 3 is $O(\left|\mathcal{X}\right|\cdot\left|E\right|k^{2}\log k)$ even if the feasible solutions in $\mathcal{X}$ can be enumerated in $O(\left|\mathcal{X}\right|\cdot\left|E\right|)$ total time, which may be exponential in the input size $\left|E\right|$ .

A main obstacle to applying the local search algorithm is that from a current set $\mathcal{Y}=\{Y_{1},\ldots,Y_{k}\}$ of feasible solutions, we need to find a pair of feasible solutions $(X,Y)\in(\mathcal{X}\setminus\mathcal{Y})\times\mathcal{Y}$ maximizing $d_{\rm sum}(\mathcal{Y}-Y+X)$ . To overcome this obstacle, we exploit top- $k$ enumeration algorithms. Let $w^{\prime}\colon E\to\mathbb{R}$ be a weight function. An algorithm $\mathcal{A}$ is called a top- $k$ enumeration algorithm for $(E,\mathcal{X},w^{\prime},k)$ if for a positive integer $k$ , $\mathcal{A}$ finds $k$ feasible solutions $Y_{1},\ldots,Y_{k}\in\mathcal{X}$ such that for any $Y\in\{Y_{1},\ldots,Y_{k}\}$ and $X\in\mathcal{X}\setminus\{Y_{1},\ldots,Y_{k}\}$ , $w^{\prime}(X)\leq w^{\prime}(Y)$ holds. By using $\mathcal{A}$ , we can compute the pair $(X,Y)$ as follows.

We first guess $Y\in\mathcal{Y}$ in the pair $(X,Y)$ and let $\mathcal{Y}^{\prime}=\mathcal{Y}\setminus\{Y\}$ . To find the pair $(X,Y)$ , it suffices to find $X\in\mathcal{X}\setminus\mathcal{Y}^{\prime}$ that maximizes $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\triangle Y^{\prime})$ . For an element $e\in E$ , we define a new weight $w^{\prime}(e)\coloneqq w(e)(\mathit{Ex}(e,\mathcal{Y}^{\prime})-\mathit{In}(e,\mathcal{Y}^{\prime}))$ , where $\mathit{In}(e,\mathcal{Y}^{\prime})$ (resp. $\mathit{Ex}(e,\mathcal{Y}^{\prime})$ ) is the number of feasible solutions in $\mathcal{Y}^{\prime}$ that contain $e$ (resp. do not contain $e$ ). For notational convenience, we fix $\mathcal{Y}^{\prime}$ and write $\mathit{In}(e)$ and $\mathit{Ex}(e)$ to denote $\mathit{In}(e,\mathcal{Y}^{\prime})$ and $\mathit{Ex}(e,\mathcal{Y}^{\prime})$ , respectively. The following lemma shows that a feasible solution $X$ that maximizes $w^{\prime}(X)$ also maximizes $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})$ .

Lemma 6.

For any feasible solution $X\in\mathcal{X}$ , $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e)$ .

Proof.

The contribution of $e\in X$ to $w(X\mathbin{\triangle}Y^{\prime})$ is $w(e)$ if $e\not\in Y^{\prime}$ , and $0$ otherwise. Thus, $e\in X$ contributes $w(e)\cdot\mathit{Ex}(e)$ to $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})$ . Similarly, $e\in E\setminus X$ contributes $w(e)\cdot\mathit{In}(e)$ to $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})$ . This gives us $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e)$ as follows.

	$\displaystyle\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})$
	$\displaystyle=\sum_{e\in X}w(e)\cdot\mathit{Ex}(e)+\sum_{e\in E\setminus X}w(e)\cdot\mathit{In}(e)$
	$\displaystyle=\sum_{e\in X}w(e)\cdot\mathit{Ex}(e)+\sum_{e\in E}w(e)\cdot\mathit{In}(e)-\sum_{e\in X}w(e)\cdot\mathit{In}(e)$
	$\displaystyle=\sum_{e\in X}w(e)(\mathit{Ex}(e)-\mathit{In}(e))+\sum_{e\in E}w(e)\cdot\mathit{In}(e)$
	$\displaystyle=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e).\qed$
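The identity of Lemma 6 can be checked numerically. The sketch below is only an illustration: the helper `reweight`, the random instance, and the use of integer weights (to keep the comparison exact) are assumptions, and `Yp` stands in for $\mathcal{Y}^{\prime}$ .

```python
import random


def reweight(E, w, Yp):
    """Return w'(e) = w(e) * (Ex(e) - In(e)) and the counts In(e) for the family Yp."""
    w_prime, in_count = {}, {}
    for e in E:
        in_count[e] = sum(1 for Y in Yp if e in Y)      # In(e)
        ex_count = len(Yp) - in_count[e]                # Ex(e)
        w_prime[e] = w[e] * (ex_count - in_count[e])
    return w_prime, in_count


random.seed(0)
E = list(range(8))
w = {e: random.randint(1, 5) for e in E}                # integer weights keep the check exact
Yp = [set(random.sample(E, 4)) for _ in range(3)]       # stands in for Y'
X = set(random.sample(E, 4))                            # the identity holds for any subset X

w_prime, in_count = reweight(E, w, Yp)
lhs = sum(sum(w[e] for e in X ^ Y) for Y in Yp)
rhs = sum(w_prime[e] for e in X) + sum(w[e] * in_count[e] for e in E)
```

Since the second term of `rhs` does not involve `X`, ranking candidate sets by `w_prime` gives the same order as ranking them by `lhs`.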

From the above lemma, we can find the pair $(X,Y)$ with a top- $k$ enumeration algorithm $\mathcal{A}$ for $(E,\mathcal{X},w^{\prime},k)$ as follows. By Lemma 6, for any feasible solution $X\in\mathcal{X}$ , $\sum_{Y^{\prime}\in\mathcal{Y}^{\prime}}w(X\mathbin{\triangle}Y^{\prime})=w^{\prime}(X)+\sum_{e\in E}w(e)\cdot\mathit{In}(e)$ . Since the second term does not depend on $X$ , to find a feasible solution $X$ maximizing the left-hand side, it suffices to maximize $w^{\prime}(X)$ subject to $X\in\mathcal{X}\setminus\mathcal{Y}^{\prime}$ . The algorithm $\mathcal{A}$ allows us to find $k$ feasible solutions $Z_{1},\ldots,Z_{k}$ such that $w^{\prime}(Z_{1})\geq\cdots\geq w^{\prime}(Z_{k})\geq w^{\prime}(Z)$ for any feasible solution $Z$ other than $Z_{1},\ldots,Z_{k}$ . As $|\mathcal{Y}^{\prime}|<k$ , at least one of $Z_{1},\ldots,Z_{k}$ does not belong to $\mathcal{Y}^{\prime}$ , and the first such solution in this order is the desired $X$ .

The entire algorithm is as follows. We first find a set of $k$ distinct feasible solutions in $\mathcal{X}$ using the enumeration algorithm $\mathcal{A}$ . Then, we repeat the local update procedure described above $O(k\log k)$ times. Suppose that $\mathcal{A}$ enumerates $k$ feasible solutions in time $O((|E|+k)^{c})$ for some constant $c$ . Then, the entire algorithm runs in time $O((|E|+k)^{c}|E|k^{2}\log k)$ as we can compute the pair $(X,Y)$ in time $O((|E|+k)^{c}|E|k)$ by simply guessing $Y\in\mathcal{Y}$ .
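The whole procedure can be sketched as follows. This is a hedged illustration rather than the paper's implementation: the feasible solutions are given as an explicit list of frozensets, the function `top_k` is a brute-force stand-in for the enumeration algorithm $\mathcal{A}$ , and all names (`top_k`, `best_replacement`, `diverse_solutions`) are hypothetical.

```python
import math
from itertools import combinations


def top_k(feasible, w_prime, k):
    """Stand-in for the enumeration algorithm A: brute force over an explicit list."""
    return sorted(feasible, key=lambda S: sum(w_prime[e] for e in S), reverse=True)[:k]


def d_sum(family, w):
    return sum(sum(w[e] for e in A ^ B) for A, B in combinations(family, 2))


def best_replacement(rest, E, feasible, w, k):
    """Best X outside `rest` to add, found via the reweighting of Lemma 6."""
    w_prime = {}
    for e in E:
        inside = sum(1 for Y in rest if e in Y)              # In(e)
        w_prime[e] = w[e] * ((len(rest) - inside) - inside)  # w(e) * (Ex(e) - In(e))
    # |rest| < k, so at least one of the k top solutions is not in `rest`.
    return next(Z for Z in top_k(feasible, w_prime, k) if Z not in rest)


def diverse_solutions(E, feasible, w, k):
    family = top_k(feasible, {e: 0 for e in E}, k)           # any k distinct solutions
    rounds = math.ceil(k * (k - 1) / (k + 1) * math.log((k + 2) * (k - 1) ** 2 / 4))
    for _ in range(rounds):
        best = family
        for i in range(k):                                   # guess the solution Y to drop
            rest = family[:i] + family[i + 1:]
            candidate = rest + [best_replacement(rest, E, feasible, w, k)]
            if d_sum(candidate, w) > d_sum(best, w):
                best = candidate
        if best is family:                                   # no improving swap
            break
        family = best
    return family


# Toy instance: feasible solutions are all 2-element subsets of {0, 1, 2, 3}.
E = range(4)
feasible = [frozenset(c) for c in combinations(E, 2)]
w = {e: 1 for e in E}
result = diverse_solutions(E, feasible, w, 3)
```

In an actual application, `top_k` would be replaced by a polynomial-time enumeration algorithm for the problem at hand, so that the set of feasible solutions never has to be listed explicitly.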

Note that the approximation factor $1-2/k$ does not give a reasonable bound for $k=2$ . In this case, however, we still have an approximation factor $1/2$ with a greedy algorithm for Max-Sum Diversification (Birnbaum and Goldman, 2009), which is described as follows. Initially, we set $\mathcal{Y}=\{Y_{1}\}$ with arbitrary $Y_{1}\in\mathcal{X}$ . Then, we compute a feasible solution $Y_{2}\in\mathcal{X}\setminus\mathcal{Y}$ maximizing $\sum_{Y\in\mathcal{Y}}w(Y_{2}\triangle Y)$ . By Lemma 6 and the above discussion, we can find such a solution $Y_{2}$ with a top- $k$ enumeration algorithm for $(E,\mathcal{X},w^{\prime},k)$ , where $w^{\prime}(e)\coloneqq w(e)\cdot(\mathit{Ex}(e,\mathcal{Y})-\mathit{In}(e,\mathcal{Y}))$ for $e\in E$ . We repeat this $k-1$ times so that $\mathcal{Y}$ contains $k$ feasible solutions. As discussed in this section, the approximation factor of this algorithm is $1/2$ as in (Birnbaum and Goldman, 2009). Thus, the following theorem holds.

Theorem 7.

Let $E$ be a finite set, $\mathcal{X}\subseteq 2^{E}$ , and $w\colon E\to\mathbb{R}_{>0}$ . Suppose that there is a top- $k$ enumeration algorithm for $(E,\mathcal{X},w^{\prime},k)$ that runs in $O((\left|E\right|+k)^{c})$ time for a constant $c$ , where $w^{\prime}\colon E\to\mathbb{R}$ is an arbitrary weight function. Then, there is an $\alpha$ -approximation algorithm for Max-Sum Diverse Solutions that runs in $O((\left|E\right|+k)^{c}|E|k^{2}\log k)$ time, where $\alpha=\max(1-2/k,1/2)$ . Moreover, if there is a polynomial-time exact algorithm for Max-Sum Diverse Solutions for constant $k$ , then it admits a PTAS.
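The greedy procedure that supplies the factor $1/2$ for small $k$ can be sketched in the same style. As before, this is an assumption-laden illustration: feasible solutions are listed explicitly, sorting the list is a brute-force stand-in for a top- $k$ enumeration algorithm, and the names `greedy_diverse` and `d_sum` are hypothetical.

```python
from itertools import combinations


def d_sum(family, w):
    return sum(sum(w[e] for e in A ^ B) for A, B in combinations(family, 2))


def greedy_diverse(E, feasible, w, k):
    family = [feasible[0]]                       # arbitrary first solution Y_1
    while len(family) < k:
        w_prime = {}
        for e in E:
            inside = sum(1 for Y in family if e in Y)                # In(e, Y)
            w_prime[e] = w[e] * ((len(family) - inside) - inside)    # w(e)(Ex - In)
        # Brute-force stand-in for a top-k enumeration algorithm.
        ranked = sorted(feasible, key=lambda S: sum(w_prime[e] for e in S), reverse=True)
        # Fewer than k solutions are chosen so far, so one of the top k is new.
        family.append(next(Z for Z in ranked[:k] if Z not in family))
    return family


# Toy instance: all 2-element subsets of {0, 1, 2, 3} with unit weights.
E = range(4)
feasible = [frozenset(c) for c in combinations(E, 2)]
w = {e: 1 for e in E}
family = greedy_diverse(E, feasible, w, 3)
```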

4 Applications of the framework

To complete the description of approximation algorithms based on our framework, we need to develop top- $k$ enumeration algorithms for specific problems. In what follows, we design top- $k$ enumeration algorithms for matchings, common bases of two matroids, and interval schedulings.

Our top- $k$ enumeration algorithms are based on a well-known technique used in (Lawler, 1972) (also discussed in (Eppstein, 2008)). The key to enumeration algorithms is the following Weighted Extension.

Definition 8 (Weighted Extension).

Given a finite set $E$ , a set of feasible solutions $\mathcal{X}\subseteq 2^{E}$ as a membership oracle, a weight function $w^{\prime}\colon E\to\mathbb{R}$ , and a pair of disjoint subsets $\mathit{In}$ and $\mathit{Ex}$ of $E$ , the task is to find a feasible solution $X\in\mathcal{X}$ that satisfies $\mathit{In}\subseteq X$ and $X\cap\mathit{Ex}=\emptyset$ maximizing $w^{\prime}(X)$ subject to these conditions.

If we can solve the above problem in $O(\left|E\right|^{c})$ time, then we can obtain a top- $k$ enumeration algorithm for $(E,\mathcal{X},w^{\prime},k)$ that runs in $O(k\left|E\right|^{c+1})$ time.

Lemma 9 ((Lawler, 1972)).

Suppose that Weighted Extension for $(E,\mathcal{X},w^{\prime},k)$ can be solved in $O(\left|E\right|^{c})$ time. Then, there is an $O(k\left|E\right|^{c+1})$ -time top- $k$ enumeration algorithm for $(E,\mathcal{X},w^{\prime},k)$ .
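The partitioning idea behind Lemma 9 can be sketched as follows. This is a simplified illustration under stated assumptions: `weighted_extension` is a brute-force stand-in for a polynomial-time solver, feasible solutions are given as an explicit list of frozensets, and the names `lawler_top_k` and `push` are hypothetical. Each reported solution spawns at most $\left|E\right|$ subproblems, one per free element, which is where the extra factor $\left|E\right|$ in the running time comes from.

```python
import heapq
from itertools import combinations, count


def weighted_extension(feasible, w_prime, In, Ex):
    """Brute-force stand-in: best feasible X with In inside X and X disjoint from Ex."""
    candidates = [X for X in feasible if In <= X and not (X & Ex)]
    return max(candidates, key=lambda X: sum(w_prime[e] for e in X), default=None)


def lawler_top_k(E, feasible, w_prime, k):
    tie = count()                        # tie-breaker so the heap never compares sets
    heap, output = [], []

    def push(In, Ex):
        X = weighted_extension(feasible, w_prime, In, Ex)
        if X is not None:
            value = sum(w_prime[e] for e in X)
            heapq.heappush(heap, (-value, next(tie), X, In, Ex))

    push(frozenset(), frozenset())
    while heap and len(output) < k:
        _, _, X, In, Ex = heapq.heappop(heap)
        output.append(X)
        # Split the other solutions of this subproblem by the first free
        # element on which they disagree with X.
        for e in E:
            if e in In or e in Ex:
                continue
            if e in X:
                push(In, Ex | {e})       # solutions that drop e
                In = In | {e}            # from now on, agree with X on e
            else:
                push(In | {e}, Ex)       # solutions that add e
                Ex = Ex | {e}
    return output


# Toy instance: 2-element subsets of {0, 1, 2, 3} with possibly negative weights.
E = [0, 1, 2, 3]
feasible = [frozenset(c) for c in combinations(E, 2)]
w_prime = {0: 3, 1: -1, 2: 2, 3: 0}
top3 = lawler_top_k(E, feasible, w_prime, 3)
```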

4.1 Matchings

Matching is one of the most fundamental combinatorial objects in graphs, and the polynomial-time algorithm for computing a maximum weight matching due to (Edmonds, 1965) is a cornerstone result in this context. Finding diverse matchings has also been studied (Hanaka et al., 2021b, a; Fomin et al., 2020, 2021). Let $G=(V,E)$ be a graph. A set of edges $M$ is a matching of $G$ if $M$ has no pair of edges that share a common endpoint. A matching $M$ is called a perfect matching of $G$ if every vertex in $G$ is incident to an edge in $M$ . By using our framework, we design an approximation algorithm for finding diverse matchings. The formal definition of the problem is as follows.

Definition 10 (Diverse Matchings).

Given a graph $G=(V,E)$ , a weight function $w\colon E\to\mathbb{R}_{>0}$ , and integers $k$ and $r$ , the task of Diverse Matchings is to find $k$ distinct matchings $M_{1},\ldots,M_{k}$ of size $r$ that maximize $d_{\rm sum}(\{M_{1},\ldots,M_{k}\})$ .

To apply our framework, it suffices to show that Weighted Extension for matchings can be solved in polynomial time. Our method is similar to a reduction from the maximum weight perfect matching problem to the maximum weight matching problem (Duan and Pettie, 2014). Let $\mathit{In},\mathit{Ex}\subseteq E$ be disjoint subsets of edges and let $w^{\prime}\colon E\to\mathbb{R}$ . Then, our goal is to find a matching $M$ of $G$ with $|M|=r$ such that $\textit{In}\subseteq M$ and $\textit{Ex}\cap M=\emptyset$ , and $M$ maximizes $w^{\prime}(M)$ subject to these constraints. This problem can be reduced to that of finding a maximum weight perfect matching as follows. We assume that In is a matching of $G$ as otherwise there is no matching containing it. Let $G^{\prime}=(V^{\prime},E^{\prime})$ be the graph obtained from $G$ by removing (1) all edges in Ex and (2) all end vertices of edges in In. Then, it is easy to see that $M$ is a matching of $G$ with $\textit{In}\subseteq M$ and $\textit{Ex}\cap M=\emptyset$ if and only if $M\setminus\textit{In}$ is a matching of $G^{\prime}$ . Thus, it suffices to find a maximum weight matching of size exactly $r^{\prime}=r-\left|\textit{In}\right|$ in $G^{\prime}$ . To this end, we add a set $U$ of $|V^{\prime}|-2r^{\prime}$ new vertices to $G^{\prime}$ and add all possible edges between vertices $v\in V^{\prime}$ and $u\in U$ . The graph obtained in this way is denoted by $H=(V^{\prime}\cup U,E^{\prime}\cup F)$ , where $F=\{\{u,v\}:u\in U,v\in V^{\prime}\}$ . We extend the weight function $w^{\prime}$ by setting $w^{\prime}(f)=0$ for $f\in F$ . Then, the following lemma holds.

Lemma 11.

Let $M^{*}$ be a maximum weight perfect matching in $H$ . Then, $M^{*}\setminus F$ is a matching of size $r^{\prime}$ in $G^{\prime}$ such that for every size- $r^{\prime}$ matching $M^{\prime}$ in $G^{\prime}$ , it holds that $w^{\prime}(M^{\prime})\leq w^{\prime}(M^{*}\setminus F)$ .

Proof.

Since $M^{*}$ is a perfect matching and every edge incident to a vertex in $U$ is contained in $F$ , $M^{*}$ must contain exactly $|U|$ edges of $F$ . As $H$ has $2|V^{\prime}|-2r^{\prime}$ vertices, $M^{*}$ has $|V^{\prime}|-r^{\prime}$ edges, and hence it contains exactly $r^{\prime}$ edges of $G^{\prime}$ . Suppose that there is a size- $r^{\prime}$ matching $M^{\prime}$ in $G^{\prime}$ such that $w^{\prime}(M^{\prime})>w^{\prime}(M^{*}\setminus F)$ . As every vertex in $U$ is adjacent to every vertex in $V^{\prime}$ , we can choose a set $N\subseteq F$ of exactly $|U|$ edges between $U$ and $V^{\prime}$ so that $M^{\prime}\cup N$ forms a perfect matching in $H$ . Then, we have $w^{\prime}(M^{\prime}\cup N)=w^{\prime}(M^{\prime})+w^{\prime}(N)>w^{\prime}(M^{*}\setminus F)+w^{\prime}(M^{*}\cap F)=w^{\prime}(M^{*})$ as $w^{\prime}(N)=w^{\prime}(M^{*}\cap F)=0$ , contradicting the fact that $M^{*}$ is a maximum weight perfect matching of $H$ . ∎
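The padding construction can be illustrated on a small graph. The sketch below is not the polynomial-time algorithm: `max_weight_perfect_matching` is an exponential brute-force stand-in for Edmonds' algorithm, and the names, the vertex labels `("pad", i)` for $U$ , and the toy path graph are all assumptions made for this example.

```python
def max_weight_perfect_matching(vertices, weight):
    """Brute force over perfect matchings; `weight` maps frozenset({u, v}) to a number.

    Returns (total weight, list of edges), or None if no perfect matching exists.
    """
    if not vertices:
        return 0, []
    u, rest = vertices[0], vertices[1:]
    best = None
    for v in rest:
        e = frozenset((u, v))
        if e not in weight:
            continue
        sub = max_weight_perfect_matching([x for x in rest if x != v], weight)
        if sub is not None and (best is None or weight[e] + sub[0] > best[0]):
            best = (weight[e] + sub[0], sub[1] + [e])
    return best


def best_matching_of_size(V, w_prime, r):
    """Max-weight matching with exactly r edges via the padding construction."""
    U = [("pad", i) for i in range(len(V) - 2 * r)]      # |V'| - 2r' new vertices
    weight = dict(w_prime)
    for u in U:
        for v in V:
            weight[frozenset((u, v))] = 0                # edges of F get weight 0
    result = max_weight_perfect_matching(list(V) + U, weight)
    if result is None:
        return None
    return [e for e in result[1] if e in w_prime]        # drop the edges of F


# Toy instance: the path 0 - 1 - 2 - 3 with a heavy middle edge.
V = [0, 1, 2, 3]
w_prime = {frozenset((0, 1)): 1, frozenset((1, 2)): 5, frozenset((2, 3)): 1}
size_one = best_matching_of_size(V, w_prime, 1)
size_two = best_matching_of_size(V, w_prime, 2)
```

For size one the heavy middle edge is selected, whereas for size two the two outer edges are forced even though their total weight is smaller, which shows that the cardinality constraint is respected.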

Thus, we can solve Weighted Extension for a size- $r$ matching in polynomial time (Edmonds, 1965). By Theorem 7 and Lemma 9, we have the following theorem.

Theorem 12.

There is a polynomial-time approximation algorithm for Diverse Matchings with approximation factor $\max(1-2/k,1/2)$ .

4.2 Common bases of two matroids

Let $E$ be a finite set and let $\mathcal{I}$ be a non-empty family of subsets of $E$ . The pair $\mathcal{M}=(E,\mathcal{I})$ is a matroid if (1) for each $X\in\mathcal{I}$ , every subset of $X$ is included in $\mathcal{I}$ and (2) if $X,Y\in\mathcal{I}$ and $\left|X\right|<\left|Y\right|$ , then there exists an element $e\in Y\setminus X$ such that $X\cup\{e\}\in\mathcal{I}$ . Each set in $\mathcal{I}$ is called an independent set of $\mathcal{M}$ . An inclusion-wise maximal independent set $I$ of $\mathcal{M}$ is a base of $\mathcal{M}$ . Because of condition (2), all bases in $\mathcal{M}$ have the same cardinality. For two matroids $\mathcal{M}_{1}=(E,\mathcal{I}_{1})$ and $\mathcal{M}_{2}=(E,\mathcal{I}_{2})$ , a subset $X\subseteq E$ is a common base of $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ if $X$ is a base of both $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ . In this subsection, we give an approximation algorithm for diverse common bases of two matroids.

Definition 13 (Diverse Matroid Common Bases).

Given two matroids $\mathcal{M}_{1}=(E,\mathcal{I}_{1})$ and $\mathcal{M}_{2}=(E,\mathcal{I}_{2})$ as membership oracles, a weight function $w\colon E\to\mathbb{R}_{>0}$ , and an integer $k$ , the task of Diverse Matroid Common Bases is to find $k$ distinct common bases $B_{1},\ldots,B_{k}$ of $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ that maximize $d_{\rm sum}(\{B_{1},\ldots,B_{k}\})$ .

Given two matroids $\mathcal{M}_{1}=(E,\mathcal{I}_{1})$ and $\mathcal{M}_{2}=(E,\mathcal{I}_{2})$ as membership oracles, the problem of partitioning $E$ into $k$ common bases of $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ is a notoriously hard problem, which requires an exponential number of membership queries (Bérczi and Schwarcz, 2021). This fact together with Observation 1 implies that Diverse Matroid Common Bases cannot be solved with a polynomial number of membership queries in our problem setting. Given this fact, we develop a constant-factor approximation algorithm for Diverse Matroid Common Bases. To this end, we show that Weighted Extension for common bases of two matroids can be solved in polynomial time.

Similarly to the case of matchings, we can find a maximum weight common base $B\in\mathcal{I}_{1}\cap\mathcal{I}_{2}$ subject to $\mathit{In}\subseteq B$ and $\mathit{Ex}\cap B=\emptyset$ for given disjoint $\mathit{In},\mathit{Ex}\subseteq E$ as follows. Let $\mathcal{M}=(E,\mathcal{I})$ be a matroid. For $X\subseteq E$ , we let $\mathcal{M}\setminus X=(E\setminus X,\mathcal{J})$ , where $\mathcal{J}=\{J\setminus X:J\in\mathcal{I}\}$ . Then, $\mathcal{M}\setminus X$ is a matroid (see (Oxley, 2006)). Similarly, for $X\subseteq E$ , we let $\mathcal{M}/X=(E\setminus X,\mathcal{J}^{\prime})$ , where $\mathcal{J}^{\prime}=\{J:J\cup X\in\mathcal{I},J\subseteq E\setminus X\}$ . Then, $\mathcal{M}/X$ is also a matroid (see (Oxley, 2006)). For two matroids $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ , we consider two matroids $\mathcal{M}^{\prime}_{1}=(\mathcal{M}_{1}\setminus\mathit{Ex})/\mathit{In}$ and $\mathcal{M}^{\prime}_{2}=(\mathcal{M}_{2}\setminus\mathit{Ex})/\mathit{In}$ . For every common independent set $X$ of $\mathcal{M}^{\prime}_{1}$ and $\mathcal{M}^{\prime}_{2}$ , $X$ does not contain any element in $\mathit{Ex}$ and $X\cup\mathit{In}$ is an independent set in both $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ . Thus, Weighted Extension can be solved by computing a maximum weight common base of $\mathcal{M}^{\prime}_{1}$ and $\mathcal{M}^{\prime}_{2}$ , which can be done in polynomial time (see Theorem 41.7 in (Schrijver, 2003)). By Theorem 7 and Lemma 9, the following theorem holds.

Theorem 14.

There is a polynomial-time approximation algorithm for Diverse Matroid Common Bases with approximation factor $\max(1-2/k,1/2)$ , provided that the membership oracles for $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ can be evaluated in polynomial time.
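Deletion and contraction are easy to realize on top of a membership oracle, which is what makes the reduction above oracle-friendly. The following sketch assumes that an oracle is a Python callable taking a set and returning a Boolean; the names `delete` and `contract` and the uniform-matroid example are illustrative assumptions. It uses the fact that, by condition (1), the independent sets of $\mathcal{M}\setminus X$ are exactly the independent sets of $\mathcal{M}$ that avoid $X$ .

```python
def delete(independent, X):
    """Oracle for the deletion of X from M: independent sets of M that avoid X."""
    X = frozenset(X)
    return lambda J: not (frozenset(J) & X) and independent(frozenset(J))


def contract(independent, X):
    """Oracle for the contraction M / X: sets J disjoint from X with J | X independent."""
    X = frozenset(X)
    return lambda J: not (frozenset(J) & X) and independent(frozenset(J) | X)


# Toy matroid: the uniform matroid on {0, 1, 2, 3} whose independent sets have size <= 2.
independent = lambda S: len(S) <= 2

In, Ex = {0}, {3}
restricted = contract(delete(independent, Ex), In)   # oracle for (M \ Ex) / In

accepts_1 = restricted({1})        # {0, 1} is independent and avoids Ex
rejects_12 = restricted({1, 2})    # {0, 1, 2} is too large
rejects_3 = restricted({3})        # 3 belongs to Ex
```

Each query to the restricted oracle costs one query to the original oracle, so polynomial-time oracles for $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ give polynomial-time oracles for $\mathcal{M}^{\prime}_{1}$ and $\mathcal{M}^{\prime}_{2}$ .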

4.3 Minimum cuts

Let $G=(V,E)$ be a graph. A partition of $V$ into two non-empty sets $A$ and $B$ is called a cut of $G$ . For a cut $(A,B)$ of $G$ , the set of edges having one end in $A$ and the other end in $B$ is denoted by $E(A,B)$ . When no confusion arises, we may refer to $E(A,B)$ as a cut of $G$ . The size of a cut $C=E(A,B)$ is defined by $|E(A,B)|$ . A cut $C$ is called a minimum cut of $G$ if there is no cut $C^{\prime}$ of $G$ with $|C^{\prime}|<|C|$ . In this section, we consider the following problem.

Definition 15 (Diverse Minimum Cuts).

Given a graph $G=(V,E)$ with an edge-weight function $w\colon E\to\mathbb{R}_{\geq 0}$ and an integer $k$ , the task of Diverse Minimum Cuts is to find $k$ distinct minimum cuts $C_{1},\ldots,C_{k}\subseteq E$ of $G$ that maximize $d_{\rm sum}(\{C_{1},\ldots,C_{k}\})$ .

An important observation for this problem is that the number of minimum cuts of any graph $G$ is $O(|V|^{2})$ (Karger, 2000). Moreover, we can enumerate all minimum cuts in a graph in polynomial time (Yeh et al., 2010; Vazirani and Yannakakis, 1992). Thus, we can solve both Weighted Extension for minimum cuts and Diverse Minimum Cuts for constant $k$ in polynomial time, yielding a PTAS for Diverse Minimum Cuts.

Theorem 16.

Diverse Minimum Cuts admits a PTAS.

Given this, it is natural to ask whether Diverse Minimum Cuts admits a polynomial-time algorithm. However, we show that Diverse Minimum Cuts is NP-hard even if the minimum cuts of $G$ have size $3$ . Let $\lambda(G)$ be the size of a minimum cut of $G$ .

Theorem 17.

Diverse Minimum Cuts is NP-hard even if $\lambda(G)=3$ .

The NP-hardness is shown by performing a polynomial-time reduction from the maximum independent set problem on cubic graphs, which is known to be NP-complete (Garey et al., 1974). For a graph $H$ , we denote by $\alpha(H)$ the maximum size of an independent set of $H$ . Let $H$ be a graph in which every vertex has degree exactly $3$ . Let $H^{\prime}$ be the graph obtained from $H$ by subdividing each edge twice, that is, each edge is replaced by a path of three edges. The set of vertices in $H^{\prime}$ that do not appear in $H$ is denoted by $D$ . The following folklore lemma ensures that the value of $\alpha$ increases exactly by $m$ .

Lemma 18 (folklore).

Let $m$ be the number of edges in $H$ . Then, $\alpha(H^{\prime})=\alpha(H)+m$ .
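Lemma 18 can be checked on small cases. The following Python sketch is only a sanity check, not a proof: it computes $\alpha$ by exhaustive search and compares both sides on $K_{4}$, the smallest cubic graph. The helper names `alpha` and `subdivide_twice` are ours.

```python
from itertools import combinations

def alpha(n, edges):
    """Maximum independent set size of a graph on vertices 0..n-1 (brute force)."""
    for size in range(n, -1, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size
    return 0

def subdivide_twice(n, edges):
    """Replace each edge {u, v} by a path u - a - b - v with two new vertices a, b."""
    new_edges, next_id = [], n
    for u, v in edges:
        a, b = next_id, next_id + 1
        next_id += 2
        new_edges += [(u, a), (a, b), (b, v)]
    return next_id, new_edges

# K4 is cubic with m = 6 edges and alpha(K4) = 1.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
n_sub, edges_sub = subdivide_twice(4, k4)
lhs = alpha(n_sub, edges_sub)
rhs = alpha(4, k4) + len(k4)
```

Here `lhs` and `rhs` agree, as the lemma predicts for $H=K_{4}$.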

We construct a graph $G=(V,E)$ from $H^{\prime}$ by adding a new vertex $v^{*}$ and adding an edge between $v^{*}$ and each vertex in $D$ . Note that the degree of $v^{*}$ in $G$ is more than $3$ .

Lemma 19.

$G$ has $k$ edge-disjoint cuts of size $3$ if and only if $H^{\prime}$ has an independent set of size $k$ .

Proof.

Suppose first that $H^{\prime}$ has an independent set $S$ of size $k$ . Since every vertex in $S$ appears also in $G$ , we can construct a cut of the form $C_{i}=E_{G}(\{v_{i}\},V\setminus\{v_{i}\})$ for each $v_{i}\in S$ . As $S$ is an independent set of $G$ , these $k$ cuts are edge-disjoint. Moreover, these cuts have exactly three edges since every vertex in $S$ has degree $3$ in $G$ . Thus, $G$ has $k$ edge-disjoint cuts of size $3$ .

Conversely, suppose $G$ has $k$ edge-disjoint cuts $C_{1},C_{2},\ldots,C_{k}\subseteq E$ with $|C_{i}|=3$ for $1\leq i\leq k$ . It suffices to prove that each of these cuts has the form $C_{i}=E_{G}(\{v\},V\setminus\{v\})$ for some $v\in V\setminus\{v^{*}\}$ . Let $C_{i}=E_{G}(X,V\setminus X)$ for some $X\subseteq V$ . Without loss of generality, we assume that $v^{*}\in V\setminus X$ . In the following, we show that $X$ contains exactly one vertex. Since every vertex in $D$ is adjacent to $v^{*}$ , $X$ contains at most three vertices of $D$ . Suppose first that $|X\cap D|=3$ . Since every vertex of $D$ has a neighbor in $D$ , $V\setminus X$ has a vertex in $D$ that has a neighbor in $X\cap D$ . However, as every vertex of $X\cap D$ is adjacent to $v^{*}$ , there are at least four edges between $X$ and $V\setminus X$ , contradicting the fact that $|C_{i}|=3$ .

Suppose next that $|X\cap D|=2$ . Let $u,v\in X\cap D$ be distinct. If $u$ is not adjacent to $v$ , there are two vertices $u^{\prime}$ and $v^{\prime}$ in $(V\setminus X)\cap D$ that are adjacent to $u$ and $v$ , respectively. This implies $C_{i}$ contains four edges $\{u,u^{\prime}\},\{v,v^{\prime}\},\{u,v^{*}\},\{v,v^{*}\}$ , yielding a contradiction. Thus, $u$ is adjacent to $v$ . Let $u^{\prime}$ and $v^{\prime}$ be the vertices in $V\setminus(D\cup\{v^{*}\})$ that are adjacent to $u$ and $v$ , respectively. Observe that at least one of $u^{\prime}$ and $v^{\prime}$ , say $u^{\prime}$ , belongs to $X$ as otherwise there are four edges ( $\{u,u^{\prime}\},\{v,v^{\prime}\},\{u,v^{*}\},\{v,v^{*}\}$ ) between $X$ and $V\setminus X$ . Since $|X\cap D|=2$ and $u^{\prime}$ has three neighbors in $D$ , the other two neighbors of $u^{\prime}$ belong to $V\setminus X$ . Together with $\{u,v^{*}\}$ and $\{v,v^{*}\}$ , this ensures at least four edges between $X$ and $V\setminus X$ , again a contradiction.

Suppose that $|X\cap D|=1$ . Let $u\in X\cap D$ . In this case, we show that $X=\{u\}$ . To see this, consider the neighbors of $u$ . Since $v^{*}\in V\setminus X$ and $|X\cap D|=1$ , at least two neighbors of $u$ , which are $v^{*}$ and a vertex in $D$ , belong to $V\setminus X$ . If the other neighbor $v$ is in $X$ , then by the assumption that $|X\cap D|=1$ , the two neighbors of $v$ other than $u$ belong to $V\setminus X$ , which implies there are at least four edges between $X$ and $V\setminus X$ . Thus, all the neighbors of $u$ belong to $V\setminus X$ . Since $G$ is connected, all the vertices except for $u$ belong to $V\setminus X$ as well. Thus, we have $C_{i}=E_{G}(\{u\},V\setminus\{u\})$ .

Finally, suppose that $X\cap D=\emptyset$ . In this case, at least one vertex of $V\setminus(D\cup\{v^{*}\})$ is included in $X$ . Let $u\in X\setminus(D\cup\{v^{*}\})$ . Since $X\cap D=\emptyset$ and every neighbor of $u$ is in $D$ , every neighbor of $u$ belongs to $V\setminus X$ . Similarly to the previous case, we have $X=\{u\}$ , which completes the proof. ∎

Note that the proof of Lemma 19 also shows that the graph constructed in the reduction has no cut of size at most two. Therefore, by Lemmas 18 and 19, Theorem 17 follows.
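The structure claimed in the proof of Lemma 19 can be observed on a concrete instance. The sketch below is a brute-force check for $H=K_{4}$ only, not part of the reduction itself. It builds $G$ and lists every vertex set $X$ with $v^{*}\notin X$ whose cut has at most three edges. The names `build_g` and `small_cuts` are ours.

```python
def build_g(n, edges):
    """Subdivide each edge of H twice, then join a new vertex v* to every subdivision vertex."""
    g_edges, next_id = [], n
    for u, v in edges:
        a, b = next_id, next_id + 1
        next_id += 2
        g_edges += [(u, a), (a, b), (b, v)]
    v_star = next_id  # vertices n .. v_star-1 form the set D
    g_edges += [(d, v_star) for d in range(n, v_star)]
    return v_star + 1, g_edges, v_star

def small_cuts(num_vertices, edges, v_star, limit=3):
    """Yield (X, size) for every non-empty X avoiding v* with |E(X, V - X)| <= limit."""
    # v* has the highest index, so bitmasks below 2**v_star never contain it.
    assert v_star == num_vertices - 1
    for mask in range(1, 1 << v_star):
        size = sum(((mask >> u) ^ (mask >> v)) & 1 for u, v in edges)
        if size <= limit:
            yield [v for v in range(v_star) if (mask >> v) & 1], size

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
num_vertices, g_edges, v_star = build_g(4, k4)
found = list(small_cuts(num_vertices, g_edges, v_star))
```

For this instance, every cut found has exactly three edges and isolates a single vertex other than $v^{*}$, in line with the case analysis above.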

When $\lambda(G)=1$ , Diverse Minimum Cuts is trivially solvable in linear time as the problem can be reduced to finding all bridges in $G$ . If $\lambda(G)=2$ , the problem is slightly less trivial, but it is still solvable in polynomial time.

Theorem 20.

Diverse Minimum Cuts can be solved in $\left|V\right|^{O(1)}$ time, provided that $\lambda(G)\leq 2$ .

We reduce the problem to that of finding a subgraph of prescribed size that maximizes the sum of convex functions of the degrees of its vertices.

Theorem 21 ((Apollonio and Sebö, 2009)).

Given an undirected graph $H$ , an integer $k$ , and convex functions $f_{v}:\mathbb{N}_{\geq 0}\to\mathbb{R}$ for $v\in V(H)$ , the problem of finding a $k$ -edge subgraph $H^{\prime}$ of $H$ maximizing $\sum_{v\in V(H)}f_{v}(d_{H^{\prime}}(v))$ is solvable in polynomial time, where $d_{H^{\prime}}(v)$ is the degree of $v$ in $H^{\prime}$ .

We first enumerate all minimum cuts of $G$ in polynomial time. If $G$ has fewer than $k$ minimum cuts, then the instance is trivially infeasible. Suppose otherwise. We construct a graph $H$ whose vertex set corresponds to $E$ , and the edge set of $H$ is defined as follows. For each pair $e,f\in E$ , we add an edge between $e$ and $f$ to $H$ if $\{e,f\}$ is a cut of $G$ . Obviously, the graph $H$ can be constructed in polynomial time. For each $e\in E$ , we let $f_{e}(i):=w(e)\cdot i\cdot(k-i)$ for $0\leq i\leq k$ and $f_{e}(i):=\infty$ for $i>k$ . Clearly, the function $f_{e}$ is convex. Let $C_{1},\ldots,C_{k}\subseteq E$ be $k$ minimum cuts of $G$ . For each $e$ , we denote by $m(e)$ the number of occurrences of $e$ among $C_{1},\ldots,C_{k}$ . Since each edge in $E$ contributes $w(e)\cdot m(e)\cdot(k-m(e))$ to $d_{\rm sum}(\{C_{1},\ldots,C_{k}\})$ , we immediately have the following lemma.

Lemma 22.

$H$ has a subgraph $H^{\prime}$ of $k$ edges such that $\sum_{e\in V(H)}f_{e}(d_{H^{\prime}}(e))\geq t$ if and only if there are $k$ edge cuts $C_{1},\ldots,C_{k}\subseteq E$ of $G$ with $|C_{i}|=2$ for $1\leq i\leq k$ such that $d_{\rm sum}(\{C_{1},\ldots,C_{k}\})\geq t$ .

By Lemma 22 and Theorem 21, Diverse Minimum Cuts can be solved in $\left|V\right|^{O(1)}$ time, proving Theorem 20.
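The correspondence behind Lemma 22 can be sketched in Python on a 5-cycle, where $\lambda(G)=2$ and every pair of edges is a minimum cut. The sketch assumes that $d_{\rm sum}$ is the sum of weighted symmetric differences over all pairs. It builds the auxiliary graph $H$ and compares the pairwise sum with the per-edge expression $w(e)\cdot m(e)\cdot(k-m(e))$, where $m(e)$ is the degree of $e$ in the chosen $k$-edge subgraph. It does not implement the algorithm of Theorem 21, and all function names are ours.

```python
from itertools import combinations

def is_two_edge_cut(n, edges, pair):
    """True if deleting the two edges in `pair` disconnects the graph on vertices 0..n-1."""
    adj = {v: [] for v in range(n)}
    for i, (u, v) in enumerate(edges):
        if i not in pair:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) < n

def d_sum_pairwise(cuts, w):
    """Sum of weighted symmetric differences over all pairs of cuts (assumed definition)."""
    return sum(sum(w[e] for e in a ^ b) for a, b in combinations(cuts, 2))

def d_sum_by_degree(cuts, w, k):
    """Same quantity via m(e) = number of chosen cuts containing e = degree of e in H'."""
    m = [sum(e in c for c in cuts) for e in range(len(w))]
    return sum(w[e] * m[e] * (k - m[e]) for e in range(len(w)))

# 5-cycle with distinct edge weights; edge i joins vertices i and i+1 (mod 5).
n = 5
g_edges = [(i, (i + 1) % n) for i in range(n)]
w = [1, 2, 3, 4, 5]
# Edges of H: pairs of edges of G that form a cut of G.
h_edges = [frozenset(p) for p in combinations(range(n), 2)
           if is_two_edge_cut(n, g_edges, p)]
k = 3
agree = all(d_sum_pairwise(cs, w) == d_sum_by_degree(cs, w, k)
            for cs in combinations(h_edges, k))
```

Choosing $k$ edges of $H$ is the same as choosing $k$ two-edge cuts of $G$, so maximizing the degree-based sum over $k$-edge subgraphs maximizes $d_{\rm sum}$.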

4.4 Interval schedulings

For a pair of integers $a$ and $b$ with $a\leq b$ , the set of all numbers between $a$ and $b$ is denoted by $[a,b]$ . We call $I=[a,b]$ an interval. For a pair of intervals $I=[a,b]$ and $J=[c,d]$ , we say that $I$ overlaps $J$ if $I\cap J\neq\emptyset$ . For a set of intervals $\mathcal{S}=\{I_{1},\ldots,I_{r}\}$ , we say that $\mathcal{S}$ is a valid scheduling (or simply a scheduling) if for any pair of distinct intervals $I_{i},I_{j}\in\mathcal{S}$ , $I_{i}$ does not overlap $I_{j}$ . In particular, we call $\mathcal{S}$ an $r$ -scheduling if $\left|\mathcal{S}\right|=r$ for $r\in\mathbb{N}$ . In this section, we deal with the following problem.

Definition 23 (Diverse Interval Schedulings).

Given a set of intervals $\mathcal{I}=\{I_{1},\ldots,I_{n}\}$ , a weight function $w\colon\mathcal{I}\to\mathbb{R}_{>0}$ , and integers $k$ and $r$ , the task of Diverse Interval Schedulings is to find $k$ distinct $r$ -schedulings $\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\subseteq\mathcal{I}$ that maximize $d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\})$ .

The problem of partitioning a set of intervals $\mathcal{I}$ into $k$ schedulings $\mathcal{S}_{1},\ldots,\mathcal{S}_{k}$ such that each $\mathcal{S}_{i}$ has exactly $r$ intervals is known to be NP-hard (Bodlaender and Jansen, 1995; Gardi, 2009). (The NP-hardness is proven for the case that each $\mathcal{S}_{i}$ has at most $r$ intervals, but a simple reduction proves the NP-hardness of this variant.) Hence, by 1, we have the following theorem.

Theorem 24.

Diverse Interval Schedulings is NP-hard.

To apply Theorem 7 to Diverse Interval Schedulings, it suffices to give a polynomial-time algorithm for Weighted Extension for interval schedulings. Observe that if $\mathit{In}$ is not a scheduling, then there is no scheduling containing $\mathit{In}$ . Observe also that we can remove all intervals included in $\mathit{Ex}$ or overlapping some interval in $\mathit{In}$ . Thus, the problem can be reduced to that of finding a maximum weight scheduling with cardinality $r^{\prime}=r-\left|\mathit{In}\right|$ among the remaining intervals. This problem can be solved in polynomial time by using a simple dynamic programming approach.
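The two observations above amount to a short preprocessing step, sketched here in Python. Intervals are represented as pairs `(a, b)` of integers and identified by their index. The function name `reduce_extension` and the return convention (a list of surviving indices together with $r^{\prime}$, or `None` when no extension exists) are illustrative choices of ours.

```python
def reduce_extension(intervals, include, exclude, r):
    """Reduce Weighted Extension to a plain r'-scheduling problem.

    Returns (indices of intervals that may still be chosen, r'), or None if
    `include` is not a scheduling or already has more than r intervals.
    """
    def overlaps(i, j):
        (a, b), (c, d) = intervals[i], intervals[j]
        return a <= d and c <= b  # closed intervals share at least one point

    inc = sorted(include)
    # If In itself contains two overlapping intervals, no scheduling contains In.
    if any(overlaps(i, j) for pos, i in enumerate(inc) for j in inc[pos + 1:]):
        return None
    if len(inc) > r:
        return None
    # Drop intervals in In or Ex, and intervals overlapping some interval of In.
    rest = [i for i in range(len(intervals))
            if i not in include and i not in exclude
            and not any(overlaps(i, j) for j in inc)]
    return rest, r - len(inc)

intervals = [(1, 3), (2, 5), (4, 6), (7, 9), (8, 10)]
reduced = reduce_extension(intervals, include={0}, exclude={3}, r=3)
```

In the example, forcing interval 0 and forbidding interval 3 leaves intervals 2 and 4 available, and two more intervals must be selected.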

Lemma 25.

Given a set of intervals $\mathcal{I}$ , a weight function $w^{\prime}\colon\mathcal{I}\to\mathbb{R}$ , and $r^{\prime}\in\mathbb{N}$ , a maximum weight $r^{\prime}$ -scheduling can be found in $O(\left|\mathcal{I}\right|^{2}r^{\prime})$ time.

Proof.

The algorithm is analogous to the one for finding a maximum weight independent set on interval graphs, and is roughly sketched as follows. We assume that the intervals in $\mathcal{I}=\{I_{1},I_{2},\ldots,I_{n}\}$ are sorted with respect to their right end points. We define ${\rm opt}(p,q)$ as the maximum total weight of a $q$ -scheduling $\mathcal{S}$ in $\{I_{1},I_{2},\ldots,I_{p}\}$ such that $I_{p}\in\mathcal{S}$ for $0\leq p\leq n$ and $0\leq q\leq r^{\prime}$ . Then, the values of ${\rm opt}(p,q)$ for all $p$ and $q$ can be computed by a standard dynamic programming algorithm in time $O(\left|\mathcal{I}\right|^{2}r^{\prime})$ . ∎
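One possible realization of this dynamic program is sketched below in Python. The table `opt[p][q]` follows the definition of ${\rm opt}(p,q)$ above with 0-based indices. The recurrence used (extend a $(q-1)$-scheduling whose last interval ends strictly before $I_{p}$ starts) is our reading of the "standard" algorithm, and the function name is ours.

```python
def max_weight_r_scheduling(intervals, weights, r):
    """Maximum total weight of r pairwise non-overlapping closed intervals, or None."""
    if r == 0:
        return 0
    # Sort by right end point, as assumed in the proof sketch.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][1])
    ivs = [intervals[i] for i in order]
    ws = [weights[i] for i in order]
    n = len(ivs)
    neg_inf = float("-inf")
    # opt[p][q]: best weight of a q-scheduling within ivs[0..p] that contains ivs[p].
    opt = [[neg_inf] * (r + 1) for _ in range(n)]
    for p in range(n):
        opt[p][1] = ws[p]
        for q in range(2, r + 1):
            # Predecessors are intervals ending strictly before ivs[p] starts.
            best = max((opt[s][q - 1] for s in range(p) if ivs[s][1] < ivs[p][0]),
                       default=neg_inf)
            if best != neg_inf:
                opt[p][q] = best + ws[p]
    answer = max((opt[p][r] for p in range(n)), default=neg_inf)
    return None if answer == neg_inf else answer

intervals = [(1, 3), (2, 5), (4, 6), (7, 9), (8, 10)]
weights = [3, 10, 4, 1, 5]
best_two = max_weight_r_scheduling(intervals, weights, 2)
best_three = max_weight_r_scheduling(intervals, weights, 3)
```

The three nested loops over $p$, $q$, and the predecessor index give the $O(\left|\mathcal{I}\right|^{2}r^{\prime})$ bound. Negative weights are handled because infeasible states are tracked with a separate sentinel value.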

By Theorem 7 and Lemma 9, we obtain a polynomial-time approximation algorithm for Diverse Interval Schedulings with factor $\max(1-2/k,1/2)$ .

Finally, we show that Diverse Interval Schedulings can be solved in polynomial time for fixed $k$ using a dynamic programming approach, which implies a PTAS for Diverse Interval Schedulings.

Similarly to the proof of Lemma 25, assume that $\mathcal{I}=\{I_{1},I_{2},\ldots,I_{n}\}$ is sorted with respect to their right end points. Let $[k]=\{1,2,\ldots,k\}$ . For each $0\leq p\leq\left|\mathcal{I}\right|$ , we consider a tuple $T=(p,L,R,\Gamma)$ , where $L$ and $R$ are vectors in $([n]\cup\{0\})^{k}$ and $([r]\cup\{0\})^{k}$ , respectively, and $\Gamma$ is a subset of $\binom{[k]}{2}$ . Clearly, the number of tuples is $O(n(n+1)^{k}(r+1)^{k}2^{\binom{k}{2}})$ , which is polynomial when $k$ is a constant. We denote by $\ell_{i}$ and $r_{i}$ the $i$ th component of $L$ and $R$ , respectively. For a tuple $T=(p,L,R,\Gamma)$ , the value ${\rm opt}(T)$ is the maximum value of $d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\})$ for $k$ schedulings under the following four conditions: (1) the maximum index of an interval in $\bigcup_{1\leq i\leq k}\mathcal{S}_{i}$ is $p$ ( $p=0$ if $\bigcup_{1\leq i\leq k}\mathcal{S}_{i}=\emptyset$ ); (2) for $1\leq i\leq k$ , the maximum index of an interval in $\mathcal{S}_{i}$ is $\ell_{i}$ ( $\ell_{i}=0$ if $\mathcal{S}_{i}=\emptyset$ ); (3) for $1\leq i\leq k$ , $\left|\mathcal{S}_{i}\right|=r_{i}$ ; and (4) for $1\leq i<j\leq k$ , $\{i,j\}\in\Gamma$ if and only if $\mathcal{S}_{i}$ and $\mathcal{S}_{j}$ are distinct.

We define ${\rm opt}(T)=-\infty$ if no such set of schedulings exists. For a tuple $T$ , we say that a set of $k$ schedulings is valid for $T$ if it satisfies the above four conditions. When $R=(r,r,\ldots,r)$ and $\Gamma=\binom{[k]}{2}$ , there is a set of $k$ distinct $r$ -schedulings that have the sum diversity ${\rm opt}(T)$ unless ${\rm opt}(T)=-\infty$ . Hence, the maximum of ${\rm opt}(T)$ over the tuples of the form $T=(p,L,R,\Gamma)$ with $R=(r,\ldots,r)$ and $\Gamma=\binom{[k]}{2}$ is the optimal value for Diverse Interval Schedulings. We next explain the outline of our dynamic programming algorithm to compute ${\rm opt}(T)$ for any $T$ .

As a base case, ${\rm opt}(T)=0$ for the tuple $T$ with $p=0$ , $L=(0,\ldots,0)$ , $R=(0,\ldots,0)$ , and $\Gamma=\emptyset$ . Let $T^{\prime}$ be a tuple $(p^{\prime},L^{\prime},R^{\prime},\Gamma^{\prime})$ that satisfies the following conditions: (1) $p^{\prime}<p$ ; (2) for any $1\leq i\leq k$ , $\ell^{\prime}_{i}\leq\ell_{i}$ and $r^{\prime}_{i}\leq r_{i}$ ; and (3) $\Gamma^{\prime}\subseteq\Gamma$ . We say that a tuple $T^{\prime}$ satisfying the above conditions is dominated by $T$ . We denote the set of tuples dominated by $T$ as $D(T)$ . Let $C(T)=\{i:\ell_{i}=p\}$ . A tuple $T^{\prime}$ is valid for $T$ if $T^{\prime}$ satisfies the following conditions: (1) $T^{\prime}\in D(T)$ ; (2) if $i\in C(T)$ and $\ell^{\prime}_{i}>0$ , then interval $I_{\ell^{\prime}_{i}}$ does not overlap with $I_{p}$ ; (3) if $i\in C(T)$ , $r^{\prime}_{i}=r_{i}-1$ , otherwise, $r^{\prime}_{i}=r_{i}$ ; and (4) $\Gamma=\Gamma^{\prime}\cup P(T)$ with $P(T)\coloneqq\{\{i,j\}\in\binom{[k]}{2}:\left|\{i,j\}\cap C(T)\right|=1\}$ . We denote the set of valid tuples for $T$ as $V(T)$ . We compute ${\rm opt}(T)$ using the following lemma.

Lemma 26.

For a tuple $T$ ,

$${\rm opt}(T)=\max_{T^{\prime}\in V(T)}\bigl({\rm opt}(T^{\prime})+w(I_{p})\cdot\left|C(T)\right|\cdot(k-\left|C(T)\right|)\bigr).$$

Proof.

Let $T=(p,L,R,\Gamma)$ . Let $\mathcal{S}=\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\}$ be a set of schedulings that is valid for $T$ with $d_{\rm sum}(\{\mathcal{S}_{1},\ldots,\mathcal{S}_{k}\})={\rm opt}(T)$ . Then, $\mathcal{S}^{\prime}=\{\mathcal{S}_{1}\setminus\{I_{p}\},\ldots,\mathcal{S}_{k}\setminus\{I_{p}\}\}$ is a valid set of schedulings for some $T^{\prime}\in V(T)$ . Moreover, $d_{\rm sum}(\mathcal{S})=d_{\rm sum}(\mathcal{S}^{\prime})+w(I_{p})\cdot|C(T)|\cdot(k-|C(T)|)$ as $I_{p}$ contributes $w(I_{p})\cdot|C(T)|\cdot(k-|C(T)|)$ to the diversity. Thus, the left-hand side is at most the right-hand side.

Conversely, let $T^{\prime}$ be a tuple maximizing the right-hand side and let $\mathcal{S}^{\prime}=\{\mathcal{S}^{\prime}_{1},\ldots,\mathcal{S}^{\prime}_{k}\}$ be a valid set of schedulings for $T^{\prime}$ with $d_{\rm sum}(\mathcal{S}^{\prime})={\rm opt}(T^{\prime})$ . For each $1\leq i\leq k$ , we set $\mathcal{S}_{i}=\mathcal{S}^{\prime}_{i}\cup\{I_{p}\}$ if $i\in C(T)$ and $\mathcal{S}_{i}=\mathcal{S}^{\prime}_{i}$ otherwise. By condition (2) in the definition of a valid tuple, each interval in $\mathcal{S}^{\prime}_{i}$ with $i\in C(T)$ does not overlap with $I_{p}$ , meaning that $\mathcal{S}_{i}$ is a scheduling. Thus, the right-hand side is at most the left-hand side. ∎
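As a small worked instance of the increment term (with made-up numbers, and assuming $d_{\rm sum}$ is the sum of weighted symmetric differences over all pairs of schedulings), take $k=3$ , $w(I_{p})=5$ , and $C(T)=\{1,2\}$ , so that $I_{p}$ is added to $\mathcal{S}_{1}$ and $\mathcal{S}_{2}$ but not to $\mathcal{S}_{3}$ . Only the pairs with exactly one index in $C(T)$ gain $I_{p}$ in their symmetric difference:

$$P(T)=\{\{1,3\},\{2,3\}\},\qquad w(I_{p})\cdot\left|P(T)\right|=5\cdot 2=10=w(I_{p})\cdot\left|C(T)\right|\cdot(k-\left|C(T)\right|)=5\cdot 2\cdot(3-2).$$

The pair $\{1,2\}$ gains nothing because $I_{p}$ belongs to both schedulings. The two pairs in $P(T)$ are exactly those that condition (4) adds to $\Gamma^{\prime}$ , since adding $I_{p}$ to only one scheduling of a pair makes the two schedulings distinct.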

From the above lemma, we can compute ${\rm opt}(T)$ for any $T$ in polynomial time when $k$ is a constant. Moreover, from ${\rm opt}(T)$ , we can find $k$ schedulings with the maximum sum diversity by a standard traceback technique. Combining the approximation algorithm and the above algorithm, we obtain a PTAS.

Theorem 27.

Diverse Interval Schedulings admits a PTAS.
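One way to spell out the combination is the usual threshold argument, sketched here under the assumption that the threshold is chosen as $2/\varepsilon$ , which is our choice, derived from the factor $\max(1-2/k,1/2)$ stated above. Fix $\varepsilon>0$ . Then

$$k>\frac{2}{\varepsilon}\ \Longrightarrow\ \max\Bigl(1-\frac{2}{k},\frac{1}{2}\Bigr)\geq 1-\frac{2}{k}>1-\varepsilon,\qquad k\leq\frac{2}{\varepsilon}\ \Longrightarrow\ O\bigl(n(n+1)^{k}(r+1)^{k}2^{\binom{k}{2}}\bigr)\text{ tuples, with }k=O(1/\varepsilon).$$

In the first case the approximation algorithm already achieves factor $1-\varepsilon$ . In the second case $k$ is bounded by a constant depending only on $\varepsilon$ , so the dynamic programming algorithm solves the instance exactly in time polynomial in $n$ and $r$ .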

5 Conclusion

In this paper, we give a framework for designing approximation algorithms for Max-Sum Diverse Solutions. This framework runs in ${\rm poly}(\left|E\right|+k)$ time and is versatile: it can be applied to the diverse versions of several well-studied combinatorial problems, namely Diverse Matchings, Diverse Matroid Common Bases, Diverse Minimum Cuts, and Diverse Interval Schedulings. The key to applying our framework is the polynomial-time solvability of Weighted Extension, which yields constant-factor approximation algorithms for Diverse Matchings and Diverse Matroid Common Bases. Moreover, we obtain a PTAS for Max-Sum Diverse Solutions if we can solve the problem in polynomial time for fixed $k$ , yielding PTASes for Diverse Minimum Cuts and Diverse Interval Schedulings.

References

Apollonio and Sebö [2009] Nicola Apollonio and András Sebö. Minconvex factors of prescribed size in graphs. SIAM J. Discret. Math., 23(3):1297–1310, 2009.
Baste et al. [2019] Julien Baste, Lars Jaffke, Tomás Masarík, Geevarghese Philip, and Günter Rote. FPT algorithms for diverse collections of hitting sets. Algorithms, 12(12):254, 2019.
Baste et al. [2022] Julien Baste, Michael R. Fellows, Lars Jaffke, Tomáš Masařík, Mateus de Oliveira Oliveira, Geevarghese Philip, and Frances A. Rosamond. Diversity of solutions: An exploration through the lens of fixed-parameter tractability theory. Artificial Intelligence, 303:103644, 2022.
Bérczi and Schwarcz [2021] Kristóf Bérczi and Tamás Schwarcz. Complexity of packing common bases in matroids. Math. Program., 188(1):1–18, 2021.
Birnbaum and Goldman [2009] Benjamin E. Birnbaum and Kenneth J. Goldman. An improved analysis for a greedy remote-clique algorithm using factor-revealing LPs. Algorithmica, 55(1):42–59, 2009.
Bodlaender and Jansen [1995] Hans L. Bodlaender and Klaus Jansen. Restrictions of graph partition problems. part I. Theor. Comput. Sci., 148(1):93–109, 1995.
Cevallos et al. [2016] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Max-Sum Diversity Via Convex Programming. In 32nd International Symposium on Computational Geometry (SoCG 2016), volume 51 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1–26:14, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
Cevallos et al. [2019] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. An improved analysis of local search for max-sum diversification. Math. Oper. Res., 44(4):1494–1509, 2019.
Deza and Laurent [1997] Michel Marie Deza and Monique Laurent. Geometry of cuts and metrics, volume 2. Springer, 1997.
Duan and Pettie [2014] Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. J. ACM, 61(1), jan 2014.
Edmonds [1965] Jack Edmonds. Paths, trees, and flowers. Canadian J. Math., 17:449–467, 1965.
Eppstein [2008] David Eppstein. k-best enumeration. In Encyclopedia of Algorithms, pages 1–4. Springer US, Boston, MA, 2008.
Fernau et al. [2019] Henning Fernau, Petr Golovach, Marie-France Sagot, et al. Algorithmic enumeration: Output-sensitive, input-sensitive, parameterized, approximative (dagstuhl seminar 18421). In Dagstuhl Reports, volume 8. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
Fomin et al. [2020] Fedor V. Fomin, Petr A. Golovach, Lars Jaffke, Geevarghese Philip, and Danil Sagunov. Diverse pairs of matchings. In 31st International Symposium on Algorithms and Computation, ISAAC 2020, December 14-18, 2020, Hong Kong, China (Virtual Conference), volume 181 of LIPIcs, pages 26:1–26:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
Fomin et al. [2021] Fedor V. Fomin, Petr A. Golovach, Fahad Panolan, Geevarghese Philip, and Saket Saurabh. Diverse collections in matroids and graphs. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), volume 187 of LIPIcs, pages 31:1–31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
Gardi [2009] Frédéric Gardi. Mutual exclusion scheduling with interval graphs or related classes, part I. Discret. Appl. Math., 157(1):19–35, 2009.
Garey et al. [1974] M. R. Garey, David S. Johnson, and Larry J. Stockmeyer. Some simplified np-complete problems. In Robert L. Constable, Robert W. Ritchie, Jack W. Carlyle, and Michael A. Harrison, editors, Proceedings of the 6th Annual ACM Symposium on Theory of Computing, April 30 - May 2, 1974, Seattle, Washington, USA, pages 47–63. ACM, 1974.
Gillenwater et al. [2015] Jennifer Gillenwater, Rishabh K. Iyer, Bethany Lusch, Rahul Kidambi, and Jeff A. Bilmes. Submodular hamming metrics. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3141–3149, 2015.
Hanaka et al. [2021a] Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, See Woo Lee, and Yota Otachi. Computing diverse shortest paths efficiently: A theoretical and experimental study, 2021.
Hanaka et al. [2021b] Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, and Yota Otachi. Finding diverse trees, paths, and more. Proceedings of the AAAI Conference on Artificial Intelligence, 35(5):3778–3786, May 2021.
Hao et al. [2020] Fei Hao, Zheng Pei, and Laurence T. Yang. Diversified top-k maximal clique detection in social internet of things. Future Generation Computer Systems, 107:408–417, 2020.
Karger [2000] David R. Karger. Minimum cuts in near-linear time. J. ACM, 47(1):46–76, jan 2000.
Lawler [1972] Eugene L. Lawler. A procedure for computing the $k$ best solutions to discrete optimization problems and its application to the shortest path problem. Management Science, 18(7):401–405, 1972.
Oxley [2006] James G. Oxley. Matroid Theory (Oxford Graduate Texts in Mathematics). Oxford University Press, Inc., USA, 2006.
Ravi et al. [1994] S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299–310, 1994.
Schrijver [2003] A. Schrijver. Combinatorial Optimization - Polyhedra and Efficiency. Springer, 2003.
Vazirani and Yannakakis [1992] Vijay V. Vazirani and Mihalis Yannakakis. Suboptimal cuts: Their enumeration, weight and number. In W. Kuich, editor, Automata, Languages and Programming, pages 366–377, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.
Wang et al. [2013] Jia Wang, James Cheng, and Ada Wai-Chee Fu. Redundancy-aware maximal cliques. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 122–130. ACM, 2013.
Yeh et al. [2010] Li-Pu Yeh, Biing-Feng Wang, and Hsin-Hao Su. Efficient algorithms for the problems of enumerating cuts by non-decreasing weights. Algorithmica, 56(3):297–312, mar 2010.
Yuan et al. [2016] Long Yuan, Lu Qin, Xuemin Lin, Lijun Chang, and Wenjie Zhang. Diversified top-k clique search. VLDB J., 25(2):171–196, 2016.