
Stochastic $k$-Submodular Bandits with Full Bandit Feedback

Guanyu Nie    Vaneet Aggarwal    Christopher John Quinn
Abstract

In this paper, we present the first sublinear $\alpha$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback, where $\alpha$ is a corresponding offline approximation ratio. Specifically, we propose online algorithms for multiple $k$-submodular stochastic combinatorial multi-armed bandit problems, including (i) monotone functions with individual size constraints, (ii) monotone functions with matroid constraints, (iii) non-monotone functions with matroid constraints, (iv) non-monotone functions without constraints, and (v) monotone functions without constraints. We transform approximation algorithms for offline $k$-submodular maximization problems into online algorithms through the offline-to-online framework proposed by Nie et al. (2023a). A key contribution of our work is analyzing the robustness of the offline algorithms.


1 INTRODUCTION

In various sequential decision problems, including sensor placement, influence maximization, and clinical trials, decisions are made by sequentially selecting a subset of elements, making assignments for those elements, and then observing the outcomes. These scenarios often exhibit diminishing returns based on the choice of the subset of elements and their corresponding assignments.

Consider a multi-agent scenario where multiple companies (agents) cooperate to spread $k$ different types of content across a social network. Each company can select a subset of users and control how it distributes content to initiate propagation. The collective goal is to maximize the overall spread of all content types across the network. However, each company's decisions impact not only its own success but also that of others. For instance, if two companies target users with overlapping follower networks, their efforts may lead to redundant influence, limiting the potential additional spread. Conversely, well-coordinated strategies could generate synergistic effects, amplifying the overall reach. This redundancy effect exists even for individual companies, making it crucial to target diverse users to avoid diminishing returns.

Each type of content follows an independent propagation process. Due to privacy restrictions, the companies only know the total number of users influenced by the content, without visibility into the underlying diffusion process. Additionally, because the diffusion is inherently random, the observations may vary, necessitating a balance between exploiting successful configurations (exploitation) and experimenting with different users and assignments to discover better strategies (exploration). See Appendix D in the supplementary material for more motivating applications.

For each company, an offline variant of this sequential decision problem (i.e., where the agent has access to an exact value oracle and is only evaluated on a final output) can be modeled as an (un)constrained $k$-submodular optimization problem (Huber & Kolmogorov, 2012). The class of $k$-submodular functions generalizes the class of submodular set functions, an important non-linear function class for numerous optimization problems exhibiting a notion of diminishing returns, by accounting not only for the value of a subset of elements (e.g., which users) but also for their assignments (e.g., which type of content is given to which user). Maximizing a $k$-submodular function is known to be NP-hard even for unconstrained problems (Ward & Živný, 2014). There has been significant progress in developing approximation algorithms for various offline $k$-submodular maximization problems (Iwata et al., 2015; Ohsaka & Yoshida, 2015; Sakaue, 2017).

The sequential decision problem described above can be modeled as a stochastic combinatorial Multi-Armed Bandit (CMAB) problem with (i) an expected reward that is $k$-submodular, (ii) an action space formed by the constraints (if any) on elements and their assignments, and (iii) bandit feedback. Multi-Armed Bandits is a classical framework for sequential decision-making under uncertainty. Combinatorial MAB is a specialized branch where each action is a combination of base arms, known as a "super arm", and the number of super arms is too large to try each one.

For CMAB problems with non-linear rewards, the nature of the feedback significantly affects the problem complexity. "Semi-bandit" feedback involves observing additional information (such as the values of individual arms) aside from the joint reward of the selected super arm, which greatly simplifies learning. In contrast, "bandit" or "full-bandit" feedback observes only the (joint) reward for actions. Estimating arm values for non-linear rewards with bandit feedback requires deliberate sampling of sub-optimal actions (e.g., actions consisting of individual base arms when the function is monotone), which is not typically done in standard MAB methods, since by design they do not take actions identified (with some confidence) to be sub-optimal.

In this paper, we address the problem of stochastic CMAB with $k$-submodular expected rewards, various constraints, and only bandit feedback.

Our Contributions: We propose and analyze the first CMAB algorithms with sublinear $\alpha$-regret for $k$-submodular rewards using only full-bandit feedback. Each of the following results is achieved in part through analyzing the robustness of the respective offline algorithm.

  • For non-monotone $k$-submodular rewards and no constraints, we propose a CMAB algorithm with $\tilde{\mathcal{O}}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ $1/2$-regret.

  • For monotone $k$-submodular rewards and no constraints, we propose a CMAB algorithm with $\tilde{\mathcal{O}}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ $k/(2k-1)$-regret.

  • For monotone $k$-submodular rewards under individual size constraints, we propose a CMAB algorithm with $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\right)$ $1/3$-regret.

  • For monotone $k$-submodular rewards under a matroid constraint, we propose a CMAB algorithm with $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\right)$ $1/2$-regret. We specialize our CMAB algorithm for monotone functions with a matroid constraint to the case of total size constraints.

  • For non-monotone $k$-submodular rewards under a matroid constraint, we propose a CMAB algorithm with $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\right)$ $1/3$-regret.

In the offline setting, it is important to highlight that many of the properties of $f$ no longer hold in the presence of noise. For instance, in the case of a monotone function $f$, the marginal gains of $\hat{f}$ (the noisy version of $f$) may no longer be guaranteed to be positive. Similarly, for non-monotone objectives, the property of pairwise monotonicity (see Section 2) for $f$ might not remain valid for $\hat{f}$. As a result, many proof steps in the original paper(s) that introduced those offline algorithms do not go through directly and, consequently, need novel analysis. To address these challenges, this study incorporates bounded error to establish properties akin to the original ones in their respective contexts.

Analyzing the robustness of the offline algorithms is important, even without transforming them into an online setting. For those algorithms that possess inherent robustness, we refrain from modifications and maintain the original offline algorithm to ensure an efficient offline-to-online transformation. However, substantial modifications may be necessary for the original offline algorithm to maintain robustness in the presence of noise (e.g., Section 4). In such cases, we aim to introduce minimal alterations to the offline algorithm. We note that these adjustments require careful design, which is a part of the novelty of this paper.

Remark 1.1.

The regret guarantees we obtain in this work are all of order $\mathcal{O}(T^{2/3})$. It is unknown whether $O(\sqrt{T})$ expected cumulative $\alpha$-regret is possible even in the case $k=1$. The only known results with bandit feedback for $k=1$ are $O(T^{2/3})$ for submodular bandits (Streeter & Golovin, 2008; Niazadeh et al., 2021). While some lower bounds of $O(T^{2/3})$ exist (e.g., in the adversarial setup with a cardinality constraint (Niazadeh et al., 2021), or the stochastic setup with a cardinality constraint where regret is defined as the gap to the greedy algorithm (Tajdini et al., 2023)), they are for specialized setups even for $k=1$, and thus a general understanding of lower bounds in these settings remains an open problem.

Table 1: Summary of offline $\alpha$-approximation algorithms for $k$-submodular maximization with our $\delta$-robustness analysis and $\alpha$-regret bounds for our proposed algorithms for $k$-submodular CMAB with full-bandit feedback. $N$ is an upper bound on the query complexity of the offline algorithm. $B$ is the total budget. $M$ is the rank of the matroid. There are no prior sublinear $\alpha$-regret bounds for $k$-submodular CMAB with full-bandit feedback. * Sun et al. (2022)'s result for non-monotone $f$ with matroid constraints specializes to the first results for non-monotone $f$ with total size constraints. $\dagger$ Both the $1/2$ approximation and the $\delta$ we obtain by analyzing the robustness of Ohsaka & Yoshida (2015)'s offline approximation algorithm for monotone $f$ with total size constraints are strictly generalized by Sakaue (2017)'s offline approximation algorithm for monotone $f$ with matroid constraints.
Ref. | Objective $f$ | Constraint | $\alpha$ | $\delta$ | $N$ | Our $\alpha$-regret
Iwata et al. (2015) | Non-Monotone | Unconstrained | $1/2$ | $20n$ | $nk$ | $\tilde{\mathcal{O}}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\right)$
Iwata et al. (2015) | Monotone | Unconstrained | $k/(2k-1)$ | $(16-\frac{2}{k})n$ | $nk$ | $\tilde{\mathcal{O}}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\right)$
Ohsaka & Yoshida (2015)$\dagger$ | Monotone | Total Size | $1/2$ | $B+1$ | $nkB$ | $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\right)$
Sun et al. (2022)* | Non-Monotone | Total Size | $1/3$ | $\frac{4}{3}(B+1)$ | $nkB$ | $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\right)$
Ohsaka & Yoshida (2015) | Monotone | Individual Size | $1/3$ | $\frac{4}{3}(B+1)$ | $nkB$ | $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\right)$
Sakaue (2017)$\dagger$ | Monotone | Matroid | $1/2$ | $M+1$ | $nkM$ | $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\right)$
Sun et al. (2022) | Non-Monotone | Matroid | $1/3$ | $\frac{4}{3}(M+1)$ | $nkM$ | $\tilde{\mathcal{O}}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\right)$

Related works: We briefly highlight closely related works. See Appendix C for an expanded discussion.

$k$-submodular CMAB

To the best of our knowledge, the only prior work on $k$-submodular CMAB for any constraint type and/or feedback model is (Soma, 2019). They considered unconstrained $k$-submodular maximization under semi-bandit feedback in the adversarial setting. For the non-monotone and monotone cases, they propose algorithms whose $1/2$- and $\frac{k}{2k-1}$-regrets, respectively, are upper bounded by $\mathcal{O}(nk\sqrt{T})$. Their approach is based on Blackwell approachability (Blackwell, 1956). As discussed in the introduction, the availability of semi-bandit feedback in CMAB problems with non-linear rewards significantly simplifies the learning problem (i.e., with semi-bandit feedback a $\sqrt{T}$ dependence of the $\alpha$-regret is typically easy to achieve). We also note that the adversarial CMAB setting does not strictly generalize the stochastic setting. In the adversarial setting, the reward functions at each time step $\{f_{t}\}$ must exhibit $k$-submodularity but otherwise can vary widely. In the stochastic setting, the individual $f_{t}$'s are not necessarily $k$-submodular but the expected reward function $f=\mathbb{E}[f_{t}]$ is.

Submodular CMAB

For submodular rewards (i.e., $k=1$), Streeter & Golovin (2008) proposed and analyzed an algorithm for adversarial CMAB with submodular rewards, full-bandit feedback, and a knapsack constraint (though only in expectation, taken over the randomness in the algorithm). The authors adapted a simpler greedy algorithm (Khuller et al., 1999), using an $\epsilon$-greedy exploration type framework. Niazadeh et al. (2021) proposed a framework for transforming iterative greedy $\alpha$-approximation algorithms for offline problems into online methods in an adversarial bandit setting, for both semi-bandit feedback (achieving $\widetilde{O}(T^{1/2})$ $\alpha$-regret) and full-bandit feedback (achieving $\widetilde{O}(T^{2/3})$ $\alpha$-regret). In the stochastic setting, Nie et al. (2022) proposed an explore-then-commit type algorithm for online monotone submodular maximization under a cardinality constraint. The result was extended with an optimized stochastic-explore-then-commit approach in Fourati et al. (2024b). Fourati et al. (2023) proposed a randomized greedy learning algorithm for online non-monotone submodular maximization. Nie et al. (2023a) recently proposed a general framework for adapting offline algorithms to online algorithms under full-bandit feedback. Their framework requires the adapted offline algorithm to satisfy the so-called $(\alpha,\delta)$-robustness property.

There are also a number of works that require additional "semi-bandit" feedback. For combinatorial MAB with submodular rewards, a common type of semi-bandit feedback is the marginal gain (Lin et al., 2015; Yue & Guestrin, 2011; Yu et al., 2016; Takemori et al., 2020), which enables the learner to take actions of maximal cardinality or budget, receive a corresponding reward, and gain information not just on the set but on individual elements.

2 BACKGROUND

2.1 $k$-Submodular Functions

Let $k$ be a positive integer for the number of types (i.e., types of stories) and $V=[n]$ be the ground set of elements (i.e., users in a social network). Let $(k+1)^{V}:=\{(X_{1},\ldots,X_{k})\mid X_{i}\subseteq V,\ i\in\{1,\ldots,k\},\ X_{i}\cap X_{j}=\emptyset\ \forall i\neq j\}$. A function $f:(k+1)^{V}\rightarrow\mathbb{R}$ is called $k$-submodular if, for any $\boldsymbol{x}=(X_{1},\ldots,X_{k})$ and $\boldsymbol{y}=(Y_{1},\ldots,Y_{k})$ in $(k+1)^{V}$, we have

f(\boldsymbol{x})+f(\boldsymbol{y})\geq f(\boldsymbol{x}\sqcup\boldsymbol{y})+f(\boldsymbol{x}\sqcap\boldsymbol{y})

where $\boldsymbol{x}\sqcap\boldsymbol{y}:=(X_{1}\cap Y_{1},\ldots,X_{k}\cap Y_{k})$ and $\boldsymbol{x}\sqcup\boldsymbol{y}:=\left(X_{1}\cup Y_{1}\setminus\left(\bigcup_{i\neq 1}X_{i}\cup Y_{i}\right),\ldots,X_{k}\cup Y_{k}\setminus\left(\bigcup_{i\neq k}X_{i}\cup Y_{i}\right)\right)$. Note that setting $k=1$ in these definitions recovers submodular functions, set intersection, and set union, respectively. We further define a relation $\preceq$ on $(k+1)^{V}$ so that, for $\boldsymbol{x}=(X_{1},\ldots,X_{k})$ and $\boldsymbol{y}=(Y_{1},\ldots,Y_{k})$ in $(k+1)^{V}$, $\boldsymbol{x}\preceq\boldsymbol{y}$ if $X_{i}\subseteq Y_{i}$ for every $i\in[k]$. We also define the marginal gain of assigning type $i\in[k]$ to element $e$ given a current solution $\boldsymbol{x}$ (provided that $e$ has not been assigned any type in $\boldsymbol{x}$),

\Delta_{e,i}f(\boldsymbol{x})=f(X_{1},\ldots,X_{i-1},X_{i}\cup\{e\},X_{i+1},\ldots,X_{k})-f(X_{1},\ldots,X_{k})

for $\boldsymbol{x}\in(k+1)^{V}$, $e\notin\bigcup_{\ell\in[k]}X_{\ell}$, and $i\in[k]$.

Theorem 2.1.

(Ward & Živný, 2014) A function $f:(k+1)^{V}\rightarrow\mathbb{R}$ is $k$-submodular if and only if $f$ satisfies the following two conditions:
Orthant submodularity: $\Delta_{e,i}f(\boldsymbol{x})\geq\Delta_{e,i}f(\boldsymbol{y})$ for any $\boldsymbol{x},\boldsymbol{y}\in(k+1)^{V}$ with $\boldsymbol{x}\preceq\boldsymbol{y}$, $e\notin\bigcup_{\ell\in[k]}Y_{\ell}$, and $i\in[k]$;
Pairwise monotonicity: $\Delta_{e,i}f(\boldsymbol{x})+\Delta_{e,j}f(\boldsymbol{x})\geq 0$ for any $\boldsymbol{x}\in(k+1)^{V}$, $e\notin\bigcup_{\ell\in[k]}X_{\ell}$, and $i,j\in[k]$ with $i\neq j$.

We define the support of $\boldsymbol{x}\in(k+1)^{V}$ as $\operatorname{supp}(\boldsymbol{x})=\{e\in V\mid\boldsymbol{x}(e)\neq 0\}$ and the support of type $i\in[k]$ as $\operatorname{supp}_{i}(\boldsymbol{x})=\{e\in V\mid\boldsymbol{x}(e)=i\}$. Further, a function $f:(k+1)^{V}\rightarrow\mathbb{R}$ is called monotone if $\Delta_{e,i}f(\boldsymbol{x})\geq 0$ for any $\boldsymbol{x}\in(k+1)^{V}$, $e\notin\bigcup_{\ell\in[k]}X_{\ell}$, and $i\in[k]$.
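To make these definitions concrete, the following minimal Python sketch (our own illustration, not part of the adopted algorithms) represents a solution $\boldsymbol{x}$ as a map from elements to types, with 0 meaning unassigned, and evaluates supports and marginal gains for a toy coverage-style objective; the coverage sets COVER below are hypothetical.

# A solution x assigns each element of V a type in {1,...,k}; 0 means "unassigned".
# f below is a toy coverage objective: each (element, type) pair covers a set of items,
# and f(x) counts the items covered by at least one assigned pair (monotone k-submodular).
V = ["u1", "u2", "u3"]
k = 2
COVER = {("u1", 1): {1, 2}, ("u1", 2): {2, 3},
         ("u2", 1): {3},    ("u2", 2): {4, 5},
         ("u3", 1): {1, 5}, ("u3", 2): {6}}

def f(x):
    covered = set()
    for e, i in x.items():
        if i != 0:
            covered |= COVER[(e, i)]
    return len(covered)

def support(x):
    return {e for e, i in x.items() if i != 0}

def marginal_gain(f, x, e, i):
    # Delta_{e,i} f(x): gain of assigning type i to an element e not yet in supp(x).
    assert x.get(e, 0) == 0
    y = dict(x)
    y[e] = i
    return f(y) - f(x)

x = {"u1": 1, "u2": 0, "u3": 0}
print(f(x), support(x), marginal_gain(f, x, "u2", 2))   # 2  {'u1'}  2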

Matroids

For a finite set $E$ and $\mathcal{F}\subseteq 2^{E}$, we say the system $(E,\mathcal{F})$ is a matroid if the following hold:

  • (M1) $\emptyset\in\mathcal{F}$,

  • (M2) If $A\subseteq B\in\mathcal{F}$ then $A\in\mathcal{F}$,

  • (M3) If $A,B\in\mathcal{F}$ and $|A|<|B|$ then there exists $e\in B\backslash A$ such that $A\cup\{e\}\in\mathcal{F}$.

The elements of $\mathcal{F}$ are called independent, and we say $A\in\mathcal{F}$ is maximal if no $B\in\mathcal{F}$ satisfies $A\subsetneq B$. A maximal independent set $B$ is called a basis of the matroid. The rank function of a matroid $\mathcal{M}=(E,\mathcal{F})$ is defined as $r_{\mathcal{M}}(A)=\max\{|S|:S\subseteq A,S\in\mathcal{F}\}$. Under the matroid constraint with a matroid $\mathcal{M}=(E,\mathcal{F})$, a solution $\boldsymbol{x}\in(k+1)^{V}$ is feasible if $\operatorname{supp}(\boldsymbol{x})\in\mathcal{F}$.

A simple but important matroid $\mathcal{M}=(E,\mathcal{F})$ is the uniform matroid, in which $\mathcal{F}=\{X\subseteq E:|X|\leq B\}$ for a given budget $B$. It is equivalent to the Total Size (TS) constraint in the problem we consider.

2.2 CMAB

In the CMAB framework, we consider sequential, combinatorial decision-making problems over a finite time horizon $T$. Let $\Omega$ denote the ground set of base arms and $n=|\Omega|$ denote the number of arms. Let $D\subseteq 2^{\Omega}$ denote the subset of feasible actions, for which we presume membership can be efficiently evaluated. We will use the terms subset and action interchangeably throughout the paper.

At each time step $t$, the learner selects a feasible action $A_{t}\in D$. After the subset $A_{t}$ is selected, the learner receives a reward $f_{t}(A_{t})$. We assume the reward $f_{t}$ is stochastic, bounded in $[0,1]$, and i.i.d. conditioned on the action $A_{t}$. Define the expected reward function as $f(A):=\mathbb{E}[f_{t}(A)]$. The goal of the learner is to maximize the cumulative reward $\sum_{t=1}^{T}f_{t}(A_{t})$. To measure the performance of the algorithm, one common metric is to compare the learner to an agent with access to a value oracle for $f$. However, if optimizing $f$ over $D$ is NP-hard, such a comparison would not be meaningful unless the horizon is exponentially large in the problem parameters. Alternatively, if there exists a known approximation algorithm $\mathcal{A}$ with an approximation ratio $\alpha\in(0,1]$ for optimizing $f$ over $D$, it is more natural to evaluate the performance of a CMAB algorithm against what $\mathcal{A}$ could achieve. In this scenario, we consider the expected cumulative $\alpha$-regret $\mathcal{R}_{\alpha,T}$, which quantifies the difference between $\alpha$ times the cumulative expected value of the optimal action and the cumulative received reward (we write $\mathcal{R}_{T}$ when $\alpha$ is understood from context),

\mathbb{E}[\mathcal{R}_{T}]=\alpha Tf(\mathrm{OPT})-\mathbb{E}\left[\sum_{t=1}^{T}f_{t}(A_{t})\right], \qquad (1)

where $\mathrm{OPT}$ is the optimal solution, i.e., $\mathrm{OPT}\in\operatorname*{arg\,max}_{A\in D}f(A)$, and the expectations are with respect to both the rewards and the actions (if random).
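As a small worked illustration of (1) (with hypothetical numbers), the empirical cumulative $\alpha$-regret of a reward trace can be computed as follows.

def alpha_regret(rewards, f_opt, alpha):
    # rewards: realized rewards f_t(A_t) for t = 1, ..., T
    # f_opt:   f(OPT), the expected value of an optimal feasible action
    T = len(rewards)
    return alpha * T * f_opt - sum(rewards)

print(alpha_regret([0.30, 0.35, 0.40, 0.42], f_opt=0.9, alpha=0.5))  # 1.8 - 1.47 = 0.33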

2.3 Problem Statement

We consider the sequential decision making problem under the stochastic CMAB framework where the expected reward function $f$ is $k$-submodular. Each base arm is a tuple (we call it an item-type pair) $(e,i)\in V\times[k]$, and a super arm is an assignment $\boldsymbol{x}\in(k+1)^{V}$, which is a combination of base arms. We consider full-bandit feedback, where after an action $A_{t}$ is selected at time $t$, the learner only observes the reward $f_{t}(A_{t})$. We aim to transform various offline algorithms from the $k$-submodular optimization literature into online algorithms; thus we consider $\alpha$-regret as our performance metric, where $\alpha$ is the approximation ratio of the corresponding offline algorithm.

2.4 Offline-to-Online Framework

Algorithm 1 C-ETC (Nie et al., 2023a)
1:  Input: horizon $T$, set $\Omega$ of $n$ base arms, an offline $(\alpha,\delta)$-robust algorithm $\mathcal{A}$, and an upper bound $N$ on the number of queries $\mathcal{A}$ will make to the value oracle.
2:  Initialize $m\leftarrow\left\lceil\frac{\delta^{2/3}T^{2/3}\log(T)^{1/3}}{2N^{2/3}}\right\rceil$.
3:  // Exploration Phase //
4:  while $\mathcal{A}$ queries the value of some $A\subseteq\Omega$ do
5:     For $m$ times, play action $A$.
6:     Calculate the empirical mean $\bar{f}$.
7:     Return $\bar{f}$ to $\mathcal{A}$.
8:  end while
9:  // Exploitation Phase //
10:  for remaining time do
11:     Play action $S$ output by algorithm $\mathcal{A}$.
12:  end for

As we adopt the offline-to-online transformation framework proposed in (Nie et al., 2023a), we briefly introduce it in this section. Nie et al. (2023a) introduced a criterion for the robustness of an offline approximation algorithm and showed that this property alone is sufficient to guarantee that the offline algorithm can be adapted to solve CMAB problems in the corresponding online setting with just bandit feedback and achieve sub-linear regret. More importantly, the CMAB adaptation does not rely on any special structure of the algorithm design, instead employing it as a black box. We restate the robustness definition below. Intuitively, it says that the algorithm $\mathcal{A}$ maintains reasonably good performance when function evaluations have errors.

Definition 2.2 ($(\alpha,\delta,N)$-Robust Approximation (Nie et al., 2023a)).

An algorithm (possibly randomized) $\mathcal{A}$ is an $(\alpha,\delta,N)$-robust approximation algorithm for the combinatorial optimization problem of maximizing a function $f:2^{\Omega}\to\mathbb{R}$ over a finite domain $D\subseteq 2^{\Omega}$ if, whenever it is run with a value oracle for a surrogate $\hat{f}$ satisfying $|f(S)-\hat{f}(S)|\leq\epsilon$ for all $S\in D$ for some $\epsilon>0$, its output $S^{*}$ satisfies the relation below with the optimal solution $\mathrm{OPT}$ under $f$,

\mathbb{E}[f(S^{*})]\geq\alpha f(\mathrm{OPT})-\delta\epsilon,

where $\Omega$ is the ground set, the expectation is over the randomness of the algorithm $\mathcal{A}$, and algorithm $\mathcal{A}$ uses at most $N$ value oracle queries.

Note that when we have access to the exact oracle ($\epsilon=0$), the definition gives the same guarantee as the original offline algorithm; if the offline algorithm is exact, $\alpha=1$. Equipped with a robustness assurance, the authors introduced a stochastic CMAB algorithm named "Combinatorial Explore-Then-Commit" (C-ETC). See Algorithm 1 for the pseudocode. C-ETC interfaces with an offline $(\alpha,\delta,N)$-robust algorithm denoted as $\mathcal{A}$. In the exploration phase, when $\mathcal{A}$ requests information from the value oracle pertaining to action $A$, C-ETC executes action $A$ multiple times, specifically $m$ times, where $m$ is an optimization parameter. Subsequently, C-ETC computes the empirical mean $\bar{f}$ of the rewards associated with action $A$ and communicates this value back to the offline algorithm $\mathcal{A}$. In the exploitation phase, C-ETC continuously deploys the solution $S$ generated by $\mathcal{A}$. They showed the following theorem:

Theorem 2.3.

(Nie et al., 2023a) The expected cumulative $\alpha$-regret of C-ETC using an $(\alpha,\delta,N)$-robust approximation algorithm $\mathcal{A}$ as a subroutine is at most $\mathcal{O}\left(\delta^{\frac{2}{3}}N^{\frac{1}{3}}T^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ provided $T\geq\max\left\{N,\frac{2\sqrt{2}N}{\delta}\right\}$. (Nie et al. (2023a)'s results were stated for deterministic offline approximation algorithms; see https://arxiv.org/pdf/2301.13326.pdf for the extension to randomized offline approximation algorithms.)

This approach allows for a CMAB procedure that does not require any specialized structural characteristics from $\mathcal{A}$. It necessitates no intricate construction beyond the execution of $\mathcal{A}$, provided the criteria of robustness (Definition 2.2) are met.
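To make the interface concrete, below is a minimal Python sketch of C-ETC (Algorithm 1); it is our own illustration, assuming the offline routine is supplied as a callable that takes a value oracle and returns a solution, and it follows line 2 of the pseudocode for the number of plays $m$.

import math

def c_etc(T, delta, N, offline_alg, play):
    # play(A): pull the super arm A once and return the stochastic reward f_t(A).
    # offline_alg(oracle): the (alpha, delta, N)-robust offline algorithm, making at
    #   most N calls to oracle(A) and returning its solution S.
    m = math.ceil(delta ** (2 / 3) * T ** (2 / 3) * math.log(T) ** (1 / 3) / (2 * N ** (2 / 3)))
    t = 0
    total_reward = 0.0

    def oracle(A):
        # Exploration phase: play A for m rounds and report the empirical mean.
        nonlocal t, total_reward
        s = 0.0
        for _ in range(m):
            r = play(A)
            s += r
            total_reward += r
            t += 1
        return s / m

    S = offline_alg(oracle)
    while t < T:            # Exploitation phase: commit to S for the remaining rounds.
        total_reward += play(S)
        t += 1
    return S, total_reward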

Remark 2.4.

We note that this result can be extended to multiple agents, where the exploration (the $m$ plays of action $A$) happens in a distributed manner over the agents, as shown in Fourati et al. (2024a).

In the following sections, we transform various offline algorithms for $k$-submodular maximization problems under different constraints into the online bandit setting by analyzing the robustness of the offline algorithms. In analyzing robustness, we assume the offline algorithm is evaluated with a surrogate function $\hat{f}$ satisfying $|f(S)-\hat{f}(S)|\leq\epsilon$ for all $S\in D$, instead of the exact function $f$. We will highlight some parts of the proof for the non-monotone unconstrained problem, and defer all missing proofs to the appendix.

Remark 2.5 (Offline vs. Online Algorithms).

We note the following major differences between offline and online algorithms. First, an offline algorithm assumes exact oracle access to the objective function $f$. However, in certain scenarios, the objective might not be predetermined. Nonetheless, if we engage in recurrent tasks while encountering objectives $f_{1},f_{2},\cdots,f_{T}$, potentially sampled from a distribution, there is a possibility of learning to perform well on average over time. Such algorithms are referred to as online algorithms. Second, offline algorithms are designed to optimize an objective $f$ in the sense of finding a "final" solution; the quality of intermediate solutions does not matter. Offline algorithms care about computational and sample complexity as a function of the problem size, but there is no explicit notion of a horizon. In contrast, the aim of online algorithms is not to output a single solution (there is "simple regret," but we consider the more common cumulative regret) but to maximize the sum of achieved values, so intermediate solutions matter and the horizon plays an explicit role.

3 Non-monotone Functions without Constraints

In this section, we consider the case where the expected objective function is non-monotone $k$-submodular and there is no constraint in selecting actions. We adopt the offline algorithm proposed in Iwata et al. (2015).

3.1 Algorithm

We first note the following key fact for (monotone or non-monotone) $k$-submodular maximization without constraints (see Iwata et al. (2013) for the proof):

Proposition 3.1.

For any $k$-submodular function $f:(k+1)^{V}\rightarrow\mathbb{R}_{+}$ with $k\geq 2$, there exists a partition of $V$ that attains the maximum value of $f$.

This says that there is an optimal solution with all elements included (a combinatorial search is needed to determine their types). Proposition 3.1 remains valid for non-monotone functions due to the special property of $k$-submodular functions (pairwise monotonicity).

Armed with this property, Iwata et al. (2015) proposed a randomized offline algorithm for maximizing a non-monotone $k$-submodular function without constraints. The algorithm is presented in Algorithm 2 in the appendix. It iterates through all $e\in V$ (in an arbitrary but fixed order) and, for each $e$, selects the type $i\in[k]$ randomly according to carefully designed probabilities $p_{i}$. The authors showed that this algorithm achieves a $1/2$-approximation ratio for non-monotone $k$-submodular maximization without constraints.

3.2 Robustness

We show that Algorithm 2 is $(\frac{1}{2},\delta,N)$-robust for certain $\delta$ and $N$. Due to this robustness, we can use the framework proposed in Nie et al. (2023a) to transform Algorithm 2 into an online algorithm under bandit feedback achieving sublinear $1/2$-regret.

Let $\boldsymbol{o}$ be an optimal solution with $\operatorname{supp}(\boldsymbol{o})=V$ (by Proposition 3.1 such a solution exists). Let $\boldsymbol{s}$ be the output of the algorithm with $|\operatorname{supp}(\boldsymbol{s})|=n$ using the surrogate function $\hat{f}$. We consider the $j$-th iteration of the algorithm, and let $e^{(j)}$ be the element of $V$ considered in the $j$-th iteration, $p_{i}^{(j)}$ be the probability that the $i$-th type is chosen in the $j$-th iteration, and $\boldsymbol{s}^{(j)}$ be the solution after the $j$-th iteration, where $\boldsymbol{s}^{(0)}=\mathbf{0}$. Also, for $0\leq j\leq n$, let $\boldsymbol{o}^{(j)}$ be the element in $(k+1)^{V}$ obtained from $\boldsymbol{o}$ by replacing the coordinates on $\operatorname{supp}(\boldsymbol{s}^{(j)})$ with those of $\boldsymbol{s}^{(j)}$, and for $1\leq j\leq n$ let $\boldsymbol{t}^{(j-1)}$ be the element in $(k+1)^{V}$ obtained from $\boldsymbol{o}^{(j)}$ by changing $\boldsymbol{o}^{(j)}(e^{(j)})$ to $0$.

For $i\in[k]$, let $y_{i}^{(j)}=\Delta_{e^{(j)},i}f(\boldsymbol{s}^{(j-1)})$, $\hat{y}_{i}^{(j)}=\Delta_{e^{(j)},i}\hat{f}(\boldsymbol{s}^{(j-1)})$, and let $a_{i}^{(j)}=\Delta_{e^{(j)},i}f(\boldsymbol{t}^{(j-1)})$, $\hat{a}_{i}^{(j)}=\Delta_{e^{(j)},i}\hat{f}(\boldsymbol{t}^{(j-1)})$. Due to pairwise monotonicity, we have $y_{i}^{(j)}+y_{i^{\prime}}^{(j)}\geq 0$ and $a_{i}^{(j)}+a_{i^{\prime}}^{(j)}\geq 0$ for all $i,i^{\prime}\in[k]$ with $i\neq i^{\prime}$, and thus, while the surrogate function $\hat{f}$ does not necessarily satisfy pairwise monotonicity, we have

\hat{y}_{i}^{(j)}+\hat{y}_{i^{\prime}}^{(j)}\geq-4\varepsilon,\qquad\hat{a}_{i}^{(j)}+\hat{a}_{i^{\prime}}^{(j)}\geq-4\varepsilon \qquad (2)

for all $i,i^{\prime}\in[k]$ with $i\neq i^{\prime}$, since both inequalities involve a sum of four function values of $\hat{f}$, each of which can be off by at most $\varepsilon$ with respect to $f$. Also, from $\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{t}^{(j-1)}$, which holds by construction, orthant submodularity implies $y_{i}^{(j)}\geq a_{i}^{(j)}$ for all $i\in[k]$ and thus

\hat{y}_{i}^{(j)}\geq\hat{a}_{i}^{(j)}-4\varepsilon \qquad (3)

for all $i\in[k]$. W.l.o.g., we assume $f(\boldsymbol{0})=0$.

Before showing the main result, we first show the following lemma, which is a generalization of Lemma 2.2 in Iwata et al. (2015) in the presence of noise.

Lemma 3.2.

Let $c,d\in\mathbb{R}_{+}$. Conditioning on $\boldsymbol{s}^{(j-1)}$, suppose that

\sum_{i=1}^{k}\left(\hat{a}_{i^{*}}^{(j)}-\hat{a}_{i}^{(j)}\right)p_{i}^{(j)}\leq c\left(\sum_{i=1}^{k}\hat{y}_{i}^{(j)}p_{i}^{(j)}\right)+d\varepsilon \qquad (4)

holds for each $j$ with $1\leq j\leq n$, where $i^{*}=\boldsymbol{o}(e^{(j)})$. Then $\mathbb{E}[f(\boldsymbol{s})]\geq\frac{1}{1+c}f(\boldsymbol{o})-(2c+d+4)n\varepsilon$.

We defer the proof of this lemma to Section A.1. Then, we show the following proposition:

Proposition 3.3.

Algorithm 2 for maximizing a non-monotone $k$-submodular function is a $(\frac{1}{2},20n,nk)$-robust approximation algorithm.

To show Proposition 3.3, by Definition 2.2, we aim to show that the output $\boldsymbol{s}$ of Algorithm 2 satisfies $\mathbb{E}[f(\boldsymbol{s})]\geq\frac{1}{2}f(\boldsymbol{o})-20n\varepsilon$ when the function evaluations are based on the surrogate function $\hat{f}$. By Lemma 3.2, it suffices to prove (4) for every $j\in[n]$ with $c=1$ and $d=14$. For simplicity of the description, we omit the superscript $(j)$ when it is clear from the context. Due to limited space, here we only consider the case $i^{+}\geq 3$, since this case requires the most technical effort. The proof for the other cases is in Section A.2 of the supplementary material.
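For concreteness, plugging $c=1$ and $d=14$ into Lemma 3.2 recovers exactly the constants of Proposition 3.3:

\frac{1}{1+c}=\frac{1}{2},\qquad(2c+d+4)n=(2+14+4)n=20n.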

Part of proof of Proposition 3.3.

($i^{+}\geq 3$) Our goal is to show

\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq\hat{a}_{i^{*}}-14\varepsilon, \qquad (5)

which is equivalent to (4) with $c=1$ and $d=14$. By the ordering of $\{\hat{y}_{i}\}_{i\in[k]}$ and (3), for $i\leq i^{*}$ we have

\hat{y}_{i}\geq\hat{y}_{i^{*}}\geq\hat{a}_{i^{*}}-4\varepsilon. \qquad (6)

Denote $r\in\operatorname*{arg\,min}_{i\in[k]}\hat{a}_{i}$.

Case 1: If $r=i^{*}$, we have $\sum_{i}\hat{a}_{i}p_{i}\geq\hat{a}_{i^{*}}\left(\sum_{i}p_{i}\right)=\hat{a}_{i^{*}}$. Moreover, $\sum_{i}\hat{y}_{i}p_{i}\geq 0$, since by Line 11 in Algorithm 2 a positive probability $p_{i}>0$ is only assigned to types $i$ with positive values $\hat{y}_{i}>0$. Hence

\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq 0+\hat{a}_{i^{*}}\geq\hat{a}_{i^{*}}-14\varepsilon.

Thus (5) follows.

Case 2: If $r\neq i^{*}$ and $i^{*}\geq i^{+}$, since by Line 11 in Algorithm 2 a positive probability $p_{i}>0$ is only assigned to positive $\hat{y}$ values, we have $\sum_{i}\hat{y}_{i}p_{i}=\sum_{i\leq i^{+}}\hat{y}_{i}p_{i}\geq\sum_{i\leq i^{+}}(\hat{a}_{i^{*}}-4\varepsilon)p_{i}=\hat{a}_{i^{*}}-4\varepsilon$ by (6). When $\hat{a}_{r}\geq 0$, since $\hat{a}_{i}\geq\hat{a}_{r}$ for any $i\neq r$ by the definition of $r$, we have $\sum_{i}\hat{a}_{i}p_{i}\geq 0$. When $\hat{a}_{r}<0$,

\sum_{i}\hat{a}_{i}p_{i}\geq\sum_{i\neq r}(-\hat{a}_{r}-4\varepsilon)p_{i}+\hat{a}_{r}p_{r} \qquad \text{(by (2))}
=\hat{a}_{r}\left(p_{r}-\sum_{i\neq r}p_{i}\right)-4\varepsilon\sum_{i\neq r}p_{i}
\geq 0-4\varepsilon,

where the last inequality follows from $\hat{a}_{r}<0$, $\sum_{i\neq r}p_{i}\geq p_{r}$, and $\sum_{i\neq r}p_{i}\leq 1$; here $\sum_{i\neq r}p_{i}\geq p_{r}$ follows from Line 11 in Algorithm 2, as the largest single probability assigned is $\frac{1}{2}$.

Thus,

\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq(\hat{a}_{i^{*}}-4\varepsilon)+(-4\varepsilon)\geq\hat{a}_{i^{*}}-14\varepsilon.

Therefore (5) holds.

Case 3: If $r\neq i^{*}$ and $i^{*}<i^{+}$, we have

\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}=\sum_{i=1}^{k}\hat{y}_{i}p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i}
\geq\sum_{i\leq i^{*}}(\hat{a}_{i^{*}}-4\varepsilon)p_{i}+\sum_{i>i^{*}}(\hat{a}_{i}-4\varepsilon)p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i} \qquad \text{(using (6) and (3))}
=\sum_{i\leq i^{*}}\hat{a}_{i^{*}}p_{i}+\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i}-4\varepsilon
=\left(\sum_{i<i^{*}}\hat{a}_{i^{*}}p_{i}+2\hat{a}_{i^{*}}p_{i^{*}}\right)+\left(\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\right)-4\varepsilon. \qquad (7)

The first term equals $\hat{a}_{i^{*}}$ by simple calculations. Hence it suffices to show that the second term of (7) is greater than or equal to $-10\varepsilon$.

Sub-case 3.a: First, we consider $\hat{a}_{r}\geq 0$. By the definition of $r$, for all $i\in[k]$, $\hat{a}_{i}\geq\hat{a}_{r}\geq 0$. The second term of (7) can then be bounded as

\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\geq\hat{a}_{r}\left(p_{r}+\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)\geq 0\geq-10\varepsilon. \qquad (8)

Sub-case 3.b: Second, we consider $\hat{a}_{r}<0$. Since $i^{*}<i^{+}$, we have

\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i}=\sum_{i^{*}+1\leq i\leq i^{+}-1}p_{i}+p_{i^{+}}+\sum_{i\neq r,i^{*}}p_{i}
=\sum_{i=i^{*}+1}^{i^{+}-1}\left(\frac{1}{2}\right)^{i}+\left(\frac{1}{2}\right)^{i^{+}-1}+\sum_{i\neq r,i^{*}}p_{i} \qquad \text{(by construction of the $p_{i}$'s)}
=\left(\frac{1}{2}\right)^{i^{*}}+\sum_{i\neq r,i^{*}}p_{i} \qquad \text{(geometric sum)}
=p_{i^{*}}+\sum_{i\neq r,i^{*}}p_{i}=1-p_{r}. \qquad (9)

Therefore, if $r<i^{*}$, we get

\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}
\geq\sum_{i>i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+\sum_{i\neq r,i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+\hat{a}_{r}p_{r} \qquad \text{(using (2))}
=\hat{a}_{r}\left(p_{r}-\sum_{i>i^{*}}p_{i}-\sum_{i\neq r,i^{*}}p_{i}\right)-4\varepsilon\left(\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)
=\hat{a}_{r}\left(p_{r}-(1-p_{r})\right)-4\varepsilon(1-p_{r}) \qquad \text{(using (9))}
=\hat{a}_{r}\left(2p_{r}-1\right)-4\varepsilon(1-p_{r})
\geq 0-4\varepsilon\geq-10\varepsilon. \qquad \text{($\hat{a}_{r}<0$ and $p_{r}\leq 1/2$)}

If $r>i^{*}$, then $p_{r}\leq 1/4$ by Line 11 in Algorithm 2, since $r\neq 1$ and $i^{+}\geq 3$. Hence, by $\hat{a}_{r}<0$, the second term of (7) can be bounded as

\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}
=\sum_{i>i^{*},i\neq r}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+2\hat{a}_{r}p_{r}
\geq\sum_{i>i^{*},i\neq r}(-\hat{a}_{r}-4\varepsilon)p_{i}+\sum_{i\neq r,i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+2\hat{a}_{r}p_{r} \qquad \text{(using (2))}
=\hat{a}_{r}\left(2p_{r}-\sum_{i>i^{*},i\neq r}p_{i}-\sum_{i\neq r,i^{*}}p_{i}\right)-4\varepsilon\left(\sum_{i>i^{*},i\neq r}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)
=\hat{a}_{r}(2p_{r}-(1-2p_{r}))-4\varepsilon(1-2p_{r}) \qquad \text{(using (9))}
\geq 0-4\varepsilon\geq-10\varepsilon. \qquad \text{($\hat{a}_{r}<0$ and $p_{r}\leq 1/4$)}

Thus we conclude that when $\hat{a}_{r}<0$, we have

\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\geq-10\varepsilon. \qquad (10)

Combining (10) and (8), we conclude the proof for $i^{+}\geq 3$. ∎

Remark 3.4.

The assumption of bounded noise plays an essential role in the analysis. Although in the presence of noise the original properties (e.g., pairwise monotonicity) of the function do not hold, with the bounded noise bridging $\hat{f}$ and $f$ we can still establish relaxed versions of the desired properties, leading to a similar approximation ratio with only a small additive error. In many cases, showing the relaxed version requires completely new steps.

3.3 Regret Bound

Once we have analyzed the robustness of Algorithm 2 and identified that the number of function evaluations of Algorithm 2 is exactly $nk$, we can employ the framework proposed in Nie et al. (2023a) and obtain the following result:

Corollary 3.5.

For an online non-monotone unconstrained $k$-submodular maximization problem, the expected cumulative $1/2$-regret of C-ETC using Algorithm 2 as a sub-routine is at most $\mathcal{O}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq nk$.

4 Monotone Functions without Constraints

In this section, we continue to explore the unconstrained case, but now the objective is monotone. We adopt the offline $\frac{k}{2k-1}$-approximation algorithm proposed in (Iwata et al., 2015). The pseudo-code is presented in Algorithm 3 in the appendix. Similar to Algorithm 2 for the non-monotone case, the algorithm iterates through all $e\in V$ (in an arbitrary but fixed order) and selects the type $i\in[k]$ randomly according to some carefully designed probabilities for each $e\in V$.

One important modification to highlight is made on line 6 of the algorithm. In the original algorithm from (Iwata et al., 2015), this step was $\beta\leftarrow\sum_{i=1}^{k}y_{i}^{t}$. This step is not robust to noise since, later in line 9, probability $y_{i}^{t}/\beta$ is assigned to type $i$; the probability becomes meaningless if $\beta<0$, which is possible due to noise. We identified that a sufficient modification is to change this step to $\beta\leftarrow\sum_{i=1}^{k}[y_{i}^{t}]_{+}$. Here, $[x]_{+}=x$ if $x\geq 0$, and $0$ otherwise. Note that when there is no noise, Algorithm 3 reduces to the original algorithm in Iwata et al. (2015) since, in the monotone case, $y_{i}\geq 0$ always holds.
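As a minimal sketch of this clipping step (our own illustration; the exact probability rule and the handling of $\beta=0$ follow Algorithm 3 in the appendix, so the uniform fallback below is a hypothetical placeholder), assuming the probabilities are taken proportional to the clipped noisy gains:

def clipped_probabilities(y_hat):
    # y_hat: noisy marginal gains for the k types at the current element.
    clipped = [max(y, 0.0) for y in y_hat]       # [y]_+ clipping from the modified line 6
    beta = sum(clipped)                          # beta <- sum_i [y_i]_+ is never negative
    if beta == 0.0:
        return [1.0 / len(y_hat)] * len(y_hat)   # placeholder for the beta = 0 case
    return [c / beta for c in clipped]

print(clipped_probabilities([0.3, -0.1, 0.2]))   # [0.6, 0.0, 0.4]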

We show the following robustness guarantee of Algorithm 3:

Proposition 4.1.

Algorithm 3 for maximizing a monotone $k$-submodular function is a $(\frac{k}{2k-1},(16-\frac{2}{k})n,nk)$-robust approximation algorithm.

By Definition 2.2, we aim to show that the output $\boldsymbol{s}$ of Algorithm 3 satisfies $\mathbb{E}[f(\boldsymbol{s})]\geq\frac{k}{2k-1}f(\boldsymbol{o})-(16-\frac{2}{k})n\varepsilon$ when the function evaluations are based on the surrogate function $\hat{f}$. Note that in this setting, Lemma 3.2 still holds. Thus, it is sufficient to show

\sum_{i=1}^{k}(\hat{a}_{i^{*}}-\hat{a}_{i})p_{i}\leq\left(1-\frac{1}{k}\right)\sum_{i=1}^{k}\hat{y}_{i}p_{i}+10\varepsilon, \qquad (11)

which is equivalent to (4) with $c=1-\frac{1}{k}$ and $d=10$. The detailed proof is in Section A.3.
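Instantiating Lemma 3.2 with these constants matches the guarantee in Proposition 4.1:

\frac{1}{1+c}=\frac{1}{2-\frac{1}{k}}=\frac{k}{2k-1},\qquad(2c+d+4)n=\left(2-\frac{2}{k}+10+4\right)n=\left(16-\frac{2}{k}\right)n.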

Remark 4.2.

This modification of $\beta$ becomes significant when dealing with a noisy function oracle $\hat{f}$, where it is possible that $\hat{y}_{i}<0$. Our analysis demonstrates that this modification is crucial in handling the case $\beta=0$, which is trivial in the noiseless analysis of Iwata et al. (2015) but becomes non-trivial in the presence of noise. It also plays a pivotal role in another case ($\beta>0$), where we use $\beta-[\hat{y}_{i^{*}}^{t}]_{+}\leq\beta$. In the original algorithm, $\beta-\hat{y}_{i^{*}}^{t}\leq\beta$ does not necessarily hold due to noise.

With the robustness guarantee for Algorithm 3, we can transfer it to the online bandit setting using the framework proposed in Nie et al. (2023a):

Corollary 4.3.

For an online monotone unconstrained $k$-submodular maximization problem, the expected cumulative $\frac{k}{2k-1}$-regret of C-ETC is at most $\mathcal{O}\left(nk^{\frac{1}{3}}T^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq\max\{nk,\frac{2\sqrt{2}k}{16-\frac{2}{k}}\}$.

5 Monotone Functions with Individual Size Constraints

In this section, we consider the Individual Size (IS) constraint. Under IS, each type $i$ has a limit $B_{i}$ on the maximum number of item-type pairs of that type, with $B=\sum_{i}B_{i}$ as the total budget. We consider the offline greedy algorithm proposed in Ohsaka & Yoshida (2015).

The pseudo-code is presented in Algorithm 4 in the appendix. The algorithm builds the solution greedily by iteratively including the element-type pair that yields the largest marginal gain, as long as that pair is still available and the individual budget is not exhausted. We show the following robustness guarantee of Algorithm 4:

Proposition 5.1.

Algorithm 4 for maximizing a monotone $k$-submodular function under individual size constraints is a $(\frac{1}{3},\frac{4}{3}(B+1),nkB)$-robust approximation algorithm, where $B$ is the total size limit.

The proof is in Section A.4. With the robustness guarantee for Algorithm 4, we can transfer it to the online bandit setting using the framework proposed in Nie et al. (2023a):

Corollary 5.2.

For an online monotone $k$-submodular maximization problem under individual size constraints, the expected cumulative $1/3$-regret of C-ETC is at most $\mathcal{O}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq nk\max\{1,\frac{3\sqrt{2}B}{2(B+1)}\}$.
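For intuition on the offline subroutine adapted in this section, here is a minimal Python sketch of the greedy procedure described above (Algorithm 4); the helper names are ours, and in the online adaptation the oracle argument is the empirical-mean surrogate supplied by C-ETC.

def greedy_individual_size(V, k, budgets, oracle):
    # budgets: per-type limits [B_1, ..., B_k]; oracle(x) returns a (possibly noisy) value of f(x).
    x = {e: 0 for e in V}                 # 0 = unassigned
    counts = [0] * (k + 1)
    for _ in range(sum(budgets)):         # B = sum_i B_i greedy additions
        base = oracle(x)
        best_pair, best_gain = None, None
        for e in V:
            if x[e] != 0:
                continue                  # element already assigned a type
            for i in range(1, k + 1):
                if counts[i] >= budgets[i - 1]:
                    continue              # individual budget of type i exhausted
                y = dict(x)
                y[e] = i
                gain = oracle(y) - base   # (noisy) marginal gain of pair (e, i)
                if best_gain is None or gain > best_gain:
                    best_pair, best_gain = (e, i), gain
        if best_pair is None:
            break
        e, i = best_pair
        x[e] = i
        counts[i] += 1
    return x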

6 Monotone Functions with Matroid Constraints

In this section, we consider a matroid constraint with monotone objective functions. We adapt the offline algorithm proposed in Sakaue (2017), which is depicted in Algorithm 5 in the appendix. The algorithm again builds the solution in a greedy manner. At each iteration, the algorithm constructs the set of available elements $E(\boldsymbol{s})$ given the current solution $\boldsymbol{s}$ using an assumed independence oracle, and includes the element-type pair that yields the largest marginal gain. We show the following robustness guarantee of Algorithm 5:

Proposition 6.1.

Algorithm 5 for maximizing a monotone $k$-submodular function under a matroid constraint is a $(\frac{1}{2},M+1,nkM)$-robust approximation algorithm, where $M$ is the rank of the matroid.

The proof is in Section A.5. With the robustness guarantee for Algorithm 5, we transform it into a CMAB algorithm using the framework proposed in (Nie et al., 2023a):

Corollary 6.2.

For an online monotone $k$-submodular maximization problem under a matroid constraint, the expected cumulative $1/2$-regret of C-ETC is at most $\mathcal{O}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq nk\max\{1,\frac{3\sqrt{2}M}{2(M+1)}\}$.

One special case of the matroid constraint is the Total Size (TS) constraint (Ohsaka & Yoshida, 2015). Under TS, there is a limited budget on the total number of element-type pairs that can be selected. This is also called a uniform matroid in the literature, and in this case the rank of the matroid is the total size budget $B$. We immediately obtain an online algorithm as a result of Corollary 6.2:

Corollary 6.3.

For an online monotone $k$-submodular maximization problem under a TS constraint, the expected cumulative $1/2$-regret of C-ETC is at most $\mathcal{O}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}BT^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq nk\max\{1,\frac{3\sqrt{2}B}{2(B+1)}\}$, using $N_{\mathcal{A}}=nkB$ as an upper bound on the number of function evaluations of Algorithm 5.

Remark 6.4.

In Sections 5 and 6, although marginal gains are not guaranteed to be non-negative, we prove that there is no need to modify the algorithms as in Section 4. This is due to the inherent design of the algorithms. In Algorithm 3, probabilities are assigned to the types in proportion to their marginal gains (Line 9), so it is necessary to ensure that the sum of the marginal gains is non-negative. However, Algorithms 4 and 5 select item-type pairs by directly comparing marginal gains, regardless of whether they are positive or negative. In cases where monotonicity does not hold due to noise, we use a bounded-error analysis to derive analogous properties.
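The same greedy skeleton underlies Algorithm 5, sketched below in Python against an assumed independence oracle (the function names are ours); it is also the subroutine reused for the non-monotone case in Section 7, and instantiating the oracle with a uniform matroid of rank $B$ recovers the TS constraint.

def greedy_matroid(V, k, is_independent, oracle):
    # is_independent(S): assumed independence oracle for the matroid over V.
    # oracle(x): (possibly noisy) value of f(x).
    x = {e: 0 for e in V}
    supp = set()
    while True:
        # E(x): elements whose addition keeps supp(x) independent.
        available = [e for e in V if x[e] == 0 and is_independent(supp | {e})]
        if not available:
            return x
        base = oracle(x)
        best_pair, best_gain = None, None
        for e in available:
            for i in range(1, k + 1):
                y = dict(x)
                y[e] = i
                gain = oracle(y) - base
                if best_gain is None or gain > best_gain:
                    best_pair, best_gain = (e, i), gain
        e, i = best_pair
        x[e] = i
        supp.add(e)

# Example: a uniform matroid of rank B (the Total Size constraint).
B = 6
is_independent_uniform = lambda S: len(S) <= B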

7 Non-Monotone Functions with Matroid Constraints

In Sun et al. (2022), it is shown that Algorithm 5 in the appendix achieves a $1/3$ approximation ratio even when the objective function is non-monotone. We show the following robustness result when Algorithm 5 is run with the surrogate function $\hat{f}$.

Proposition 7.1.

Algorithm 5 for maximizing a non-monotone $k$-submodular function under a matroid constraint is a $(\frac{1}{3},\frac{4}{3}(M+1),nkM)$-robust approximation algorithm, where $M$ is the rank of the matroid.

The proof is in Section A.6. We use the offline-to-online transformation framework proposed in Nie et al. (2023a) to adapt Algorithm 5:

Corollary 7.2.

For an online non-monotone $k$-submodular maximization problem under a matroid constraint, the expected cumulative $1/3$-regret of C-ETC is at most $\mathcal{O}\left(n^{\frac{1}{3}}k^{\frac{1}{3}}MT^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ given $T\geq nk\max\{1,\frac{3\sqrt{2}M}{2(M+1)}\}$.

8 Evaluations

In this section, we empirically evaluate the performance of our proposed methods in the context of online influence maximization with $k$ topics. This problem is formulated as a monotone $k$-submodular bandit problem (detailed below). As there are no directly comparable baselines in the literature, we select naive UCB (NaiveUCB) and random selection (Random) as benchmarks, with UCB implemented using the standard UCB1 algorithm.

Online Influence Maximization with $k$ Topics: The problem involves a social network represented as a directed graph $G=(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each edge $(u,v)\in E$ has associated weights $p_{u,v}^{i}$, $i\in[k]$, where $p_{u,v}^{i}$ denotes the influence strength from user $u$ to $v$ on topic $i$. The goal is to maximize the number of users influenced by one or more topics. We adopt the $k$-topic independent cascade ($k$-IC) model from Ohsaka & Yoshida (2015), which generalizes the independent cascade model (Kempe et al., 2003).

Specifically, the influence spread $\sigma:(k+1)^{V}\rightarrow\mathbb{R}_{+}$ in the $k$-IC model is defined as $\sigma(S)=\mathbb{E}\left[\left|\bigcup_{i\in[k]}A_{i}\left(U_{i}(S)\right)\right|\right]$, where $A_{i}\left(U_{i}(S)\right)$ is a random variable representing the set of influenced nodes in the diffusion process of topic $i$. As shown in Ohsaka & Yoshida (2015), $\sigma$ is monotone $k$-submodular. Given a directed graph $G=(V,E)$ and edge probabilities $\{p_{u,v}^{i}\mid(u,v)\in E,i\in[k]\}$, the task is to select a seed set $S\in(k+1)^{V}$ that maximizes $\sigma(S)$, subject to some constraints (e.g., total size constraints).
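A minimal Monte Carlo sketch (our own illustration) of estimating $\sigma(S)$ under the $k$-IC model, assuming each topic spreads by an independent cascade over its own edge probabilities; the data-structure choices below are hypothetical.

import random

def simulate_topic(seeds, out_neighbors, probs):
    # One independent-cascade run for a single topic.
    # out_neighbors: dict u -> list of out-neighbors v; probs: dict (u, v) -> p^i_{u,v}.
    active, frontier = set(seeds), list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in out_neighbors.get(u, []):
                if v not in active and random.random() < probs[(u, v)]:
                    active.add(v)
                    new.append(v)
        frontier = new
    return active

def estimate_spread(x, out_neighbors, probs_per_topic, k, num_sims=100):
    # x: dict node -> topic in {0, ..., k} (0 = not a seed).
    # Returns a Monte Carlo estimate of sigma(x) = E[| union_i A_i(U_i(x)) |].
    total = 0
    for _ in range(num_sims):
        influenced = set()
        for i in range(1, k + 1):
            seeds_i = [v for v, t in x.items() if t == i]
            influenced |= simulate_topic(seeds_i, out_neighbors, probs_per_topic[i])
        total += len(influenced)
    return total / num_sims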

Experiment settings: We use the ego-facebook network (Leskovec & Mcauley, 2012). After applying a community detection algorithm (Blondel et al., 2008) to extract a subgraph, we convert it into a directed graph consisting of 350 users and 2,845 edges. Three sets of edge weights, corresponding to three topics ($k=3$), are generated: Uniform(0, 0.2), Normal(0.1, 0.05), and Exponential(10), with the Normal and Exponential distributions truncated to $[0,1]$. We evaluate our algorithms under both TS and IS constraints. The TS budget is set to $B=6$, while for IS constraints, the budgets for the three topics are all 2. As greedy algorithms offer $1/2$ and $1/3$ approximations for TS and IS respectively (Ohsaka & Yoshida, 2015), we use an offline greedy algorithm to compute the $\alpha$-optimal value, with 100 diffusion simulations approximating the true function value. Since all cases are monotone, NaiveUCB and Random restrict their search space to actions that exhaust the budget. Means and standard deviations are calculated over 10 independent runs.

Results: The instantaneous rewards over $10^{4}$ time steps are shown in Figure 1. Under both TS and IS constraints, the results can be summarized as follows. In the early stages (before roughly 4,000 time steps), ETCG explores suboptimal actions that do not use the full budget, leading to worse initial instantaneous rewards. However, ETCG catches up in later stages and achieves lower cumulative regret over the entire time horizon. We note that NaiveUCB runs UCB over all actions that exhaust the budget, of which there are approximately ${60\choose 6}\sim 5\times 10^{8}$ for TS and ${20\choose 2}^{3}\sim 7\times 10^{7}$ for IS, so it cannot even explore each action once within the horizon, resulting in poor performance.

Figure 1: Instantaneous Rewards on Influence Maximization experiments.

9 Conclusion

In this paper, we investigate the problem of online $k$-submodular maximization under bandit feedback. Utilizing the framework proposed by Nie et al. (2023a), we propose CMAB algorithms by adapting various offline algorithms and analyzing their robustness. We obtain sublinear regret for online $k$-submodular maximization under bandit feedback in various settings, including monotone and non-monotone objectives without constraints, monotone objectives with individual size constraints, and monotone and non-monotone objectives with a matroid constraint. Numerical experiments on online influence maximization with $k$ topics were conducted to evaluate the effectiveness of our methods.

References

  • Blackwell (1956) Blackwell, D. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956.
  • Blondel et al. (2008) Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008:10008, 2008.
  • Chen et al. (2022) Chen, J., Tang, Z., and Wang, C. Monotone k-submodular knapsack maximization: An analysis of the greedy+singleton algorithm. In Ni, Q. and Wu, W. (eds.), Algorithmic Aspects in Information and Management, pp.  144–155, Cham, 2022. Springer International Publishing. ISBN 978-3-031-16081-3.
  • Ene & Nguyen (2022) Ene, A. and Nguyen, H. Streaming algorithm for monotone k-submodular maximization with cardinality constraints. In International Conference on Machine Learning, pp.  5944–5967. PMLR, 2022.
  • Fourati et al. (2023) Fourati, F., Aggarwal, V., Quinn, C., and Alouini, M.-S. Randomized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback. In International Conference on Artificial Intelligence and Statistics, pp.  7455–7471. PMLR, 2023.
  • Fourati et al. (2024a) Fourati, F., Alouini, M.-S., and Aggarwal, V. Federated combinatorial multi-agent multi-armed bandits. In Forty-first International Conference on Machine Learning, 2024a.
  • Fourati et al. (2024b) Fourati, F., Quinn, C. J., Alouini, M.-S., and Aggarwal, V. Combinatorial stochastic-greedy bandit. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp.  12052–12060, 2024b.
  • Huber & Kolmogorov (2012) Huber, A. and Kolmogorov, V. Towards minimizing k-submodular functions. In International Symposium on Combinatorial Optimization, 2012.
  • Iwata et al. (2013) Iwata, S., Tanigawa, S.-i., and Yoshida, Y. Bisubmodular function maximization and extensions. Technical Report METR 2013–16, The University of Tokyo, September 2013.
  • Iwata et al. (2015) Iwata, S., Tanigawa, S.-i., and Yoshida, Y. Improved approximation algorithms for k-submodular function maximization. In ACM-SIAM Symposium on Discrete Algorithms, 2015.
  • Kempe et al. (2003) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.  137–146, 2003.
  • Khuller et al. (1999) Khuller, S., Moss, A., and Naor, J. S. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.
  • Leskovec & Mcauley (2012) Leskovec, J. and Mcauley, J. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, volume 25, 2012.
  • Lin et al. (2015) Lin, T., Li, J., and Chen, W. Stochastic online greedy learning with semi-bandit feedbacks. In Proceedings of the 29th International Conference on Neural Information Processing Systems, pp.  352–360, 2015.
  • Matsuoka & Ohsaka (2021) Matsuoka, T. and Ohsaka, N. Maximization of monotone $k$-submodular functions with bounded curvature and non-$k$-submodular functions. In Asian Conference on Machine Learning, pp.  1707–1722. PMLR, 2021.
  • Niazadeh et al. (2021) Niazadeh, R., Golrezaei, N., Wang, J. R., Susan, F., and Badanidiyuru, A. Online learning via offline greedy algorithms: Applications in market design and optimization. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp.  737–738, 2021.
  • Nie et al. (2022) Nie, G., Agarwal, M., Umrawal, A. K., Aggarwal, V., and Quinn, C. J. An explore-then-commit algorithm for submodular maximization under full-bandit feedback. In Uncertainty in Artificial Intelligence, pp.  1541–1551. PMLR, 2022.
  • Nie et al. (2023a) Nie, G., Nadew, Y. Y., Zhu, Y., Aggarwal, V., and Quinn, C. J. A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  26166–26198. PMLR, 23–29 Jul 2023a.
  • Nie et al. (2023b) Nie, G., Zhu, Y., Nadew, Y. Y., Basu, S., Pavan, A., and Quinn, C. J. Size-constrained k-submodular maximization in near-linear time. In Evans, R. J. and Shpitser, I. (eds.), Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine Learning Research, pp.  1545–1554. PMLR, 31 Jul–04 Aug 2023b.
  • Ohsaka & Yoshida (2015) Ohsaka, N. and Yoshida, Y. Monotone k-submodular function maximization with size constraints. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
  • Oshima (2021) Oshima, H. Improved randomized algorithm for k-submodular function maximization. SIAM Journal on Discrete Mathematics, 35(1):1–22, 2021.
  • Pham et al. (2021) Pham, C. V., Vu, Q. C., Ha, D. K., and Nguyen, T. T. Streaming algorithms for budgeted k-submodular maximization problem. In Computational Data and Social Networks: 10th International Conference, Proceedings, pp.  27–38. Springer, 2021.
  • Sakaue (2017) Sakaue, S. On maximizing a monotone k-submodular function subject to a matroid constraint. Discrete Optimization, 23:105–113, 2017. ISSN 1572-5286.
  • Soma (2019) Soma, T. No-regret algorithms for online kk-submodular maximization. In Chaudhuri, K. and Sugiyama, M. (eds.), Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp.  1205–1214. PMLR, 16–18 Apr 2019.
  • Streeter & Golovin (2008) Streeter, M. and Golovin, D. An online algorithm for maximizing submodular functions. In Proceedings of the 21st International Conference on Neural Information Processing Systems, pp.  1577–1584, 2008.
  • Sun et al. (2022) Sun, Y., Liu, Y., and Li, M. Maximization of kk-submodular function with a matroid constraint. In Du, D.-Z., Du, D., Wu, C., and Xu, D. (eds.), Theory and Applications of Models of Computation, pp.  1–10, Cham, 2022. Springer International Publishing.
  • Sviridenko (2004) Sviridenko, M. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32:41–43, 2004.
  • Tajdini et al. (2023) Tajdini, A., Jain, L., and Jamieson, K. Minimax optimal submodular optimization with bandit feedback, 2023.
  • Takemori et al. (2020) Takemori, S., Sato, M., Sonoda, T., Singh, J., and Ohkuma, T. Submodular bandit problem under multiple constraints. In Conference on Uncertainty in Artificial Intelligence, pp.  191–200. PMLR, 2020.
  • Tang et al. (2022) Tang, Z., Wang, C., and Chan, H. On maximizing a monotone k-submodular function under a knapsack constraint. Operations Research Letters, 50:28–31, 2022.
  • Ward & Živný (2014) Ward, J. and Živný, S. Maximizing k-submodular functions and beyond. ACM Transactions on Algorithms, 12:1 – 26, 2014.
  • Xiao et al. (2023) Xiao, H., Liu, Q., Zhou, Y., and Li, M. Approximation algorithms for kk-submodular maximization subject to a knapsack constraint, 2023.
  • Yu et al. (2016) Yu, B., Fang, M., and Tao, D. Linear submodular bandits with a knapsack constraint. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
  • Yu et al. (2023) Yu, K., Li, M., Zhou, Y., and Liu, Q. On maximizing monotone or non-monotone k-submodular functions with the intersection of knapsack and matroid constraints. J. Comb. Optim., 45(3), apr 2023. ISSN 1382-6905.
  • Yue & Guestrin (2011) Yue, Y. and Guestrin, C. Linear submodular bandits and their application to diversified retrieval. Advances in Neural Information Processing Systems, 24, 2011.

Appendix A Missing Proofs

A.1 Proof of Lemma 3.2

We restate the lemma here:

Lemma A.1.

Let c,d+c,d\in\mathbb{R}_{+}. Conditioning on 𝐬(j1)\boldsymbol{s}^{(j-1)}, suppose that

i=1k(a^i(j)a^i(j))pi(j)c(i=1ky^i(j)pi(j))+dε\displaystyle\sum_{i=1}^{k}\left(\hat{a}_{i^{*}}^{(j)}-\hat{a}_{i}^{(j)}\right)p_{i}^{(j)}\leq c\left(\sum_{i=1}^{k}\hat{y}_{i}^{(j)}p_{i}^{(j)}\right)+d\varepsilon (12)

holds for each jj with 1jn1\leq j\leq n, where i=𝐨(e(j))i^{*}=\boldsymbol{o}(e^{(j)}). Then 𝔼[f(𝐬)]11+cf(𝐨)(2c+d+4)nε\mathbb{E}[f(\boldsymbol{s})]\geq\frac{1}{1+c}f(\boldsymbol{o})-(2c+d+4)n\varepsilon.

Proof.

Fix j[n]j\in[n]. Conditioning on 𝒔(j1)\boldsymbol{s}^{(j-1)}, element e(j)e^{(j)} will be randomly assigned a type i[k]i\in[k] following the probabilities described in Lines 9-11 of the pseudocode. We first calculate the conditional expected difference between f(𝒐(j1))f(\boldsymbol{o}^{(j-1)}) and f(𝒐(j))f(\boldsymbol{o}^{(j)}). Let ii^{*} denote the type of e(j)e^{(j)} in 𝒐\boldsymbol{o} (and thus in 𝒐(j1)\boldsymbol{o}^{(j-1)}). By construction, since 𝒕(j1)\boldsymbol{t}^{(j-1)} is obtained from 𝒐(j)\boldsymbol{o}^{(j)} by replacing 𝒐(j)(e(j))\boldsymbol{o}^{(j)}(e^{(j)}) (which is the same as 𝒔(e(j))\boldsymbol{s}(e^{(j)}) and thus is ii with probability pi(j)p_{i}^{(j)}) with 0, we have

𝔼[f(𝒐(j))f(𝒕(j1))|𝒔(j1)]=i=1kΔe(j),if(𝒕(j1))pi(j)=i=1kai(j)pi(j).\displaystyle\mathbb{E}[f(\boldsymbol{o}^{(j)})-f(\boldsymbol{t}^{(j-1)})|\boldsymbol{s}^{(j-1)}]=\sum_{i=1}^{k}\Delta_{e^{(j)},i}f(\boldsymbol{t}^{(j-1)})p_{i}^{(j)}=\sum_{i=1}^{k}a_{i}^{(j)}p_{i}^{(j)}. (13)

Also, since 𝒐(j1)\boldsymbol{o}^{(j-1)} can be obtained from 𝒕(j1)\boldsymbol{t}^{(j-1)} by replacing 𝒕(j1)(e(j))\boldsymbol{t}^{(j-1)}(e^{(j)}) (which is 0) with 𝒐(e(j))\boldsymbol{o}(e^{(j)}) (which is ii^{*}), we have

f(𝒐(j1))f(𝒕(j1))=ai(j).\displaystyle f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{t}^{(j-1)})=a_{i^{*}}^{(j)}. (14)

Thus

𝔼[f(𝒐(j1))f(𝒐(j))|𝒔(j1)]\displaystyle\mathbb{E}[f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})|\boldsymbol{s}^{(j-1)}]
=\displaystyle= 𝔼[f(𝒐(j1))f(𝒕(j1))+f(𝒕(j1))f(𝒐(j))|𝒔(j1)]\displaystyle\mathbb{E}[f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{t}^{(j-1)})+f(\boldsymbol{t}^{(j-1)})-f(\boldsymbol{o}^{(j)})|\boldsymbol{s}^{(j-1)}]
=\displaystyle= i=1k(ai(j)ai(j))pi(j)\displaystyle\sum_{i=1}^{k}\left(a_{i^{*}}^{(j)}-a_{i}^{(j)}\right)p_{i}^{(j)} (using (13) and (14))
\displaystyle\leq i=1k(a^i(j)a^i(j)+4ε)pi(j)\displaystyle\sum_{i=1}^{k}\left(\hat{a}_{i^{*}}^{(j)}-\hat{a}_{i}^{(j)}+4\varepsilon\right)p_{i}^{(j)} (definition of f^\hat{f})
=\displaystyle= i=1k(a^i(j)a^i(j))pi(j)+4ε\displaystyle\sum_{i=1}^{k}\left(\hat{a}_{i^{*}}^{(j)}-\hat{a}_{i}^{(j)}\right)p_{i}^{(j)}+4\varepsilon (15)

where the last equality is due to ipi(j)=1\sum_{i}p_{i}^{(j)}=1. Since 𝒔(j)\boldsymbol{s}^{(j)} can be obtained from 𝒔(j1)\boldsymbol{s}^{(j-1)} by replacing 𝒔(j1)(e(j))\boldsymbol{s}^{(j-1)}(e^{(j)}) (which is 0) with 𝒔(j)(e(j))\boldsymbol{s}^{(j)}(e^{(j)}) (which is ii with probability pi(j)p_{i}^{(j)}), we have

𝔼[f(𝒔(j))f(𝒔(j1))|𝒔(j1)]\displaystyle\mathbb{E}[f(\boldsymbol{s}^{(j)})-f(\boldsymbol{s}^{(j-1)})|\boldsymbol{s}^{(j-1)}]
=\displaystyle= i=1kyi(j)pi(j)\displaystyle\sum_{i=1}^{k}y_{i}^{(j)}p_{i}^{(j)} (by construction)
\displaystyle\geq i=1k(y^i(j)2ε)pi(j)\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}^{(j)}-2\varepsilon)p_{i}^{(j)} (definition of f^\hat{f})
=\displaystyle= i=1ky^i(j)pi(j)2ε,\displaystyle\sum_{i=1}^{k}\hat{y}_{i}^{(j)}p_{i}^{(j)}-2\varepsilon, (16)

where both expectations are taken over the randomness of the algorithm. Combining (15), (16) and (12), we have

𝔼[f(𝒐(j1))f(𝒐(j))|𝒔(j1)]c𝔼[f(𝒔(j))f(𝒔(j1))|𝒔(j1)]+(2c+d+4)ε.\displaystyle\mathbb{E}[f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})|\boldsymbol{s}^{(j-1)}]\leq c\mathbb{E}[f(\boldsymbol{s}^{(j)})-f(\boldsymbol{s}^{(j-1)})|\boldsymbol{s}^{(j-1)}]+(2c+d+4)\varepsilon. (17)

Note that 𝒐(0)=𝒐\boldsymbol{o}^{(0)}=\boldsymbol{o} and 𝒐(n)=𝒔\boldsymbol{o}^{(n)}=\boldsymbol{s} by construction. Hence

f(𝒐)𝔼[f(𝒔)]\displaystyle f(\boldsymbol{o})-\mathbb{E}[f(\boldsymbol{s})] =j=1n𝔼[f(𝒐(j1))f(𝒐(j))]\displaystyle=\sum_{j=1}^{n}\mathbb{E}[f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})] (by construction)
=j=1n𝔼[𝔼[f(𝒐(j1))f(𝒐(j))|𝒔(j1)]]\displaystyle=\sum_{j=1}^{n}\mathbb{E}[\mathbb{E}[f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})|\boldsymbol{s}^{(j-1)}]]
=j=1n𝔼[c𝔼[f(𝒔(j))f(𝒔(j1))|𝒔(j1)]+(2c+d+4)nε]]\displaystyle=\sum_{j=1}^{n}\mathbb{E}\left[c\mathbb{E}[f(\boldsymbol{s}^{(j)})-f(\boldsymbol{s}^{(j-1)})|\boldsymbol{s}^{(j-1)}]+(2c+d+4)n\varepsilon]\right] (using (17))
c(j=1n𝔼[f(𝒔(j))f(𝒔(j1))])+(2c+d+4)nε\displaystyle\leq c\left(\sum_{j=1}^{n}\mathbb{E}[f(\boldsymbol{s}^{(j)})-f(\boldsymbol{s}^{(j-1)})]\right)+(2c+d+4)n\varepsilon
=c(𝔼[f(𝒔)]f(𝟎))+(2c+d+4)nε\displaystyle=c(\mathbb{E}[f(\boldsymbol{s})]-f(\mathbf{0}))+(2c+d+4)n\varepsilon
=c𝔼[f(𝒔)]+(2c+d+4)nε,\displaystyle=c\mathbb{E}[f(\boldsymbol{s})]+(2c+d+4)n\varepsilon, (f(𝟎)=0f(\boldsymbol{0})=0)

and we get the statement. ∎

Algorithm 2 A randomized algorithm for kk-submodular maximization without constraints (Iwata et al., 2015)
1:  Input: ground set VV, value oracle access for ff
2:  Output: a vector 𝒔(k+1)V\boldsymbol{s}\in(k+1)^{V}.
3:  Initialize 𝒔𝟎n\boldsymbol{s}\leftarrow\mathbf{0}_{n}.
4:  for eVe\in V do
5:     yiΔe,if(𝒔)y_{i}\leftarrow\Delta_{e,i}f(\boldsymbol{s}) for i[k]i\in[k].
6:     w.l.o.g. assume y1y2yky_{1}\geq y_{2}\geq\ldots\geq y_{k}.
7:     i+{max. i such that yi>0if y1>0,0otherwise.i^{+}\leftarrow\begin{cases}\text{max. }i\text{ such that }y_{i}>0&\text{if }y_{1}>0,\\ 0&\text{otherwise.}\end{cases}
8:     Initialize p𝟎kp\leftarrow\mathbf{0}_{k}.
9:     if i+1i^{+}\leq 1, p11p_{1}\leftarrow 1.
10:     else if i+=2i^{+}=2, for i{1,2}i\in\{1,2\}  piyi(y1+y2)p_{i}\leftarrow\frac{y_{i}}{(y_{1}+y_{2})}.
11:     else pi+(12)i+1p_{i^{+}}\leftarrow\left(\frac{1}{2}\right)^{i^{+}-1} and for i<i+i<i^{+}, pi(12)ip_{i}\leftarrow\left(\frac{1}{2}\right)^{i}
12:     Choose 𝒔(e)(𝒔(e)=i)=pi\boldsymbol{s}(e)\sim\mathbb{P}(\boldsymbol{s}(e)=i)=p_{i} for all i[k]i\in[k].
13:  end for
14:  Return 𝒔\boldsymbol{s}.
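
For concreteness, the following Python sketch shows one way the sampling rule in Lines 5-12 above could be implemented against a generic value oracle. It is a minimal illustration only: the oracle interface (a function taking a dict mapping elements to types, with 0 meaning unassigned), the helper names, and the sampling utility are our own choices, not part of the original algorithm description.

```python
import random

def marginal_gains(f, s, e, k):
    """Marginal gains Delta_{e,i} f(s) for i = 1..k; s maps elements to types, 0 = unassigned."""
    base = f(s)
    gains = []
    for i in range(1, k + 1):
        s_try = dict(s)
        s_try[e] = i
        gains.append(f(s_try) - base)
    return gains

def randomized_unconstrained(f, V, k, rng=random):
    """Sketch of Algorithm 2: visit each element once and sample its type from the
    probability vector built from the sorted marginal gains."""
    s = {e: 0 for e in V}
    for e in V:
        y = marginal_gains(f, s, e, k)
        order = sorted(range(k), key=lambda i: y[i], reverse=True)              # Line 6
        y_sorted = [y[i] for i in order]
        i_plus = max((r + 1 for r in range(k) if y_sorted[r] > 0), default=0)   # Line 7
        p = [0.0] * k
        if i_plus <= 1:                                   # Line 9
            p[0] = 1.0
        elif i_plus == 2:                                 # Line 10
            total = y_sorted[0] + y_sorted[1]
            p[0], p[1] = y_sorted[0] / total, y_sorted[1] / total
        else:                                             # Line 11
            for r in range(i_plus - 1):
                p[r] = 0.5 ** (r + 1)
            p[i_plus - 1] = 0.5 ** (i_plus - 1)
        pos = rng.choices(range(k), weights=p)[0]         # Line 12
        s[e] = order[pos] + 1                             # map sorted position back to a type
    return s
```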

A.2 Proof of Proposition 3.3

We repeat some notations that we will use in our analysis that have already been introduced in the main text.

Let 𝒐\boldsymbol{o} be an optimal solution with supp(𝒐)=V\operatorname{supp}(\boldsymbol{o})=V (by Proposition 3.1 such a solution exists). Let 𝒔\boldsymbol{s} be the output of the algorithm with |supp(𝒔)|=n|\operatorname{supp}(\boldsymbol{s})|=n using the surrogate function f^\hat{f}. We consider the jj-th iteration of the algorithm, and let e(j)e^{(j)} be the element of VV considered in the jj-th iteration, pi(j)p_{i}^{(j)} be the probability that the ii-th type is chosen in the jj-th iteration, and 𝒔(j)\boldsymbol{s}^{(j)} be the solution after the jj-th iteration, where 𝒔(0)=𝟎\boldsymbol{s}^{(0)}=\mathbf{0}. Also for 0jn0\leq j\leq n, let 𝒐(j)\boldsymbol{o}^{(j)} be the element in {0,1,,k}V\{0,1,\ldots,k\}^{V} obtained from 𝒐\boldsymbol{o} by replacing the coordinates on supp(𝒔(j))\operatorname{supp}(\boldsymbol{s}^{(j)}) with those of 𝒔(j)\boldsymbol{s}^{(j)}, and for 1jn1\leq j\leq n let 𝒕(j1)\boldsymbol{t}^{(j-1)} be the element in {0,1,,k}V\{0,1,\ldots,k\}^{V} obtained from 𝒐(j)\boldsymbol{o}^{(j)} by replacing 𝒐(j)(e(j))\boldsymbol{o}^{(j)}(e^{(j)}) with 0.

The subsequent analyses will involve comparing different marginal gains, which we next introduce. For i[k]i\in[k], let yi(j)=Δe(j),if(𝒔(j1))y_{i}^{(j)}=\Delta_{e^{(j)},i}f(\boldsymbol{s}^{(j-1)}), y^i(j)=Δe(j),if^(𝒔(j1))\hat{y}_{i}^{(j)}=\Delta_{e^{(j)},i}\hat{f}(\boldsymbol{s}^{(j-1)}) and let ai(j)=Δe(j),if(𝒕(j1))a_{i}^{(j)}=\Delta_{e^{(j)},i}f(\boldsymbol{t}^{(j-1)}), a^i(j)=Δe(j),if^(𝒕(j1))\hat{a}_{i}^{(j)}=\Delta_{e^{(j)},i}\hat{f}(\boldsymbol{t}^{(j-1)}). Due to pairwise monotonicity, we have yi(j)+yi(j)0y_{i}^{(j)}+y_{i^{\prime}}^{(j)}\geq 0 and ai(j)+ai(j)0a_{i}^{(j)}+a_{i^{\prime}}^{(j)}\geq 0 for all i,i[k]i,i^{\prime}\in[k] with iii\neq i^{\prime}, and thus, while the surrogate function f^\hat{f} does not necessarily have pairwise monotonicity, we have

y^i(j)+y^i(j)4ε\displaystyle\hat{y}_{i}^{(j)}+\hat{y}_{i^{\prime}}^{(j)}\geq-4\varepsilon , (18)
a^i(j)+a^i(j)4ε\displaystyle\hat{a}_{i}^{(j)}+\hat{a}_{i^{\prime}}^{(j)}\geq-4\varepsilon . (19)

for all i,i[k]i,i^{\prime}\in[k] with iii\neq i^{\prime} since both inequalities involve a sum of four function values of f^\hat{f} which can be off by at most ϵ\epsilon with respect to ff. Also from 𝒔(j)𝒕(j)\boldsymbol{s}^{(j)}\preceq\boldsymbol{t}^{(j)} which holds by construction, orthant submodularity implies yi(j)ai(j)y_{i}^{(j)}\geq a_{i}^{(j)} for all i[k]i\in[k] and thus,

y^i(j)a^i(j)4ε.\displaystyle\hat{y}_{i}^{(j)}\geq\hat{a}_{i}^{(j)}-4\varepsilon. (20)

for all i[k]i\in[k]. Without loss of generality, we assume f(𝟎)=0f(\boldsymbol{0})=0.

Now we are ready to prove Proposition 3.3.

Proof of Proposition 3.3.

It is straightforward to see that the number of value oracle queries required is nknk, as the algorithm iterates through all item-type pairs. The remainder of the proof is dedicated to showing the α\alpha and δ\delta terms in Definition 2.2.

By Definition 2.2, we aim to show that the output 𝒔\boldsymbol{s} of Algorithm 2 satisfies 𝔼[f(𝒔)]12f(𝒐)20nε\mathbb{E}[f(\boldsymbol{s})]\geq\frac{1}{2}f(\boldsymbol{o})-20n\varepsilon when the function evaluation is based on surrogate function f^\hat{f}. By Lemma 3.2, our goal is to show

i=1k(y^i+a^i)pia^i14ε,\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq\hat{a}_{i^{*}}-14\varepsilon, (21)

which is equivalent to (4) in the main text with c=1c=1 and d=14d=14. For simplicity of description, we omit the superscript (j)(j) when it is clear from the context. Recall that y^i+y^i4ε\hat{y}_{i}+\hat{y}_{i^{\prime}}\geq-4\varepsilon and a^i+a^i4ε\hat{a}_{i}+\hat{a}_{i^{\prime}}\geq-4\varepsilon for i,i[k]i,i^{\prime}\in[k] with iii\neq i^{\prime}, and y^ia^i4ε\hat{y}_{i}\geq\hat{a}_{i}-4\varepsilon for i[k]i\in[k] (c.f. (18),(19) and (20)). We break up the analysis into three cases: i+1i^{+}\leq 1, i+=2i^{+}=2 and i+3i^{+}\geq 3.

Case 1:

If i+1i^{+}\leq 1, then by Line 9 in Algorithm 2 p1=1p_{1}=1 and pi=0p_{i}=0 for i>1i>1. So (21) specializes to a^1+y^1a^i14ε\hat{a}_{1}+\hat{y}_{1}\geq\hat{a}_{i^{*}}-14\varepsilon. Since y^i+y^i4ε\hat{y}_{i}+\hat{y}_{i^{\prime}}\geq-4\varepsilon for i,i[k]i,i^{\prime}\in[k] with iii\neq i^{\prime} and the y^i\hat{y}_{i}'s are ranked in decreasing order, we have y^1+y^i4ε\hat{y}_{1}+\hat{y}_{i^{\prime}}\geq-4\varepsilon and y^1y^i\hat{y}_{1}\geq\hat{y}_{i^{\prime}} for i[k]i^{\prime}\in[k]. Summing these two inequalities, we have y^12ε\hat{y}_{1}\geq-2\varepsilon.

Sub-case 1.a:

If i=1i^{*}=1,

a^1+y^1=a^i+y^1a^i2εa^i14ε,\hat{a}_{1}+\hat{y}_{1}=\hat{a}_{i^{*}}+\hat{y}_{1}\geq\hat{a}_{i^{*}}-2\varepsilon\geq\hat{a}_{i^{*}}-14\varepsilon,

showing the statement.

Sub-case 1.b:

If i1i^{*}\neq 1, from i+1i^{+}\leq 1 and (20), 0y^ia^i4ε0\geq\hat{y}_{i^{*}}\geq\hat{a}_{i^{*}}-4\varepsilon. We also have a^1+a^i4ε\hat{a}_{1}+\hat{a}_{i^{*}}\geq-4\varepsilon by (19). Hence by summing these two inequalities we have a^18ε\hat{a}_{1}\geq-8\varepsilon. Combining with y^12ε\hat{y}_{1}\geq-2\varepsilon we have

a^1+y^110εa^i14ε,\hat{a}_{1}+\hat{y}_{1}\geq-10\varepsilon\geq\hat{a}_{i^{*}}-14\varepsilon,

showing the statement.

Case 2:

If i+=2i^{+}=2, then by Line 10 in Algorithm 2, (21) specializes to

(a^1+y^1)y^1+(a^2+y^2)y^2a^i(y^1+y^2)14ε(y^1+y^2).(\hat{a}_{1}+\hat{y}_{1})\hat{y}_{1}+(\hat{a}_{2}+\hat{y}_{2})\hat{y}_{2}\geq\hat{a}_{i^{*}}(\hat{y}_{1}+\hat{y}_{2})-14\varepsilon(\hat{y}_{1}+\hat{y}_{2}).

In this case, we have y^1,y^2>0\hat{y}_{1},\hat{y}_{2}>0 by definition of i+i^{+}.

Sub-case 2.a:

i2i^{*}\leq 2. Now (a^1+y^1)y^1+(a^2+y^2)y^2=a^1y^1+a^2y^2+(y^1y^2)2+2y^1y^2a^1y^1+a^2y^2+2y^1y^2(\hat{a}_{1}+\hat{y}_{1})\hat{y}_{1}+(\hat{a}_{2}+\hat{y}_{2})\hat{y}_{2}=\hat{a}_{1}\hat{y}_{1}+\hat{a}_{2}\hat{y}_{2}+(\hat{y}_{1}-\hat{y}_{2})^{2}+2\hat{y}_{1}\hat{y}_{2}\geq\hat{a}_{1}\hat{y}_{1}+\hat{a}_{2}\hat{y}_{2}+2\hat{y}_{1}\hat{y}_{2}. If i=1i^{*}=1, then

a^1y^1+a^2y^2+2y^1y^2\displaystyle\hat{a}_{1}\hat{y}_{1}+\hat{a}_{2}\hat{y}_{2}+2\hat{y}_{1}\hat{y}_{2}
a^1y^1+a^2y^2+2(a^14ε)y^2\displaystyle\geq\hat{a}_{1}\hat{y}_{1}+\hat{a}_{2}\hat{y}_{2}+2(\hat{a}_{1}-4\varepsilon)\hat{y}_{2} (using (20))
=a^1(y^1+y^2)+(a^2+a^1)y^28εy^2\displaystyle=\hat{a}_{1}(\hat{y}_{1}+\hat{y}_{2})+(\hat{a}_{2}+\hat{a}_{1})\hat{y}_{2}-8\varepsilon\hat{y}_{2}
a^1(y^1+y^2)12εy^2\displaystyle\geq\hat{a}_{1}(\hat{y}_{1}+\hat{y}_{2})-12\varepsilon\hat{y}_{2} (using (19))
a^1(y^1+y^2)12ε(y^1+y^2)\displaystyle\geq\hat{a}_{1}(\hat{y}_{1}+\hat{y}_{2})-12\varepsilon(\hat{y}_{1}+\hat{y}_{2}) (y^10\hat{y}_{1}\geq 0)
=a^i(y^1+y^2)12ε(y^1+y^2)\displaystyle=\hat{a}_{i^{*}}(\hat{y}_{1}+\hat{y}_{2})-12\varepsilon(\hat{y}_{1}+\hat{y}_{2}) (i=1i^{*}=1)
a^i(y^1+y^2)14ε(y^1+y^2)\displaystyle\geq\hat{a}_{i^{*}}(\hat{y}_{1}+\hat{y}_{2})-14\varepsilon(\hat{y}_{1}+\hat{y}_{2}) (22)

as required. By a similar calculation the claim follows if i=2i^{*}=2.

Sub-case 2.b:

If i3i^{*}\geq 3, then 0y^ia^i4ε0\geq\hat{y}_{i^{*}}\geq\hat{a}_{i^{*}}-4\varepsilon by definition of i+i^{+} and (20). By (19), we also have that a^1+a^i4ε\hat{a}_{1}+\hat{a}_{i^{*}}\geq-4\varepsilon and a^2+a^i4ε\hat{a}_{2}+\hat{a}_{i^{*}}\geq-4\varepsilon. Thus,

(a^1+y^1)y^1+(a^2+y^2)y^2\displaystyle(\hat{a}_{1}+\hat{y}_{1})\hat{y}_{1}+(\hat{a}_{2}+\hat{y}_{2})\hat{y}_{2}
=a^1y^1+y^12+a^2y^2+y^22\displaystyle=\hat{a}_{1}\hat{y}_{1}+\hat{y}_{1}^{2}+\hat{a}_{2}\hat{y}_{2}+\hat{y}_{2}^{2}
a^1y^1+a^2y^2\displaystyle\geq\hat{a}_{1}\hat{y}_{1}+\hat{a}_{2}\hat{y}_{2} (y^1,y^2>0\hat{y}_{1},\hat{y}_{2}>0 in Case 2)
(a^i4ε)(y^1+y^2)\displaystyle\geq(-\hat{a}_{i^{*}}-4\varepsilon)(\hat{y}_{1}+\hat{y}_{2})
(a^i4ε)(y^1+y^2)+2(a^i4ε)(y^1+y^2)\displaystyle\geq(-\hat{a}_{i^{*}}-4\varepsilon)(\hat{y}_{1}+\hat{y}_{2})+2(\hat{a}_{i^{*}}-4\varepsilon)(\hat{y}_{1}+\hat{y}_{2}) (y^1,y^20\hat{y}_{1},\hat{y}_{2}\geq 0 and a^i4ε0\hat{a}_{i^{*}}-4\varepsilon\leq 0)
=a^i(y^1+y^2)12ε(y^1+y^2)\displaystyle=\hat{a}_{i^{*}}(\hat{y}_{1}+\hat{y}_{2})-12\varepsilon(\hat{y}_{1}+\hat{y}_{2})
a^i(y^1+y^2)14ε(y^1+y^2)\displaystyle\geq\hat{a}_{i^{*}}(\hat{y}_{1}+\hat{y}_{2})-14\varepsilon(\hat{y}_{1}+\hat{y}_{2}) (23)

as required.

Case 3:

When i+3i^{+}\geq 3, our goal is to show

i=1k(y^i+a^i)pia^i14ε,\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq\hat{a}_{i^{*}}-14\varepsilon, (24)

which is equivalent to (4) with c=1c=1 and d=14d=14. By the ordering of {y^i}i[k]\{\hat{y}_{i}\}_{i\in[k]} and (3), for iii\leq i^{*} we have

y^iy^ia^i4ε.\displaystyle\hat{y}_{i}\geq\hat{y}_{i^{*}}\geq\hat{a}_{i^{*}}-4\varepsilon. (25)

Denote rargmini[k]a^ir\in\operatorname*{arg\,min}_{i\in[k]}\hat{a}_{i}.

Sub-case 3.a: If r=ir=i^{*}, we have ia^ipia^i(ipi)=a^i\sum_{i}\hat{a}_{i}p_{i}\geq\hat{a}_{i^{*}}(\sum_{i}p_{i})=\hat{a}_{i^{*}}. Moreover, since by Line 11 in Algorithm 2 a positive probability pi>0p_{i}>0 is only assigned to types ii with y^i>0\hat{y}_{i}>0, we have iy^ipi0\sum_{i}\hat{y}_{i}p_{i}\geq 0. Therefore,

i=1k(y^i+a^i)pi0+a^ia^i14ε.\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq 0+\hat{a}_{i^{*}}\geq\hat{a}_{i^{*}}-14\varepsilon.

Thus (24) follows.

Sub-case 3.b: If rir\neq i^{*} and ii+i^{*}\geq i^{+}, since by Line 11 in Algorithm 2 a positive probability pi>0p_{i}>0 is only assigned to types ii with y^i>0\hat{y}_{i}>0, we have that iy^ipi=ii+y^ipiii+(a^i4ε)pi=a^i4ε\sum_{i}\hat{y}_{i}p_{i}=\sum_{i\leq i^{+}}\hat{y}_{i}p_{i}\geq\sum_{i\leq i^{+}}(\hat{a}_{i^{*}}-4\varepsilon)p_{i}=\hat{a}_{i^{*}}-4\varepsilon by (25). When a^r0\hat{a}_{r}\geq 0, since a^ia^r\hat{a}_{i}\geq\hat{a}_{r} for any iri\neq r by definition of rr, we have ia^ipi0\sum_{i}\hat{a}_{i}p_{i}\geq 0. When a^r<0\hat{a}_{r}<0,

ia^ipi\displaystyle\sum_{i}\hat{a}_{i}p_{i} ir(a^r4ε)pi+a^rpr\displaystyle\geq\sum_{i\neq r}(-\hat{a}_{r}-4\varepsilon)p_{i}+\hat{a}_{r}p_{r} (by (2))
=a^r(prirpi)4εirpi\displaystyle=\hat{a}_{r}\left(p_{r}-\sum_{i\neq r}p_{i}\right)-4\varepsilon\sum_{i\neq r}p_{i}
04ε,\displaystyle\geq 0-4\varepsilon,

where the last inequality follows from a^r<0\hat{a}_{r}<0, irpipr\sum_{i\neq r}p_{i}\geq p_{r} and irpi1\sum_{i\neq r}p_{i}\leq 1; here irpipr\sum_{i\neq r}p_{i}\geq p_{r} follows from Line 11 in Algorithm 2, which assigns no single probability larger than 12\frac{1}{2}. Thus,

i=1k(y^i+a^i)pi(a^i4ε)+(4ε)a^i14ε.\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i}\geq(\hat{a}_{i^{*}}-4\varepsilon)+(-4\varepsilon)\geq\hat{a}_{i^{*}}-14\varepsilon.

Therefore (24) holds.

Sub-case 3.c: If rir\neq i^{*} and i<i+i^{*}<i^{+}, we have

i=1k(y^i+a^i)pi\displaystyle\sum_{i=1}^{k}(\hat{y}_{i}+\hat{a}_{i})p_{i} =i=1ky^ipi+i=1ka^ipi\displaystyle=\sum_{i=1}^{k}\hat{y}_{i}p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i}
ii(a^i4ε)pi+i>i(a^i4ε)pi+i=1ka^ipi\displaystyle\geq\sum_{i\leq i^{*}}(\hat{a}_{i^{*}}-4\varepsilon)p_{i}+\sum_{i>i^{*}}(\hat{a}_{i}-4\varepsilon)p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i} (using (25) and (3))
=iia^ipi+i>ia^ipi+i=1ka^ipi4ε\displaystyle=\sum_{i\leq i^{*}}\hat{a}_{i^{*}}p_{i}+\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i=1}^{k}\hat{a}_{i}p_{i}-4\varepsilon
=(i<ia^ipi+2a^ipi)+(i>ia^ipi+ir,ia^ipi+a^rpr)4ε.\displaystyle=\left(\sum_{i<i^{*}}\hat{a}_{i^{*}}p_{i}+2\hat{a}_{i^{*}}p_{i^{*}}\right)+\left(\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\right)-4\varepsilon. (26)

The first term equals a^i\hat{a}_{i^{*}} by construction of pip_{i}’s and since i<i+i^{*}<i^{+}. Hence it suffices to show that the second term of (26) is greater than or equal to 10ε-10\varepsilon.

Subsub-case 3.c.I: First we consider a^r0\hat{a}_{r}\geq 0. By definition of rr, for all i[k]i\in[k], a^ia^r0\hat{a}_{i}\geq\hat{a}_{r}\geq 0. The second term of (26) can then be bounded as

i>ia^ipi+ir,ia^ipi+a^rpra^r(pr+i>ipi+ir,ipi)010ε.\displaystyle\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\geq\hat{a}_{r}\left(p_{r}+\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)\geq 0\geq-10\varepsilon. (27)

Subsub-case 3.c.II: Second we consider a^r<0\hat{a}_{r}<0. Since i<i+i^{*}<i^{+}, we have

i>ipi+ir,ipi\displaystyle\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i} =i+1ii+1pi+pi++ir,ipi\displaystyle=\sum_{i^{*}+1\leq i\leq i^{+}-1}p_{i}+p_{i^{+}}+\sum_{i\neq r,i^{*}}p_{i}
=i=i+1i+1(12)i+(12)i+1+ir,ipi\displaystyle=\sum_{i=i^{*}+1}^{i^{+}-1}\left(\frac{1}{2}\right)^{i}+\left(\frac{1}{2}\right)^{i^{+}-1}+\sum_{i\neq r,i^{*}}p_{i} (by construction of pip_{i}’s)
=(12)i+ir,ipi\displaystyle=\left(\frac{1}{2}\right)^{i^{*}}+\sum_{i\neq r,i^{*}}p_{i} (geometric sum)
=pi+ir,ipi=1pr.\displaystyle=p_{i^{*}}+\sum_{i\neq r,i^{*}}p_{i}=1-p_{r}. (28)

Therefore, if r<ir<i^{*}, we get

i>ia^ipi+ir,ia^ipi+a^rpr\displaystyle\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}
i>i(a^r4ε)pi+ir,i(a^r4ε)pi+a^rpr\displaystyle\geq\sum_{i>i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+\sum_{i\neq r,i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+\hat{a}_{r}p_{r} (using (2) )
=a^r(pri>ipiir,ipi)4ε(i>ipi+ir,ipi)\displaystyle=\hat{a}_{r}\left(p_{r}-\sum_{i>i^{*}}p_{i}-\sum_{i\neq r,i^{*}}p_{i}\right)-4\varepsilon\left(\sum_{i>i^{*}}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)
=a^r(pr(1pr))4ε(1pr)\displaystyle=\hat{a}_{r}\left(p_{r}-\left(1-p_{r}\right)\right)-4\varepsilon(1-p_{r}) (using (28))
=a^r(2pr1)4ε(1pr)\displaystyle=\hat{a}_{r}\left(2p_{r}-1\right)-4\varepsilon(1-p_{r})
04ε10ε.\displaystyle\geq 0-4\varepsilon\geq-10\varepsilon. (a^r<0\hat{a}_{r}<0 and pr1/2p_{r}\leq 1/2)

If r>ir>i^{*}. Then pr1/4p_{r}\leq 1/4 by Line 11 in Algorithm 2 since r1r\neq 1 and i+3i^{+}\geq 3. Hence, by a^r<0\hat{a}_{r}<0, the second term of (26) can then be bounded as

i>ia^ipi+ir,ia^ipi+a^rpr\displaystyle\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}
=i>i,ira^ipi+ir,ia^ipi+2a^rpr\displaystyle=\sum_{i>i^{*},i\neq r}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+2\hat{a}_{r}p_{r}
i>i,ir(a^r4ε)pi+ir,i(a^r4ε)pi+2a^rpr\displaystyle\geq\sum_{i>i^{*},i\neq r}(-\hat{a}_{r}-4\varepsilon)p_{i}+\sum_{i\neq r,i^{*}}(-\hat{a}_{r}-4\varepsilon)p_{i}+2\hat{a}_{r}p_{r} (using (2))
=a^r(2pri>i,irpiir,ipi)4ε(i>i,irpi+ir,ipi)\displaystyle=\hat{a}_{r}\left(2p_{r}-\sum_{i>i^{*},i\neq r}p_{i}-\sum_{i\neq r,i^{*}}p_{i}\right)-4\varepsilon\left(\sum_{i>i^{*},i\neq r}p_{i}+\sum_{i\neq r,i^{*}}p_{i}\right)
=a^r(2pr(12pr))4ε(12pr)\displaystyle=\hat{a}_{r}(2p_{r}-(1-2p_{r}))-4\varepsilon(1-2p_{r}) (using (28))
04ε10ε.\displaystyle\geq 0-4\varepsilon\geq-10\varepsilon. (a^r<0\hat{a}_{r}<0 and pr1/4p_{r}\leq 1/4)

Thus we conclude that when a^r<0\hat{a}_{r}<0, we have

i>ia^ipi+ir,ia^ipi+a^rpr10ε.\displaystyle\sum_{i>i^{*}}\hat{a}_{i}p_{i}+\sum_{i\neq r,i^{*}}\hat{a}_{i}p_{i}+\hat{a}_{r}p_{r}\geq-10\varepsilon. (29)

Combining (29) and (27), we conclude Case 3 of the proof. ∎

Algorithm 3 A randomized algorithm for monotone kk-submodular maximization without constraints (adapted from (Iwata et al., 2015))
1:  Input: ground set VV, value oracle access for a monotone kk-submodular function f:(k+1)V+f:(k+1)^{V}\rightarrow\mathbb{R}_{+}.
2:  Output: a vector 𝒔(k+1)V\boldsymbol{s}\in(k+1)^{V}.
3:  Initialize 𝒔𝟎n\boldsymbol{s}\leftarrow\mathbf{0}_{n}, tk1t\leftarrow k-1.
4:  for eVe\in V do
5:     yiΔe,if(𝒔)y_{i}\leftarrow\Delta_{e,i}f(\boldsymbol{s}) for i[k]i\in[k].
6:     βi=1k[yi]+t\beta\leftarrow\sum_{i=1}^{k}[y_{i}]_{+}^{t}.
7:     Initialize p𝟎kp\leftarrow\mathbf{0}_{k}.
8:     if β0\beta\neq 0 then
9:        piyitβp_{i}\leftarrow\frac{y_{i}^{t}}{\beta} for i[k]i\in[k].
10:     else
11:        pi{1if i=10otherwise.p_{i}\leftarrow\begin{cases}1&\text{if }i=1\\ 0&\text{otherwise.}\end{cases} for i[k]i\in[k].
12:     end if
13:     Choose 𝒔(e)(𝒔(e)=i)=pi\boldsymbol{s}(e)\sim\mathbb{P}(\boldsymbol{s}(e)=i)=p_{i} for all i[k]i\in[k].
14:  end for
15:  Return 𝒔\boldsymbol{s}.
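
As a small illustration of the sampling step in Lines 5-13 above, the following Python snippet computes the probability vector for one element from its kk marginal gains. It is a sketch under our own naming; clipping the gains at zero keeps the probabilities valid and coincides with Line 9 when the gains are nonnegative, as they are for a monotone objective with an exact oracle.

```python
def type_probabilities(y):
    """Probabilities p_1..p_k of Algorithm 3 for one element, given the k marginal gains y."""
    k = len(y)
    t = k - 1
    beta = sum(max(yi, 0.0) ** t for yi in y)        # Line 6: beta = sum_i [y_i]_+^t
    if beta == 0:                                    # Lines 10-11: put all mass on type 1
        return [1.0] + [0.0] * (k - 1)
    return [max(yi, 0.0) ** t / beta for yi in y]    # Line 9: p_i proportional to y_i^t
```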

A.3 Proof of Proposition 4.1

We use the same setup and constructions as in Section A.2. Further, since we are considering the monotone case, we have ai(j)0a_{i}^{(j)}\geq 0 and yi(j)0y_{i}^{(j)}\geq 0 for all i[k]i\in[k] and j[n]j\in[n]. Thus, for the surrogate function f^\hat{f}, we have

a^i(j)2ε and y^i(j)2ε.\displaystyle\hat{a}_{i}^{(j)}\geq-2\varepsilon\text{ and }\hat{y}_{i}^{(j)}\geq-2\varepsilon. (30)

We now prove Proposition 4.1. Again, we omit the superscript (j)(j) when it is clear from the context.

Proof of Proposition 4.1.

It is straightforward to see that the number of value oracle queries required is nknk, as the algorithm iterates through all item-type pairs. The following proof will be dedicated to showing the α\alpha and δ\delta terms in Definition 2.2.

By Definition 2.2, we aim to show that the output 𝒔\boldsymbol{s} of Algorithm 3 satisfies 𝔼[f(𝒔)]k2k1f(𝒐)(162k)nε\mathbb{E}[f(\boldsymbol{s})]\geq\frac{k}{2k-1}f(\boldsymbol{o})-(16-\frac{2}{k})n\varepsilon when the function evaluation is based on surrogate function f^\hat{f}. Note that in this setting, Lemma 3.2 still holds. Thus, our goal is to show

i=1k(a^ia^i)pi(11k)i=1ky^ipi+10ε,\displaystyle\sum_{i=1}^{k}(\hat{a}_{i^{*}}-\hat{a}_{i})p_{i}\leq(1-\frac{1}{k})\sum_{i=1}^{k}\hat{y}_{i}p_{i}+10\varepsilon, (31)

which is equivalent to (4) in main text with c=11kc=1-\frac{1}{k} and d=10d=10. Recall that y^i+y^i4ε\hat{y}_{i}+\hat{y}_{i^{\prime}}\geq-4\varepsilon and a^i+a^i4ε\hat{a}_{i}+\hat{a}_{i^{\prime}}\geq-4\varepsilon for i,i[k]i,i^{\prime}\in[k] with iii\neq i^{\prime}, and y^ia^i4ε\hat{y}_{i}\geq\hat{a}_{i}-4\varepsilon for i[k]i\in[k] (c.f. (18),(19) and (20)). We break up the analysis into two cases: β=0\beta=0 and β>0\beta>0.

Case 1:

When β=0\beta=0, we have y^i0\hat{y}_{i}\leq 0 for all i[k]i\in[k] due to the definition of β\beta. In this case, (31) specializes to

a^ia^1(11k)y^1+10ε.\displaystyle\hat{a}_{i^{*}}-\hat{a}_{1}\leq(1-\frac{1}{k})\hat{y}_{1}+10\varepsilon. (32)

We have

a^ia^1\displaystyle\hat{a}_{i^{*}}-\hat{a}_{1} a^i+2ε\displaystyle\leq\hat{a}_{i^{*}}+2\varepsilon (using a^12ε\hat{a}_{1}\geq-2\varepsilon)
6ε\displaystyle\leq 6\varepsilon (using a^i4εy^i0\hat{a}_{i^{*}}-4\varepsilon\leq\hat{y}_{i^{*}}\leq 0)
y^i+6ε\displaystyle\leq-\hat{y}_{i^{*}}+6\varepsilon (using y^i0\hat{y}_{i^{*}}\leq 0)
y^1+10ε\displaystyle\leq\hat{y}_{1}+10\varepsilon (using y^i+y^14ε\hat{y}_{i^{*}}+\hat{y}_{1}\geq-4\varepsilon)
(11k)y^1+10ε,\displaystyle\leq(1-\frac{1}{k})\hat{y}_{1}+10\varepsilon, (using y^10\hat{y}_{1}\leq 0)

showing the statement.

Case 2:

When β>0\beta>0, we have y^i>0\hat{y}_{i}>0 for some i[k]i\in[k]. In this case, (31) specializes to

i=1k(a^ia^i)y^it(11k)i=1ky^it+1+10βε.\displaystyle\sum_{i=1}^{k}(\hat{a}_{i^{*}}-\hat{a}_{i})\hat{y}_{i}^{t}\leq(1-\frac{1}{k})\sum_{i=1}^{k}\hat{y}_{i}^{t+1}+10\beta\varepsilon. (33)

Sub-case 2.a:

If k=1k=1, the LHS of (33) is 0 since i=1i^{*}=1, while the RHS is nonnegative since β>0\beta>0 and ε0\varepsilon\geq 0. Thus, (33) clearly holds.

Sub-case 2.b:

If k2k\geq 2, let γ=(k1)1t=t1t\gamma=(k-1)^{\frac{1}{t}}=t^{\frac{1}{t}}. We have

ii(a^ia^i)y^it\displaystyle\sum_{i\neq i^{*}}(\hat{a}_{i^{*}}-\hat{a}_{i})\hat{y}_{i}^{t} ii(a^i+2ε)y^it\displaystyle\leq\sum_{i\neq i^{*}}(\hat{a}_{i^{*}}+2\varepsilon)\hat{y}_{i}^{t} (using a^i2ε\hat{a}_{i}\geq-2\varepsilon)
ii(y^i+6ε)y^it\displaystyle\leq\sum_{i\neq i^{*}}(\hat{y}_{i^{*}}+6\varepsilon)\hat{y}_{i}^{t} (using a^i4εy^i\hat{a}_{i}-4\varepsilon\leq\hat{y}_{i})
=iiy^iy^it+6εiiy^it\displaystyle=\sum_{i\neq i^{*}}\hat{y}_{i^{*}}\hat{y}_{i}^{t}+6\varepsilon\sum_{i\neq i^{*}}\hat{y}_{i}^{t}
iiy^iy^it+6εii[y^i]+t\displaystyle\leq\sum_{i\neq i^{*}}\hat{y}_{i^{*}}\hat{y}_{i}^{t}+6\varepsilon\sum_{i\neq i^{*}}[\hat{y}_{i}]_{+}^{t}
=iiy^iy^it+6ε(β[y^i]+t)\displaystyle=\sum_{i\neq i^{*}}\hat{y}_{i^{*}}\hat{y}_{i}^{t}+6\varepsilon(\beta-[\hat{y}_{i^{*}}]_{+}^{t}) (definition of β\beta)
iiy^iy^it+6εβ\displaystyle\leq\sum_{i\neq i^{*}}\hat{y}_{i^{*}}\hat{y}_{i}^{t}+6\varepsilon\beta
=1γ(γy^iiiy^it)+6εβ\displaystyle=\frac{1}{\gamma}\left(\gamma\hat{y}_{i^{*}}\sum_{i\neq i^{*}}\hat{y}_{i}^{t}\right)+6\varepsilon\beta (34)

Using the weighted AM-GM inequality, a1t+1btt+11t+1a+tt+1ba^{\frac{1}{t+1}}b^{\frac{t}{t+1}}\leq\frac{1}{t+1}a+\frac{t}{t+1}b for all a,b0a,b\geq 0, and letting a=(γy^i)t+1a=(\gamma\hat{y}_{i^{*}})^{t+1} and b=(iiy^it)t+1tb=(\sum_{i\neq i^{*}}\hat{y}_{i}^{t})^{\frac{t+1}{t}}, we deduce

1γ(γy^iiiy^it)\displaystyle\frac{1}{\gamma}\left(\gamma\hat{y}_{i^{*}}\sum_{i\neq i^{*}}\hat{y}_{i}^{t}\right) 1γ(1t+1(γy^i)t+1+tt+1(iiy^it)t+1t)\displaystyle\leq\frac{1}{\gamma}\left(\frac{1}{t+1}\left(\gamma\hat{y}_{i^{*}}\right)^{t+1}+\frac{t}{t+1}(\sum_{i\neq i^{*}}\hat{y}_{i}^{t})^{\frac{t+1}{t}}\right) (35)

From Hölder’s inequality, iai(iait+1t)tt+1(i1t+1)1t+1\sum_{i}a_{i}\leq(\sum_{i}a_{i}^{\frac{t+1}{t}})^{\frac{t}{t+1}}(\sum_{i}1^{t+1})^{\frac{1}{t+1}} holds for any non-negative aia_{i}’s. By setting ai=y^ita_{i}=\hat{y}_{i}^{t}, we have

1γ(γy^iiiy^it)\displaystyle\frac{1}{\gamma}\left(\gamma\hat{y}_{i^{*}}\sum_{i\neq i^{*}}\hat{y}_{i}^{t}\right) 1γ(1t+1(γy^i)t+1+tt+1(iiy^it+1)(ii1t+1)1t)\displaystyle\leq\frac{1}{\gamma}\left(\frac{1}{t+1}\left(\gamma\hat{y}_{i^{*}}\right)^{t+1}+\frac{t}{t+1}\left(\sum_{i\neq i^{*}}\hat{y}_{i}^{t+1}\right)\cdot\left(\sum_{i\neq i^{*}}1^{t+1}\right)^{\frac{1}{t}}\right)
=1γ(1t+1(γy^i)t+1+t(k1)1/tt+1iiy^it+1)\displaystyle=\frac{1}{\gamma}\left(\frac{1}{t+1}\left(\gamma\hat{y}_{i^{*}}\right)^{t+1}+\frac{t(k-1)^{1/t}}{t+1}\sum_{i\neq i^{*}}\hat{y}_{i}^{t+1}\right)
=γtt+1i=1ky^it+1\displaystyle=\frac{\gamma^{t}}{t+1}\sum_{i=1}^{k}\hat{y}_{i}^{t+1}
=(11k)i=1ky^it+1.\displaystyle=(1-\frac{1}{k})\sum_{i=1}^{k}\hat{y}_{i}^{t+1}. (36)

Plugging (36) back into (34), we get

ii(a^ia^i)y^it\displaystyle\sum_{i\neq i^{*}}(\hat{a}_{i^{*}}-\hat{a}_{i})\hat{y}_{i}^{t} (11k)i=1ky^it+1+6εβ\displaystyle\leq(1-\frac{1}{k})\sum_{i=1}^{k}\hat{y}_{i}^{t+1}+6\varepsilon\beta
(11k)i=1ky^it+1+10εβ,\displaystyle\leq(1-\frac{1}{k})\sum_{i=1}^{k}\hat{y}_{i}^{t+1}+10\varepsilon\beta, (37)

as desired. ∎

Algorithm 4 Monotone kk-submodular maximization with an IS constraint (Ohsaka & Yoshida, 2015)
1:  Input: value oracle access for a monotone kk-submodular function f:(k+1)V+f:(k+1)^{V}\rightarrow\mathbb{R}_{+}, integers B1,,Bk+B_{1},\ldots,B_{k}\in\mathbb{Z}_{+}.
2:  Output: a vector 𝒔\boldsymbol{s} satisfying |suppi(𝒔)|=Bi|\operatorname{supp}_{i}(\boldsymbol{s})|=B_{i} for each i[k]i\in[k].
3:  Initialize 𝒔𝟎\boldsymbol{s}\leftarrow\mathbf{0} and Bi[k]BiB\leftarrow\sum_{i\in[k]}B_{i}.
4:  for j[B]j\in[B] do
5:     I{i[k]:|suppi(𝒔)|<Bi}I\leftarrow\{i\in[k]:|\operatorname{supp}_{i}(\boldsymbol{s})|<B_{i}\}.
6:     (e,i)argmaxeVsupp(𝒔),iIΔe,if(𝒔)(e,i)\leftarrow\operatorname*{arg\,max}_{e\in V\setminus\operatorname{supp}(\boldsymbol{s}),i\in I}\Delta_{e,i}f(\boldsymbol{s}).
7:     𝒔(e)i\boldsymbol{s}(e)\leftarrow i.
8:  end for
9:  Return 𝒔\boldsymbol{s}.
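
A minimal Python sketch of the greedy rule above is given below, assuming a value oracle f that accepts a dict mapping each element to a type in {0, 1, ..., k} (0 meaning unassigned); the function and variable names are our own and not part of the original description.

```python
def greedy_individual_size(f, V, budgets):
    """Sketch of Algorithm 4 (Ohsaka & Yoshida, 2015) under individual size
    constraints: budgets[i-1] bounds how many elements may receive type i."""
    k = len(budgets)
    s = {e: 0 for e in V}
    counts = [0] * (k + 1)                            # counts[i] = |supp_i(s)| for types 1..k
    for _ in range(sum(budgets)):                     # B = sum_i B_i greedy additions
        base = f(s)
        best, best_gain = None, float("-inf")
        for e in V:
            if s[e] != 0:
                continue                              # element already assigned a type
            for i in range(1, k + 1):
                if counts[i] >= budgets[i - 1]:
                    continue                          # type i has exhausted its budget
                s[e] = i
                gain = f(s) - base                    # marginal gain Delta_{e,i} f(s)
                s[e] = 0
                if gain > best_gain:
                    best, best_gain = (e, i), gain
        if best is None:
            break                                     # no feasible item-type pair remains
        e, i = best
        s[e] = i
        counts[i] += 1
    return s
```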

A.4 Proof of Proposition 5.1

Proof.

Let (e(j),i(j))V×[k]\left(e^{(j)},i^{(j)}\right)\in V\times[k] be the pair greedily chosen in the jj-th iteration using the surrogate function f^\hat{f}, and let 𝒔(j)\boldsymbol{s}^{(j)} be the solution after the jj-th iteration. Let 𝒐\boldsymbol{o} be an optimal solution. For each j[B]j\in[B] and i[k]i\in[k], we define Si(j)=suppi(𝒐(j1))\suppi(𝒔(j1))S_{i}^{(j)}=\operatorname{supp}_{i}(\boldsymbol{o}^{(j-1)})\backslash\operatorname{supp}_{i}(\boldsymbol{s}^{(j-1)}). Following the construction of Ohsaka & Yoshida (2015), we iteratively define 𝒐(0)=𝒐,𝒐(1),,𝒐(B)\boldsymbol{o}^{(0)}=\boldsymbol{o},\boldsymbol{o}^{(1)},\ldots,\boldsymbol{o}^{(B)} as follows. We have two cases to consider:

Case 1: Suppose that e(j)Si(j)e^{(j)}\in S_{i^{\prime}}^{(j)} for some ii(j)i^{\prime}\neq i^{(j)}. In this case, let o(j)o^{(j)} be an arbitrary element in Si(j)(j)S_{i^{(j)}}^{(j)}. Then, we define 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} as the resulting vector obtained from 𝒐(j1)\boldsymbol{o}^{(j-1)} by assigning 0 to the e(j)e^{(j)}-th element and the o(j)o^{(j)}-th element, and then define 𝒐(j)\boldsymbol{o}^{(j)} as the resulting vector obtained from 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} by assigning i(j)i^{(j)} to the e(j)e^{(j)}-th element and ii^{\prime} to the o(j)o^{(j)}-th element.

Case 2: Suppose that e(j)Si(j)e^{(j)}\notin S_{i^{\prime}}^{(j)} for any ii(j)i^{\prime}\neq i^{(j)}. In this case, we set o(j)=e(j)o^{(j)}=e^{(j)} if e(j)Si(j)(j)e^{(j)}\in S_{i^{(j)}}^{(j)}, and we set o(j)o^{(j)} to be an arbitrary element in Si(j)(j)S_{i^{(j)}}^{(j)} otherwise. Then, we define 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} as the resulting vector obtained from 𝒐(j1)\boldsymbol{o}^{(j-1)} by assigning 0 to the o(j)o^{(j)}-th element, and then define 𝒐(j)\boldsymbol{o}^{(j)} as the resulting vector obtained from 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} by assigning i(j)i^{(j)} to the e(j)e^{(j)}-th element.

Intuitively, we want a transition from 𝒐\boldsymbol{o} to 𝒔\boldsymbol{s} by swapping the item-type pairs of 𝒔\boldsymbol{s} into 𝒐\boldsymbol{o} one by one, in the order in which the pairs are chosen by the greedy algorithm. In Case 1, if the considered item e(j)e^{(j)} is already in 𝒐(j1)\boldsymbol{o}^{(j-1)} but with another type ii(j)i^{\prime}\neq i^{(j)}, we obtain 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} from 𝒐(j1)\boldsymbol{o}^{(j-1)} by assigning type 0 to item e(j)e^{(j)}, and obtain 𝒐(j)\boldsymbol{o}^{(j)} from 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} by assigning type i(j)i^{(j)} to item e(j)e^{(j)}. In Case 2, if there is no type ii(j)i^{\prime}\neq i^{(j)} assigned to item e(j)e^{(j)} in 𝒐(j1)\boldsymbol{o}^{(j-1)}, there are two sub-cases. In the first sub-case, if (e(j),i(j))(e^{(j)},i^{(j)}) is already in 𝒐(j1)\boldsymbol{o}^{(j-1)}, we set 𝒐(j)=𝒐(j1)\boldsymbol{o}^{(j)}=\boldsymbol{o}^{(j-1)} and obtain 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} from 𝒐(j1)\boldsymbol{o}^{(j-1)} by assigning type 0 to item e(j)e^{(j)}. In the second sub-case, if e(j)e^{(j)} is not assigned any type in 𝒐(j1)\boldsymbol{o}^{(j-1)}, we obtain 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} from 𝒐(j1)\boldsymbol{o}^{(j-1)} by assigning type 0 to an arbitrary item with type i(j)i^{(j)} in 𝒐(j1)\boldsymbol{o}^{(j-1)}, and obtain 𝒐(j)\boldsymbol{o}^{(j)} from 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} by assigning type i(j)i^{(j)} to item e(j)e^{(j)}. Note that |suppi(𝒐(j))|=Bi\left|\operatorname{supp}_{i}(\boldsymbol{o}^{(j)})\right|=B_{i} holds for every i[k]i\in[k] and j{0,1,,B}j\in\{0,1,\ldots,B\}, and 𝒐(B)=𝒔(B)=𝒔\boldsymbol{o}^{(B)}=\boldsymbol{s}^{(B)}=\boldsymbol{s}. Moreover, we have 𝒔(j1)𝒐(j1/2)\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)} for every j[B]j\in[B]. We further denote Δe,if^(𝒙)=f^(X1,,Xi1,Xi{e},Xi+1,,Xk)f^(X1,,Xk)\Delta_{e,i}\hat{f}(\boldsymbol{x})=\hat{f}(X_{1},\ldots,X_{i-1},X_{i}\cup\{e\},X_{i+1},\ldots,X_{k})-\hat{f}(X_{1},\ldots,X_{k}), the analogue of Δe,if(𝒙)\Delta_{e,i}f(\boldsymbol{x}) under the surrogate function f^\hat{f}.

We consider f(𝒐(j1))f(𝒐(j))f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)}) in these two cases.

Case 1: In this case, e(j)Si(j)e^{(j)}\in S_{i^{\prime}}^{(j)} for some ii(j)i^{\prime}\neq i^{(j)}. By construction, we have

f(𝒐(j1))f(𝒐(j1/2))=Δo(j),i(j)f(𝒐(j1/2))+Δe(j),if(𝒐(j1/2))f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j-1/2)})=\Delta_{o^{(j)},i^{(j)}}f\left(\boldsymbol{o}^{(j-1/2)}\right)+\Delta_{e^{(j)},i^{\prime}}f\left(\boldsymbol{o}^{(j-1/2)}\right)

and

f(𝒐(j))f(𝒐(j1/2))=Δe(j),i(j)f(𝒐(j1/2))+Δo(j),if(𝒐(j1/2)).f(\boldsymbol{o}^{(j)})-f(\boldsymbol{o}^{(j-1/2)})=\Delta_{e^{(j)},i^{(j)}}f\left(\boldsymbol{o}^{(j-1/2)}\right)+\Delta_{o^{(j)},i^{\prime}}f\left(\boldsymbol{o}^{(j-1/2)}\right).

Thus,

f(𝒐(j1))f(𝒐(j))\displaystyle f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)}) =(f(𝒐(j1))f(𝒐(j1/2)))(f(𝒐(j))f(𝒐(j1/2)))\displaystyle=\left(f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j-1/2)})\right)-\left(f(\boldsymbol{o}^{(j)})-f(\boldsymbol{o}^{(j-1/2)})\right)
=Δo(j),i(j)f(𝒐(j1/2))+Δe(j),if(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))\displaystyle=\Delta_{o^{(j)},i^{(j)}}f\left(\boldsymbol{o}^{(j-1/2)}\right)+\Delta_{e^{(j)},i^{\prime}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})
Δo(j),if(𝒐(j1/2))\displaystyle\qquad-\Delta_{o^{(j)},i^{\prime}}f(\boldsymbol{o}^{(j-1/2)}) (by construction)
Δo(j),i(j)f(𝒐(j1/2))+Δe(j),if(𝒐(j1/2))\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})+\Delta_{e^{(j)},i^{\prime}}f(\boldsymbol{o}^{(j-1/2)}) (monotonicity)
Δo(j),i(j)f(𝒔(j1))+Δe(j),if(𝒔(j1))\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}f(\boldsymbol{s}^{(j-1)})+\Delta_{e^{(j)},i^{\prime}}f(\boldsymbol{s}^{(j-1)}) (𝒔(j1)𝒐(j1/2)\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)} and orthant submodularity)
Δo(j),i(j)f^(𝒔(j1))+Δe(j),if^(𝒔(j1))+4ε\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})+\Delta_{e^{(j)},i^{\prime}}\hat{f}(\boldsymbol{s}^{(j-1)})+4\varepsilon (38)
2Δe(j),i(j)f^(𝒔(j1))+4ε\displaystyle\leq 2\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})+4\varepsilon (39)
=2(f^(𝒔(j))f^(𝒔(j1)))+4ε.\displaystyle=2(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))+4\varepsilon. (40)

where the last inequality follows from the greedy rule and the fact that both pairs (o(j),i(j))\left(o^{(j)},i^{(j)}\right) and (e(j),i)\left(e^{(j)},i^{\prime}\right) are available when the jj-th greedy pair is selected (i.e., given 𝒔(j1)\boldsymbol{s}^{(j-1)}), which we verify as follows: since o(j)Si(j)(j)o^{(j)}\in S_{i^{(j)}}^{(j)}, o(j)o^{(j)} is still available; i(j)i^{(j)} is the type of the next greedy pair, thus available; e(j)e^{(j)} is the item of the next greedy pair, thus available; e(j)Si(j)e^{(j)}\in S_{i^{\prime}}^{(j)} indicates ii^{\prime} is still available.

Case 2: In this case, e(j)Si(j)e^{(j)}\notin S_{i^{\prime}}^{(j)} for any ii(j)i^{\prime}\neq i^{(j)}.

f(𝒐(j1))f(𝒐(j))\displaystyle f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)}) =Δo(j),i(j)f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))\displaystyle=\Delta_{o^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)}) (by construction)
Δo(j),i(j)f(𝒐(j1/2))\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)}) (monotonicity)
Δo(j),i(j)f(𝒔(j1))\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}f(\boldsymbol{s}^{(j-1)}) (𝒔(j1)𝒐(j1/2)\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)} and orthant submodularity)
Δo(j),i(j)f^(𝒔(j1))+2ε\displaystyle\leq\Delta_{o^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})+2\varepsilon (41)
Δe(j),i(j)f^(𝒔(j1))+2ε\displaystyle\leq\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})+2\varepsilon (greedy rule and the pair (o(j),i(j))\left(o^{(j)},i^{(j)}\right) is available)
2(f^(𝒔(j))f^(𝒔(j1)))+4ε.\displaystyle\leq 2(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))+4\varepsilon. (42)

Combining (40) and (42) we have in both cases,

f(𝒐(j1))f(𝒐(j))2(f^(𝒔(j))f^(𝒔(j1)))+4ε.\displaystyle f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})\leq 2(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))+4\varepsilon. (43)

Finally,

f(𝒐)f(𝒔)\displaystyle f(\boldsymbol{o})-f(\boldsymbol{s}) =j=1B(f(𝒐(j1))f(𝒐(j)))\displaystyle=\sum_{j=1}^{B}\left(f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})\right) (44)
2j=1B(f^(𝒔(j))f^(𝒔(j1)))+4Bε\displaystyle\leq 2\sum_{j=1}^{B}\left(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)})\right)+4B\varepsilon (using (43))
=2(f^(𝒔)f^(𝟎))+4Bε\displaystyle=2(\hat{f}(\boldsymbol{s})-\hat{f}(\mathbf{0}))+4B\varepsilon
2(f(𝒔)f(𝟎))+4(B+1)ε\displaystyle\leq 2(f(\boldsymbol{s})-f(\mathbf{0}))+4(B+1)\varepsilon
2f(𝒔)+4(B+1)ε,\displaystyle\leq 2f(\boldsymbol{s})+4(B+1)\varepsilon,

Rearranging, we get f(𝒔)13f(𝒐)43(B+1)εf(\boldsymbol{s})\geq\frac{1}{3}f(\boldsymbol{o})-\frac{4}{3}(B+1)\varepsilon. ∎

Algorithm 5 kk-submodular maximization with a matroid constraint (Sakaue, 2017)
1:  Input: value oracle access for a kk-submodular function f:(k+1)V+f:(k+1)^{V}\rightarrow\mathbb{R}_{+}, a matroid (V,)(V,\mathcal{F}), given by the evaluation and independence oracle respectively.
2:  Output: a vector 𝒔\boldsymbol{s} satisfying supp(𝒔)\operatorname{supp}(\boldsymbol{s})\in\mathcal{B}.
3:  Initialize 𝒔𝟎\boldsymbol{s}\leftarrow\mathbf{0}.
4:  for j[M]j\in[M] do
5:     Construct E(𝒔)E(\boldsymbol{s}) using the independence oracle.
6:     (e,i)argmaxeE(𝒔),i[k]Δe,if(𝒔)(e,i)\leftarrow\operatorname*{arg\,max}_{e\in E(\boldsymbol{s}),i\in[k]}\Delta_{e,i}f(\boldsymbol{s}).
7:     𝒔(e)i\boldsymbol{s}(e)\leftarrow i.
8:  end for
9:  Return 𝒔\boldsymbol{s}.
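
The following Python sketch mirrors the greedy rule above, using an abstract matroid independence oracle; as discussed in Appendix B, it loops while the candidate set E(𝒔) is nonempty rather than computing the rank MM beforehand. The oracle signatures and names are our own illustration, not a prescribed implementation.

```python
def greedy_matroid(f, V, is_independent, k):
    """Sketch of Algorithm 5 (Sakaue, 2017): is_independent(S) is the matroid
    independence oracle; f is a value oracle on dicts {element: type}, 0 = unassigned."""
    s = {e: 0 for e in V}
    support = set()
    while True:
        # E(s): unassigned elements whose addition keeps the support independent
        candidates = [e for e in V if s[e] == 0 and is_independent(support | {e})]
        if not candidates:
            break                                     # support is a maximal independent set
        base = f(s)
        best, best_gain = None, float("-inf")
        for e in candidates:
            for i in range(1, k + 1):
                s[e] = i
                gain = f(s) - base                    # marginal gain Delta_{e,i} f(s)
                s[e] = 0
                if gain > best_gain:
                    best, best_gain = (e, i), gain
        e, i = best
        s[e] = i
        support.add(e)
    return s
```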

A.5 Proof of Proposition 6.1

Let \mathcal{B} denote the set of all bases of the matroid (V,)(V,\mathcal{F}). We first state two important lemmas that will be used in the proof. We refer to Sakaue (2017) for the detailed proofs.

Lemma A.2.

(Sakaue, 2017) The size of any maximal optimal solution for maximizing a monotone kk-submodular function under a matroid constraint is MM.

Lemma A.3.

(Sakaue, 2017) Suppose AA\in\mathcal{F} and BB\in\mathcal{B} satisfy ABA\subsetneq B. Then, for any eAe\notin A satisfying A{e}A\cup\{e\}\in\mathcal{F}, there exists eBAe^{\prime}\in B\setminus A such that {B{e}}{e}\{B\setminus\{e^{\prime}\}\}\cup\{e\}\in\mathcal{B}.

Now we prove Proposition 6.1.

Proof.

Let (e(j),i(j))\left(e^{(j)},i^{(j)}\right) be the pair chosen greedily at the jj-th iteration using the surrogate function f^\hat{f}, and 𝒔(j)\boldsymbol{s}^{(j)} be the solution after the jj-th iteration. Let 𝒔(0)=𝟎\boldsymbol{s}^{(0)}=\mathbf{0} and 𝒔=𝒔(M)\boldsymbol{s}=\boldsymbol{s}^{(M)}, the output of Algorithm 5. Define S(j):=supp(𝒔(j))S^{(j)}:=\operatorname{supp}(\boldsymbol{s}^{(j)}) for each j{0,1,,M}j\in\{0,1,\ldots,M\}. Let 𝒐\boldsymbol{o} be a maximal optimal solution and let O(j):=supp(𝒐(j))O^{(j)}:=\operatorname{supp}(\boldsymbol{o}^{(j)}) for each j{0,1,,M}j\in\{0,1,\ldots,M\}. Now we will construct a sequence of vectors 𝒐(0)=𝒐,𝒐(1),,𝒐(M1),𝒐(M)=𝒔\boldsymbol{o}^{(0)}=\boldsymbol{o},\boldsymbol{o}^{(1)},\ldots,\boldsymbol{o}^{(M-1)},\boldsymbol{o}^{(M)}=\boldsymbol{s} satisfying the following:

𝒔(j)𝒐(j) if\displaystyle\boldsymbol{s}^{(j)}\prec\boldsymbol{o}^{(j)}\quad\text{ if } j=0,1,,M1, and 𝒔(j)=𝒐(j)=𝒔 if j=M.\displaystyle j=0,1,\ldots,M-1,\quad\text{ and }\quad\boldsymbol{s}^{(j)}=\boldsymbol{o}^{(j)}=\boldsymbol{s}\quad\text{ if }j=M. (45)
O(j) for\displaystyle O^{(j)}\in\mathcal{B}\quad\text{ for } j=0,1,,M.\displaystyle j=0,1,\ldots,M. (46)

More specifically, we see how to obtain 𝒐(j)\boldsymbol{o}^{(j)} from 𝒐(j1)\boldsymbol{o}^{(j-1)} satisfying (45) and (46). Note that 𝒔(0)=𝟎\boldsymbol{s}^{(0)}=\mathbf{0} and 𝒐(0)=𝒐\boldsymbol{o}^{(0)}=\boldsymbol{o} satisfy (45) and (46). We now describe how to obtain 𝒐(j)\boldsymbol{o}^{(j)} from 𝒐(j1)\boldsymbol{o}^{(j-1)}, assuming that 𝒐(j1)\boldsymbol{o}^{(j-1)} satisfies

𝒔(j1)𝒐(j1), and O(j1).\boldsymbol{s}^{(j-1)}\prec\boldsymbol{o}^{(j-1)}\text{, and }O^{(j-1)}\in\mathcal{B}.

Since 𝒔(j1)𝒐(j1)\boldsymbol{s}^{(j-1)}\prec\boldsymbol{o}^{(j-1)} means S(j1)O(j1)S^{(j-1)}\subsetneq O^{(j-1)}, and e(j)e^{(j)} is chosen to satisfy S(j1){e(j)}S^{(j-1)}\cup\left\{e^{(j)}\right\}\in\mathcal{F}, we see from Lemma A.3 that there exists eO(j1)S(j1)e^{\prime}\in O^{(j-1)}\setminus S^{(j-1)} satisfying {O(j1){e}}{e(j)}\{O^{(j-1)}\setminus\{e^{\prime}\}\}\cup\{e^{(j)}\}\in\mathcal{B}. We let o(j)=eo^{(j)}=e^{\prime} and define 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} as the vector obtained by assigning 0 to the o(j)o^{(j)} th element of 𝒐(j1)\boldsymbol{o}^{(j-1)}. We then define 𝒐(j)\boldsymbol{o}^{(j)} as the vector obtained from 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} by assigning type i(j)i^{(j)} to element e(j)e^{(j)}. The vector thus constructed, 𝒐(j)\boldsymbol{o}^{(j)}, satisfies

O(j)={O(j1){o(j)}}{e(j)}.O^{(j)}=\{O^{(j-1)}\setminus\{o^{(j)}\}\}\cup\{e^{(j)}\}\in\mathcal{B}.

Furthermore, since 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} satisfies

𝒔(j1)𝒐(j1/2),\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)},

we have the following property for 𝒐(j)\boldsymbol{o}^{(j)} :

𝒔(j)𝒐(j) if j=1,,M1, and 𝒔(j)=𝒐(j)=𝒔 if j=M,\boldsymbol{s}^{(j)}\prec\boldsymbol{o}^{(j)}\quad\text{ if }j=1,\ldots,M-1,\quad\text{ and }\quad\boldsymbol{s}^{(j)}=\boldsymbol{o}^{(j)}=\boldsymbol{s}\text{ if }j=M,

where the strictness of the inclusion for j[M1]j\in[M-1] can be easily confirmed from |S(j)|=j<M=|O(j)||S^{(j)}|=j<M=|O^{(j)}|. Thus, applying the above discussion for j=1,,Mj=1,\ldots,M iteratively, we see the obtained sequence of vectors 𝒐(0),𝒐(1),,𝒐(M)\boldsymbol{o}^{(0)},\boldsymbol{o}^{(1)},\ldots,\boldsymbol{o}^{(M)} satisfies (45) and (46). We now prove the following inequality for j[M]j\in[M] :

f^(𝒔(j))f^(𝒔(j1))f(𝒐(j1))f(𝒐(j))2ε.\displaystyle\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)})\geq f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})-2\varepsilon. (47)

From S(j1)O(j1)S^{(j-1)}\subsetneq O^{(j-1)} and o(j)O(j1)\S(j1)o^{(j)}\in O^{(j-1)}\backslash S^{(j-1)}, we get S(j1){o(j)}O(j1)S^{(j-1)}\cup\left\{o^{(j)}\right\}\subseteq O^{(j-1)}\in\mathcal{B} for each j[M]j\in[M]. Thus we obtain the following inclusion from (M2) for each j[M]j\in[M] :

S(j1){o(j)}.S^{(j-1)}\cup\{o^{(j)}\}\in\mathcal{F}.

Hence, from o(j)S(j1)o^{(j)}\notin S^{(j-1)}, we get o(j)E(𝒔(j1))o^{(j)}\in E\left(\boldsymbol{s}^{(j-1)}\right), where E(𝒔)E(\boldsymbol{s}) is constructed by the independence oracle and contains the available elements given current solution 𝒔\boldsymbol{s}. Therefore, for the pair (e(j),i(j))\left(e^{(j)},i^{(j)}\right), which is chosen greedily, we have

Δe(j),i(j)f^(𝒔(j1))Δo(j),𝒐(j1)(o(j))f^(𝒔(j1)).\displaystyle\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}\hat{f}(\boldsymbol{s}^{(j-1)}). (48)

Furthermore, since 𝒔(j1)𝒐(j1/2)\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)} holds, orthant submodularity implies

Δo(j),𝒐(j1)(o(j))f(𝒔(j1))Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2)).\displaystyle\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{s}^{(j-1)})\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)}). (49)

Using (48) and (49), we have:

f(𝒐(j1))f(𝒐(j))\displaystyle f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)}) =Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))\displaystyle=\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)}) (by construction)
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))\displaystyle\leq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)}) (monotonicity)
Δo(j),𝒐(j1)(o(j))f(𝒔(j1))\displaystyle\leq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{s}^{(j-1)}) (using (49))
Δo(j),𝒐(j1)(o(j))f^(𝒔(j1))+2ε\displaystyle\leq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}\hat{f}(\boldsymbol{s}^{(j-1)})+2\varepsilon
Δe(j),i(j)f^(𝒔(j1))+2ε\displaystyle\leq\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})+2\varepsilon (using (48))
=f^(𝒔(j))f^(𝒔(j1))+2ε,\displaystyle=\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)})+2\varepsilon, (50)

and we get (47). Finally,

f(𝒐)f(𝒔)\displaystyle f(\boldsymbol{o})-f(\boldsymbol{s}) =j=1M(f(𝒐(j1))f(𝒐(j)))\displaystyle=\sum_{j=1}^{M}\left(f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})\right) (51)
j=1M(f^(𝒔(j))f^(𝒔(j1)))+2Mε\displaystyle\leq\sum_{j=1}^{M}\left(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)})\right)+2M\varepsilon (using (47))
=(f^(𝒔)f^(𝟎))+2Mε\displaystyle=(\hat{f}(\boldsymbol{s})-\hat{f}(\mathbf{0}))+2M\varepsilon
(f(𝒔)f(𝟎))+2(M+1)ε\displaystyle\leq(f(\boldsymbol{s})-f(\mathbf{0}))+2(M+1)\varepsilon
f(𝒔)+2(M+1)ε,\displaystyle\leq f(\boldsymbol{s})+2(M+1)\varepsilon,

Rearranging, we get f(𝒔)12f(𝒐)(M+1)εf(\boldsymbol{s})\geq\frac{1}{2}f(\boldsymbol{o})-(M+1)\varepsilon. ∎

A.6 Proof of Proposition 7.1

The following lemma guarantees that there exists a maximal optimal solution whose support has size MM (i.e., is a basis of the matroid), so the exchange argument from the proof of Proposition 6.1 still applies.

Lemma A.4.

(Sun et al., 2022) The size of any maximal optimal solution for maximizing a non-monotone kk-submodular function under a matroid constraint is still MM.

Now we prove Proposition 7.1.

Proof.

We construct the sequences 𝒐(j)\boldsymbol{o}^{(j)}, 𝒐(j1/2)\boldsymbol{o}^{(j-1/2)} in the same way as in the proof of Proposition 6.1. By the same construction, we have that

𝒔(j)𝒐(j) if\displaystyle\boldsymbol{s}^{(j)}\prec\boldsymbol{o}^{(j)}\quad\text{ if } j=0,1,,M1, and 𝒔(j)=𝒐(j)=𝒔 if j=M.\displaystyle j=0,1,\ldots,M-1,\quad\text{ and }\quad\boldsymbol{s}^{(j)}=\boldsymbol{o}^{(j)}=\boldsymbol{s}\quad\text{ if }j=M. (52)
O(j) for\displaystyle O^{(j)}\in\mathcal{B}\quad\text{ for } j=0,1,,M.\displaystyle j=0,1,\ldots,M. (53)

and we still have (48) and (49). Since we no longer have monotonicity, we will exploit the pairwise monotonicity property. Besides the chosen type i(j)i^{(j)} for the item e(j)e^{(j)} at each iteration of Algorithm 5, we consider another type h(j)i(j)h^{(j)}\neq i^{(j)}. From pairwise monotonicity we have

Δe(j),i(j)f(𝒐(j1/2))+Δe(j),h(j)f(𝒐(j1/2))0.\displaystyle\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})+\Delta_{e^{(j)},h^{(j)}}f(\boldsymbol{o}^{(j-1/2)})\geq 0. (54)

We perform a calculation similar to that in the proof of Proposition 6.1:

f^(𝒔(j))f^(𝒔(j1))\displaystyle\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}) =Δe(j),i(j)f^(𝒔(j1))\displaystyle=\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})
Δo(j),𝒐(j1)(o(j))f^(𝒔(j1))\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}\hat{f}(\boldsymbol{s}^{(j-1)}) (using (48))
Δo(j),𝒐(j1)(o(j))f(𝒔(j1))2ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{s}^{(j-1)})-2\varepsilon
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))2ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-2\varepsilon (using (49))
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))Δe(j),h(j)f(𝒐(j1/2))2ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},h^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-2\varepsilon (using (54))
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))Δe(j),h(j)f(𝒔(j1))2ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},h^{(j)}}f(\boldsymbol{s}^{(j-1)})-2\varepsilon (𝒔(j1)𝒐(j1/2)\boldsymbol{s}^{(j-1)}\preceq\boldsymbol{o}^{(j-1/2)} as shown in previous section)
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))Δe(j),h(j)f^(𝒔(j1))4ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},h^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})-4\varepsilon (each value of ff and f^\hat{f} differs by at most ε\varepsilon)
Δo(j),𝒐(j1)(o(j))f(𝒐(j1/2))Δe(j),i(j)f(𝒐(j1/2))Δe(j),i(j)f^(𝒔(j1))4ε\displaystyle\geq\Delta_{o^{(j)},\boldsymbol{o}^{(j-1)}(o^{(j)})}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}f(\boldsymbol{o}^{(j-1/2)})-\Delta_{e^{(j)},i^{(j)}}\hat{f}(\boldsymbol{s}^{(j-1)})-4\varepsilon (greedy rule)
=f(𝒐(j1))f(𝒐(j1/2))f(𝒐(j))+f(𝒐(j1/2))(f^(𝒔(j))f^(𝒔(j1)))4ε\displaystyle=f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j-1/2)})-f(\boldsymbol{o}^{(j)})+f(\boldsymbol{o}^{(j-1/2)})-(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))-4\varepsilon
=f(𝒐(j1))f(𝒐(j))(f^(𝒔(j))f^(𝒔(j1)))4ε.\displaystyle=f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})-(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))-4\varepsilon.

Adding f^(𝒔(j))f^(𝒔(j1))\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}) to both sides, we get

2(f^(𝒔(j))f^(𝒔(j1)))f(𝒐(j1))f(𝒐(j))4ε.\displaystyle 2(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)}))\geq f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})-4\varepsilon. (55)

Finally,

f(𝒐)f(𝒔)\displaystyle f(\boldsymbol{o})-f(\boldsymbol{s}) =j=1M(f(𝒐(j1))f(𝒐(j)))\displaystyle=\sum_{j=1}^{M}\left(f(\boldsymbol{o}^{(j-1)})-f(\boldsymbol{o}^{(j)})\right) (56)
2j=1M(f^(𝒔(j))f^(𝒔(j1)))+4Mε\displaystyle\leq 2\sum_{j=1}^{M}\left(\hat{f}(\boldsymbol{s}^{(j)})-\hat{f}(\boldsymbol{s}^{(j-1)})\right)+4M\varepsilon (using (55))
=2(f^(𝒔)f^(𝟎))+4Mε\displaystyle=2(\hat{f}(\boldsymbol{s})-\hat{f}(\mathbf{0}))+4M\varepsilon
2(f(𝒔)f(𝟎))+4(M+1)ε\displaystyle\leq 2(f(\boldsymbol{s})-f(\mathbf{0}))+4(M+1)\varepsilon
2f(𝒔)+4(M+1)ε,\displaystyle\leq 2f(\boldsymbol{s})+4(M+1)\varepsilon,

Rearranging, we get f(𝒔)13f(𝒐)43(M+1)εf(\boldsymbol{s})\geq\frac{1}{3}f(\boldsymbol{o})-\frac{4}{3}(M+1)\varepsilon. ∎

Appendix B Missing Offline Algorithms

We present the pseudo-codes that are missing from the main sections. Algorithm 3 is adapted from Iwata et al. (2015) for the problem of maximizing an unconstrained monotone kk-submodular function; Algorithm 4 is proposed in Ohsaka & Yoshida (2015) for the problem of maximizing a monotone kk-submodular function under IS constraints; and Algorithm 5 is proposed in Sakaue (2017) for the problem of maximizing a monotone kk-submodular function under a matroid constraint. Later, Sun et al. (2022) showed that Algorithm 5 can also handle non-monotone objective functions. Note that in Line 4, Algorithm 5 requires the value of MM, the rank of the matroid. However, in practice, we need not calculate the value of MM beforehand. Instead, we continue the iteration while E(𝒔)E(\boldsymbol{s}) is nonempty, which we check in Line 5. We can confirm that this modification does not change the output as follows. As long as |supp(𝒔)|<M|\operatorname{supp}(\boldsymbol{s})|<M, exactly one element is added to supp(𝒔)\operatorname{supp}(\boldsymbol{s}) at each iteration due to (M3), and, if |supp(𝒔)|=M|\operatorname{supp}(\boldsymbol{s})|=M, the iteration stops since supp(𝒔)\operatorname{supp}(\boldsymbol{s}) is a maximal independent set.

Appendix C More Related Works

Offline kk-submodular function maximization:

kk-submodular functions were first introduced by Huber & Kolmogorov (2012) as a generalization of bisubmodular functions, which correspond to k=2k=2. For unconstrained kk-submodular maximization, Iwata et al. (2015) showed that even achieving an approximation ratio α(k+12k,1]\alpha\in(\frac{k+1}{2k},1] is NP-hard. If the objective is monotone, there exists a deterministic algorithm achieving a 1/21/2 approximation (Ward & Živný, 2014) and a randomized algorithm achieving a k2k1\frac{k}{2k-1} approximation guarantee (Iwata et al., 2015). When the objective is non-monotone, Ward & Živný (2014) achieved a max{1/3,1/(1+a)}\max\{1/3,1/(1+a)\} approximation guarantee using 𝒪(kn)\mathcal{O}(kn) function evaluations, where a=max{1,(k1)/4}a=\max\{1,\sqrt{(k-1)/4}\}. This was further improved to 1/21/2 (Iwata et al., 2015) and k2+12k2+1\frac{k^{2}+1}{2k^{2}+1} (Oshima, 2021).

For monotone objectives under size constraints, two types of constraints have been considered in the literature: total size (TS) constraints, where the selected items share a common budget, and individual size (IS) constraints, where each of the $k$ types has its own budget. Ohsaka & Yoshida (2015) analyzed the greedy algorithm and obtained $1/2$ and $1/3$ approximation guarantees for TS and IS constraints, respectively. Nie et al. (2023b) proposed threshold greedy algorithms for monotone TS and IS constraints with improved query complexity.
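
For illustration, a minimal sketch of the greedy strategy analyzed by Ohsaka & Yoshida (2015) for IS constraints follows: at each step it adds the feasible (element, type) pair with the largest marginal gain until every per-type budget is exhausted. All identifiers are hypothetical placeholders, not the authors' code.

```python
def is_constrained_greedy(f, ground_set, budgets):
    """Sketch of greedy k-submodular maximization under individual size
    (IS) constraints: budgets[i-1] bounds how many elements may receive
    type i."""
    k = len(budgets)
    counts = [0] * k  # how many elements carry each type so far
    s = {}            # solution: element -> type in {1, ..., k}
    for _ in range(sum(budgets)):
        best_gain, best_pair = float("-inf"), None
        for e in ground_set:
            if e in s:
                continue
            for i in range(1, k + 1):
                if counts[i - 1] >= budgets[i - 1]:
                    continue
                gain = f({**s, e: i}) - f(s)
                if gain > best_gain:
                    best_gain, best_pair = gain, (e, i)
        if best_pair is None:  # ground set exhausted before the budgets
            break
        e, i = best_pair
        s[e] = i
        counts[i - 1] += 1
    return s
```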

For maximizing a monotone $k$-submodular function under a matroid constraint, Sakaue (2017) showed that the greedy algorithm achieves a $1/2$ approximation. When the curvature $c$ is known, Matsuoka & Ohsaka (2021) proposed algorithms achieving a $\frac{1}{1+c}$ approximation for matroid constraints and a $\frac{1}{1+2c}$ approximation for individual size constraints. When the objective function is non-monotone, a $1/3$ approximation algorithm is presented in Sun et al. (2022).

For other constraints, Tang et al. (2022) and Chen et al. (2022) proposed algorithms inspired by Khuller et al. (1999) and Sviridenko (2004) for knapsack constraints. Xiao et al. (2023) considered knapsack constraints in both the monotone and non-monotone cases. Yu et al. (2023) considered the intersection of knapsack and matroid constraints. Pham et al. (2021) proposed an algorithm for the streaming setting under knapsack constraints. While the term “streaming” often suggests an online context, the setup is distinct from CMAB problems: in streaming, an exact value oracle is available, the action space is restricted to (subsets of) the elements revealed so far, and evaluation occurs only after the stream completes. There, “online” refers to the data arriving in some order.

Appendix D Applications

In this section, we describe some real-world applications that motivate our problems of interest.

Application 1 (Influence maximization with $k$ topics): Let $G=(V,E)$ be a social network with an edge probability $p_{u,v}^{i}$ for each edge $(u,v)\in E$, representing the probability of $u$ influencing $v$ on the $i$-th topic. Given a seed set $(S_{1},\cdots,S_{k})$, the diffusion of the $i$-th topic starts by activating the vertices in $S_{i}$ and propagates independently of the other topics. The goal is to maximize the total influence (the number of users influenced by at least one topic). In the well-studied independent cascade and linear threshold models of influence propagation, the influence function is monotone and $k$-submodular (Ohsaka & Yoshida, 2015). In an online version of the problem, the underlying graph structure is not known to the agent. At each time step, the agent selects a seed set and observes a reward representing the number of influenced users.
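
To illustrate the kind of reward the agent observes, the following toy sketch estimates the $k$-topic influence objective by Monte Carlo simulation of independent cascades; the graph encoding, probabilities, and function names are hypothetical placeholders rather than the paper's setup.

```python
import random

def estimate_influence(out_edges, seed_sets, num_samples=200, seed=0):
    """Toy Monte Carlo estimate of the k-topic influence objective.

    out_edges[i][u] is a list of (v, p) pairs: u activates v on topic i
    with probability p (independent cascade).  seed_sets[i] is S_i.  The
    value is the expected number of users influenced by at least one topic.
    """
    rng = random.Random(seed)
    k = len(seed_sets)
    total = 0.0
    for _ in range(num_samples):
        influenced = set()
        for i in range(k):
            # one independent-cascade simulation for topic i
            active = set(seed_sets[i])
            frontier = list(seed_sets[i])
            while frontier:
                u = frontier.pop()
                for v, p in out_edges[i].get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        frontier.append(v)
            influenced |= active
        total += len(influenced)
    return total / num_samples
```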

Application 2 (Sensor placement): We are given $k$ types of sensors, each providing different measurements, and we have $B_{i}$ sensors of type $i$ for each $i\in[k]$. The ground set $V$ is the set of possible locations for the sensors. The goal is to equip each location with at most one sensor so as to maximize the total information gained from the sensors. This problem can also be modeled as $k$-submodular maximization with an individual size constraint (Ohsaka & Yoshida, 2015). In an online version of this problem, at each time step we select locations for the sensors, collect data to compute the information gain (e.g., entropy), and in the next time step repeat the process by re-allocating sensors to different locations.

Application 3 (Ad allocation): We have $k$ advertisers that are known in advance and $n$ ad impressions that arrive online one at a time. Each advertiser $i\in[k]$ has a contract for $B_{i}$ impressions. For each impression $j$ and each advertiser $i$, there is a non-negative weight $w_{ji}$ that captures how much value advertiser $i$ accrues from being allocated impression $j$. When impression $j$ arrives, the values $\{w_{ji}:i\in[k]\}$ are revealed, and the algorithm must allocate impression $j$ to at most one advertiser. Letting $X_{i}$ denote the set of impressions allocated to advertiser $i$, the total revenue is $\sum_{i=1}^{k}\max\{\sum_{j\in S_{i}}w_{ji}:S_{i}\subseteq X_{i},|S_{i}|\leq B_{i}\}$. This problem can be formulated as a special case of $k$-submodular maximization with a partition matroid constraint (Ene & Nguyen, 2022).
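
Since the weights are non-negative, the inner maximum in the revenue expression is attained by the $B_{i}$ largest weights among the impressions allocated to advertiser $i$. Below is a minimal sketch of evaluating this objective for a fixed allocation, with all identifiers as hypothetical placeholders.

```python
def total_revenue(weights, allocation, budgets):
    """Evaluate the ad-allocation objective: for each advertiser i, sum
    its budgets[i] largest weights among the impressions allocated to it.

    weights[j][i] : value of impression j to advertiser i
    allocation[j] : advertiser receiving impression j (or None)
    budgets[i]    : contract size B_i
    """
    revenue = 0.0
    for i in range(len(budgets)):
        allocated = [weights[j][i] for j, adv in allocation.items() if adv == i]
        revenue += sum(sorted(allocated, reverse=True)[:budgets[i]])
    return revenue
```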