
Submodular Bandit Problem Under Multiple Constraints

Sho Takemori    Masahiro Sato    Takashi Sonoda    Janmajay Singh    Tomoko Ohkuma
(Fuji Xerox Co., Ltd.)
Abstract

The linear submodular bandit problem was proposed to simultaneously address diversified retrieval and online learning in a recommender system. If there is no uncertainty, this problem is equivalent to a submodular maximization problem under a cardinality constraint. However, in some situations, recommendation lists should satisfy additional constraints beyond a cardinality constraint, such as budget constraints. Thus, motivated by diversified retrieval under budget constraints, we introduce a submodular bandit problem under the intersection of $l$ knapsack constraints and a $k$-system constraint. Here, $k$-system constraints form a very general class of constraints, including cardinality constraints and the intersection of $k$ matroid constraints. To solve this problem, we propose a non-greedy algorithm that adaptively focuses on a standard or modified upper confidence bound. We provide a high-probability upper bound on the approximation regret, where the approximation ratio matches that of a fast offline algorithm. Moreover, we perform experiments under various combinations of constraints using a synthetic and two real-world datasets and demonstrate that our proposed method outperforms the existing baselines.

1 INTRODUCTION

The multi-armed bandit (MAB) problem has been widely used for practical applications. Examples include interactive recommender systems, Internet advertising, portfolio selection, and clinical trials. In a typical MAB problem, the agent selects one arm in each round. However, in practice, it is often more useful to select more than one arm in each round. Such a problem is called a combinatorial bandit problem [Chen et al., 2013]. For example, Yue and Guestrin [2011] and Radlinski et al. [2008] considered problems in which, in each round, the agent proposes multiple news articles or web documents to a user.

When recommending multiple items to a user, agents should select well-diversified items to maximize coverage of the information the user finds interesting [Yue and Guestrin, 2011] or to reduce item similarity in the list [Ziegler et al., 2005]. Recommending redundant items leads to diminishing returns in terms of utility [Yu et al., 2016]. It is well known that properties such as diversity or diminishing returns are well captured by submodular set functions [Krause and Golovin, 2014]. To simultaneously address diversified retrieval and online learning in a recommender system, Yue and Guestrin [2011] proposed a combinatorial bandit problem (or more specifically, a semi-bandit problem) called the linear submodular bandit problem, where in each round a sequence of rewards is generated by an unknown submodular function.

For real-world applications, recommendation lists should satisfy several constraints. We explain this using a news article recommendation example. For a comfortable user experience while selecting news articles from a recommendation list, the length of the list should not be excessively long, which implies that the list should satisfy a cardinality constraint. Furthermore, a user may not wish to spend more than a certain amount of time reading news articles. This can be modeled as a knapsack constraint. With only a knapsack constraint, a system could recommend a long list of short (or low-cost) news articles. However, due to the space constraint of the website, such a list cannot be displayed. Therefore, it is necessary to consider a submodular bandit problem under the intersection of knapsack and cardinality constraints.

Yue and Guestrin [2011] introduced a submodular bandit problem under a cardinality constraint and proposed an algorithm called LSBGreedy. Later, Yu et al. [2016] considered a submodular bandit problem under a knapsack constraint and proposed two greedy algorithms called MCSGreedy and CGreedy. However, such existing algorithms fail to properly optimize the objective function under complex constraints. In fact, we theoretically and empirically show that such simple greedy algorithms can perform poorly.

Under a simple constraint such as a cardinality or a knapsack constraint, there is a simple rule for selecting elements, called the upper confidence bound (UCB) rule or the modified UCB rule when the constraint is a cardinality or a knapsack constraint, respectively [Yu et al., 2016]. For example, with the UCB rule, the algorithm sequentially selects the element with the largest UCB in each round. Because our problem generalizes both of these problems, we need to generalize both rules.

In this study, we solve the problem under a more general constraint, i.e., the intersection of $l$ knapsack constraints and a $k$-system constraint. Here, $k$-system constraints form a very general class of constraints, including cardinality constraints and the intersection of $k$ matroid constraints. For example, when recommending news articles, we can restrict the number of news articles from each topic with a $k$-system constraint. To solve the problem, we propose a non-greedy algorithm that adaptively focuses on the UCB and modified UCB rules. Since the submodular maximization problem is NP-hard, we theoretically evaluate our method by an $\alpha$-approximation regret, where $\alpha\in(0,1)$ is an approximation ratio. In this study, we provide an upper bound on the $\alpha$-approximation regret in the case when $\alpha=\frac{1}{(1+\varepsilon)(k+2l+1)}$, where $\varepsilon$ is a parameter of the algorithm. We note that this approximation ratio matches that of an offline algorithm [Badanidiyuru and Vondrák, 2014]. To the best of our knowledge, no known offline algorithm simultaneously achieves a better approximation ratio than the $\alpha$ above and better computational complexity than that offline algorithm.^1 More precisely, our contributions are stated as follows:

^1 After we submitted this paper to the conference, Li and Shroff [2020] updated their preprint. They proposed an offline submodular maximization algorithm and improved the approximation ratio of [Badanidiyuru and Vondrák, 2014] to $1/(k+\frac{7}{4}l+1)-\varepsilon$.

OUR CONTRIBUTIONS

1. We propose a submodular bandit problem with semi-bandit feedback under the intersection of $l$ knapsack constraints and a $k$-system constraint (Section 4). This is the first attempt to solve the submodular bandit problem under such complex constraints. The problem is new even when the $k$-system constraint is a cardinality constraint.

2. We propose a novel algorithm called AFSM-UCB that Adaptively Focuses on a Standard or Modified Upper Confidence Bound (Section 5).

3. We provide a high-probability upper bound on the approximation regret of AFSM-UCB (Section 6). We prove that the $\alpha$-approximation regret $\mathrm{Reg}_{\alpha}(T)$ is $O(\sqrt{mT}\ln(mT/\delta))$ with probability at least $1-\delta$, and that the computational complexity in each round is $O(m|\mathcal{N}|\ln|\mathcal{N}|/\ln(1+\varepsilon))$, where $\alpha=\frac{1}{(1+\varepsilon)(k+2l+1)}$, $\varepsilon$ is a parameter of the algorithm, $T$ is the time horizon, $m$ is the cardinality of a maximal feasible solution, and $\mathcal{N}$ is the ground set (e.g., the set of all news articles in the news recommendation example). We note that no known fast^2 offline algorithm achieves a better approximation ratio than the one above.^3

^2 We refer to Section 2.1 for the meaning of "fast".
^3 See footnote 1.

4. We empirically demonstrate the effectiveness of our proposed method by comprehensively evaluating it on a synthetic and two real-world datasets. We show that our proposed method outperforms existing greedy baselines such as LSBGreedy and CGreedy.

2 RELATED WORK

2.1 SUBMODULAR MAXIMIZATION

Although submodular maximization has been studied for over four decades, we introduce only recent results relevant to our work. Badanidiyuru and Vondrák [2014] provided a maximization algorithm for a non-negative, monotone submodular function under $l$ knapsack constraints and a $k$-system constraint that achieves a $\frac{1}{(1+\varepsilon)(k+2l+1)}$-approximate solution. Based on this work and Gupta et al. [2010], Mirzasoleiman et al. [2016] proposed a maximization algorithm called FANTOM for the same constraint in the case when the objective function is not necessarily monotone. Our proposed method is inspired by these two offline algorithms. However, because of the uncertainty due to semi-bandit feedback, we need a nontrivial modification. A key feature of our method and the aforementioned two offline algorithms is that they filter out "bad" elements via a threshold. Such a threshold method is also used in other problem settings, such as streaming submodular maximization under a cardinality constraint [Badanidiyuru et al., 2014]. Some algorithms [Sarpatwar et al., 2019, Chekuri et al., 2010, 2014] achieve better approximation ratios than that of [Badanidiyuru and Vondrák, 2014] under narrower classes of constraints (e.g., a matroid plus $l$ knapsacks). However, these algorithms are not "fast" because their computational complexity is $O(\mathrm{poly}(|\mathcal{N}|))$ with a polynomial of high degree, while that of [Badanidiyuru and Vondrák, 2014] is $O(\frac{|\mathcal{N}|}{\varepsilon^{2}}\ln^{2}\frac{|\mathcal{N}|}{\varepsilon})$. For example, the computational complexity of the algorithm provided in [Sarpatwar et al., 2019] is $\widetilde{O}(|\mathcal{N}|^{6})$ when $k=1$. We refer to [Sarpatwar et al., 2019, Mirzasoleiman et al., 2016] for further comparisons of approximation ratios and computational complexity.

2.2 SUBMODULAR BANDIT PROBLEMS

Yue and Guestrin [2011] introduced the linear submodular bandit problem to solve a diversification problem in a retrieval system and proposed a greedy algorithm called LSBGreedy. Later, Yu et al. [2016] considered a variant of the problem, that is, the linear submodular bandit problem with a knapsack constraint and proposed two greedy algorithms called MCSGreedy and CGreedy. Chen et al. [2017] generalized the linear submodular bandit problem to an infinite dimensional case, i.e., in the case where the marginal gain of the score function belongs to a reproducing kernel Hilbert space (RKHS) and has a bounded norm in the space. Then, they proposed a greedy algorithm called SM-UCB. Recently, Hiranandani et al. [2019] studied a model combining linear submodular bandits with a cascading model [Craswell et al., 2008]. Strictly speaking, their objective function is not a submodular function. Table 1 shows a comparison with other submodular bandit problems with respect to constraints.

Table 1: Comparison of other submodular bandit algorithms with respect to constraints.

Methods       Cardinality   Knapsack   $k$-system
LSBGreedy         ✓
CGreedy                         ✓
SM-UCB            ✓
Our method        ✓             ✓           ✓

3 DEFINITION

In this section, we provide definitions of the terminology used in this paper. Throughout this paper, we fix a finite set $\mathcal{N}$ called the ground set; in the news article recommendation example, it represents the set of all news articles.

3.1 SUBMODULAR FUNCTION

In this subsection, we define submodular functions. We refer to [Krause and Golovin, 2014] for an introduction to this subject.

We denote by $2^{\mathcal{N}}$ the set of subsets of $\mathcal{N}$. For $e\in\mathcal{N}$ and $S\subseteq\mathcal{N}$, we write $S+e=S\cup\{e\}$. Let $f:2^{\mathcal{N}}\rightarrow\mathbb{R}$ be a set function. We call $f$ a submodular function if $f$ satisfies $\Delta f(e\mid A)\geq\Delta f(e\mid B)$ for any $A,B\in 2^{\mathcal{N}}$ with $A\subseteq B$ and for any $e\in\mathcal{N}\setminus B$. Here, $\Delta f(e\mid A)$ is the marginal gain when $e$ is added to $A$, defined as $f(A+e)-f(A)$. We note that a linear combination of submodular functions with non-negative coefficients is also submodular. A submodular function $f$ on $2^{\mathcal{N}}$ is called monotone if $f(B)\geq f(A)$ for any $A,B\in 2^{\mathcal{N}}$ with $A\subseteq B$. A set function $f$ on $2^{\mathcal{N}}$ is called non-negative if $f(S)\geq 0$ for any $S\subseteq\mathcal{N}$. Although non-monotone submodular functions have important applications [Mirzasoleiman et al., 2016], we consider only non-negative, monotone submodular functions in this study, as in the preceding work [Yue and Guestrin, 2011, Yu et al., 2016, Chen et al., 2017].
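To make the definitions concrete, here is a minimal Python sketch of a coverage function, which is non-negative, monotone, and submodular; the three-element ground set and its coverage sets are hypothetical.

```python
# Minimal sketch of a coverage function f(S) = |union of sets covered by S|,
# which is non-negative, monotone, and submodular.
# The ground set {"a", "b", "c"} and its coverage sets are hypothetical.

def coverage(cover, S):
    """f(S): number of items covered by the union of the sets in S."""
    covered = set()
    for e in S:
        covered |= cover[e]
    return len(covered)

def marginal_gain(cover, e, S):
    """Delta f(e | S) = f(S + e) - f(S)."""
    return coverage(cover, S | {e}) - coverage(cover, S)

cover = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}
A = {"a"}
B = {"a", "c"}  # A is a subset of B

# Submodularity (diminishing returns): Delta f(b | A) >= Delta f(b | B).
assert marginal_gain(cover, "b", A) >= marginal_gain(cover, "b", B)
# Monotonicity: f(B) >= f(A).
assert coverage(cover, B) >= coverage(cover, A)
```

Adding "b" to the smaller set $A$ contributes item 3, while relative to $B$ (which already covers 3) it contributes nothing, illustrating diminishing returns.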

Many useful and interesting functions satisfy submodularity. Examples include coverage functions, probabilistic coverage functions [El-Arini et al., 2009], entropy and mutual information [Krause and Guestrin, 2005] under an assumption and ROUGE [Lin and Bilmes, 2011].

3.2 MATROID, $k$-SYSTEM, AND KNAPSACK CONSTRAINTS

For succinctness, we omit formal definitions of matroids and $k$-systems. Instead, we introduce examples of matroids and remark that the intersection of $k$ matroids is a $k$-system. For definitions of these notions, we refer to [Calinescu et al., 2011].

First, we provide an important example of a matroid. Let $\mathcal{N}_{i}\subseteq\mathcal{N}$ ($i=1,\dots,n$) be a partition of $\mathcal{N}$, that is, $\mathcal{N}$ is the disjoint union of these subsets. For $1\leq i\leq n$, we fix a non-negative integer $d_{i}$ and let $\mathcal{P}=\{S\in 2^{\mathcal{N}}\mid |S\cap\mathcal{N}_{i}|\leq d_{i},\ \forall i\}$. Then, the pair $(\mathcal{N},\mathcal{P})$ is an example of a matroid, called a partition matroid. Let $d$ be a non-negative integer and put $\mathcal{U}=\{S\in 2^{\mathcal{N}}\mid |S|\leq d\}$. Then $(\mathcal{N},\mathcal{U})$ is a special case of a partition matroid, called a uniform matroid. Let $(\mathcal{N},\mathcal{M}_{i})$ for $1\leq i\leq k$ be $k$ matroids, where $\mathcal{M}_{i}\subseteq 2^{\mathcal{N}}$. The intersection $(\mathcal{N},\cap_{i=1}^{k}\mathcal{M}_{i})$ is not necessarily a matroid but is a $k$-system (more specifically, a $k$-extendible system) [Calinescu et al., 2011, Mestre, 2006, 2015]. In particular, any matroid is a $1$-system. For a $k$-system $(\mathcal{N},\mathcal{I})$ with $\mathcal{I}\subseteq 2^{\mathcal{N}}$ and a subset $S\subseteq\mathcal{N}$, we say that $S$ satisfies the $k$-system constraint if and only if $S\in\mathcal{I}$. Trivially, a uniform matroid constraint is equivalent to a cardinality constraint.

Next, we provide the definition of a knapsack constraint. Let $c:\mathcal{N}\rightarrow\mathbb{R}_{>0}$ be a function. For $e\in\mathcal{N}$, we suppose $c(e)$ represents the cost of $e$. Let $b\in\mathbb{R}_{>0}$ be a budget and $S\subseteq\mathcal{N}$ a subset. We say that $S$ satisfies the knapsack constraint with budget $b$ if $c(S):=\sum_{e\in S}c(e)\leq b$. Without loss of generality (by rescaling the costs), it suffices to consider the unit-budget case, i.e., $b=1$.
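As an illustration, the feasibility check for the intersection of knapsack constraints and a partition-matroid constraint (a $1$-system) can be sketched as follows; the costs, partition, and caps below are hypothetical.

```python
# Sketch of a feasibility check for l knapsack constraints plus a
# partition-matroid constraint (a special case of a k-system).
# All costs, partition blocks, and caps below are hypothetical.

def is_feasible(S, costs, partition, caps):
    """True iff c_j(S) <= 1 for every knapsack j and |S ∩ N_i| <= d_i for every block i."""
    # l knapsack constraints with unit budgets (costs rescaled so b = 1).
    for c in costs:
        if sum(c[e] for e in S) > 1.0:
            return False
    # Partition-matroid constraint: at most caps[i] elements from block i.
    for block, cap in zip(partition, caps):
        if len(S & block) > cap:
            return False
    return True

costs = [{"a": 0.5, "b": 0.4, "c": 0.3}]   # l = 1 knapsack
partition = [{"a", "b"}, {"c"}]            # e.g., news topics
caps = [1, 1]                              # at most one article per topic

assert is_feasible({"a", "c"}, costs, partition, caps)
assert not is_feasible({"a", "b"}, costs, partition, caps)  # violates the partition cap
```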

4 PROBLEM FORMULATION

Throughout this paper, we consider the following intersection of $l$ knapsack constraints and a $k$-system constraint:

$c_{j}(S)\leq 1\ (1\leq j\leq l)\quad\text{and}\quad S\in\mathcal{I}.$ (1)

Here, for $1\leq j\leq l$, $c_{j}:\mathcal{N}\rightarrow\mathbb{R}_{>0}$ is a cost function and $(\mathcal{N},\mathcal{I})$ is a $k$-system.

In this study, we consider the following sequential decision-making process for time steps $t=1,\dots,T$.

(i) The algorithm selects a list $S_{t}=\{e^{(1)}_{t},\dots,e^{(m_{t})}_{t}\}\subseteq\mathcal{N}$ satisfying the constraints (1).

(ii) The algorithm receives noisy rewards $y_{t}^{(1)},\dots,y_{t}^{(m_{t})}$ as follows:

$y_{t}^{(i)}=\Delta f\left(e_{t}^{(i)}\mid S_{t}^{(1:i-1)}\right)+\varepsilon_{t}^{(i)},\quad\text{for }i=1,\dots,m_{t}.$

Here, $f$ is a submodular function unknown to the algorithm, $S_{t}^{(1:i-1)}=\{e_{t}^{(1)},\dots,e_{t}^{(i-1)}\}$, and $\varepsilon_{t}^{(i)}$ is noise. We regard $S_{t}^{(1:i-1)}$, $e_{t}^{(i-1)}$, and $\varepsilon_{t}^{(i)}$ as random variables. The objective of the algorithm is to maximize the sum of rewards $\sum_{t=1}^{T}f(S_{t})$.

Following [Yue and Guestrin, 2011], we explain this problem using the news article recommendation example. In each round, the user scans the list of recommended items $S_{t}=\{e_{t}^{(1)},\dots,e_{t}^{(m_{t})}\}$ one by one in top-down fashion, where $m_{t}$ is the cardinality of $S_{t}$ at round $t$. We assume that the marginal gain $\Delta f(e_{t}^{(i)}\mid S_{t}^{(1:i-1)})$ represents the new information covered by $e_{t}^{(i)}$ and not covered by $S_{t}^{(1:i-1)}$. The noisy rewards $y_{t}^{(1)},\dots,y_{t}^{(m_{t})}$ are binary random variables, and the user likes $e_{t}^{(i)}$ with probability $\Delta f(e_{t}^{(i)}\mid S_{t}^{(1:i-1)})$.
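The interaction protocol above can be sketched in a few lines of Python; the coverage-style score function and the Gaussian noise level below are hypothetical stand-ins for the unknown $f$ and the noise $\varepsilon_{t}^{(i)}$.

```python
import random

def run_round(policy, f, rng, noise_sd=0.1):
    """One round of the semi-bandit protocol:
    the policy picks a feasible list S_t, then for each position i the
    algorithm observes y_t^(i) = Delta f(e_t^(i) | S_t^(1:i-1)) + noise."""
    S_t = policy()                 # a list satisfying the constraints (1)
    prefix, rewards = [], []
    for e in S_t:
        gain = f(prefix + [e]) - f(prefix)  # marginal gain Delta f(e | prefix)
        rewards.append(gain + rng.gauss(0.0, noise_sd))
        prefix.append(e)
    return S_t, rewards

# Hypothetical example: a coverage-style score and a fixed two-item policy.
cover = {"a": {1, 2}, "b": {2, 3}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
S_t, ys = run_round(lambda: ["a", "b"], f, random.Random(0))
```

With the noise turned off, the observed rewards are exactly the marginal gains, so the sum of rewards in a round equals $f(S_t)$.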

4.1 ASSUMPTIONS ON THE SCORE FUNCTION $f$

Following [Yue and Guestrin, 2011], we assume that there exist $d$ known submodular functions $f_{1},\dots,f_{d}$ on $2^{\mathcal{N}}$ that are linearly independent, and that the objective submodular function $f$ can be written as a linear combination $f=\sum_{i=1}^{d}w_{i}f_{i}$, where the coefficients $w_{1},\dots,w_{d}$ are non-negative and unknown to the algorithm. We fix a parameter $B>0$ and assume that $\sqrt{\sum_{i=1}^{d}w_{i}^{2}}\leq B$. We also assume that for some $A>0$, the $L^{2}$-norm of the vector $[\Delta f_{i}(e\mid S)]_{i=1}^{d}$ is bounded above by $\sqrt{A}$ for any $e\in\mathcal{N}$ and $S\in 2^{\mathcal{N}}$.

We note that this setting can be generalized to an infinite-dimensional case as in [Chen et al., 2017]. We discuss this setting in more detail and provide a theoretical result for it in the supplemental material.

4.2 ASSUMPTIONS ON NOISE STOCHASTIC PROCESS

We assume that there exists $m\in\mathbb{Z}_{>0}$ such that $m_{t}\leq m$ for all $t$ and consider the lexicographic order on the set $\{(t,i)\mid t=1,2,\dots,\ 1\leq i\leq m\}$, i.e., $(t,i)\leq(t',i')$ if and only if either $t<t'$, or $t=t'$ and $i\leq i'$. Then, we can identify this set with the set of natural numbers (as ordered sets) and regard $\{\varepsilon_{t}^{(i)}\}_{t,i}$ as a sequence. We assume that the stochastic process $\{\varepsilon_{t}^{(i)}\}_{t,i}$ is conditionally $R$-sub-Gaussian for a fixed constant $R\geq 0$, i.e., $\mathbf{E}\left[\exp\left(\xi\varepsilon_{t}^{(i)}\right)\mid\mathcal{F}_{t,i}\right]\leq\exp\left(\frac{\xi^{2}R^{2}}{2}\right)$ for any $(t,i)$ and any $\xi\in\mathbb{R}$. Here, $\mathcal{F}_{t,i}$ is the $\sigma$-algebra generated by $\{S_{u}^{(1:j)}\mid(u,j)<(t,i)\}$ and $\{\varepsilon_{u}^{(j)}\mid(u,j)<(t,i)\}$. This is a standard assumption on the noise sequence [Chowdhury and Gopalan, 2017, Abbasi-Yadkori et al., 2011]. For example, if $\{\varepsilon_{t}^{(i)}\}$ is a martingale difference sequence with $|\varepsilon_{t}^{(i)}|\leq R$, or $\{\varepsilon_{t}^{(i)}\}$ is conditionally Gaussian with zero mean and variance $R^{2}$, then the condition is satisfied [Lattimore and Szepesvári, 2019].

4.3 APPROXIMATION REGRET

As is usual in combinatorial bandit problems, we evaluate bandit algorithms by a regret called the $\alpha$-approximation regret (or $\alpha$-regret for short), where $\alpha\in(0,1)$. The approximation regret is necessary for a meaningful evaluation: even if the submodular function $f$ is completely known, it has been proved that no algorithm can find the optimal solution with polynomially many evaluations of $f$ [Nemhauser and Wolsey, 1978].

We denote by $OPT$ an optimal solution, i.e., $OPT=\operatorname{argmax}_{S}f(S)$, where $S$ runs over the subsets in $2^{\mathcal{N}}$ satisfying the constraints (1). We define the $\alpha$-regret as follows:

$\mathrm{Reg}_{\alpha}(T)=\sum_{t=1}^{T}\left\{\alpha f(OPT)-f(S_{t})\right\}.$

This definition is slightly different from that given in [Yue and Guestrin, 2011], because our definition does not include noise, as in [Chowdhury and Gopalan, 2017]. In either case, one can prove a similar upper bound. For the proof in the cardinality constraint case, we refer to Lemma 4 in the supplemental material of [Yue and Guestrin, 2011].

In this study, we take the same approximation ratio $\alpha=\frac{1}{(1+\varepsilon)(k+2l+1)}$ as that of a fast algorithm in the offline setting [Badanidiyuru and Vondrák, 2014, Theorem 6.1]. As mentioned in Section 2, there exist offline algorithms that achieve better approximation ratios, but they have high computational complexity. Later, we remark that our proposed method is also "fast".

5 ALGORITHM

In this section, following [Yue and Guestrin, 2011, Yu et al., 2016], we first define a UCB score for the marginal gain $\Delta f(e\mid S)$ and introduce a modified UCB score. With a UCB score, one can balance the exploitation-exploration tradeoff under bandit feedback. Then, we propose a non-greedy algorithm (Algorithm 2) that adaptively focuses on the UCB score and the modified UCB score.

5.1 UCB SCORES

For $e\in\mathcal{N}$ and $S\in 2^{\mathcal{N}}$, we define a column vector $x(e\mid S)=\left(\Delta f_{i}(e\mid S)\right)_{i=1}^{d}\in\mathbb{R}^{d}$ and put $x_{t}^{(i)}=x(e^{(i)}_{t}\mid S_{t}^{(1:i-1)})$. Here, we use the same notation as in Section 4. We define $b_{t},w_{t}\in\mathbb{R}^{d}$ and $M_{t}\in\mathbb{R}^{d\times d}$ as follows:

$b_{t}:=\sum_{s=1}^{t}\sum_{i=1}^{m_{s}}y_{s}^{(i)}x_{s}^{(i)},\quad M_{t}:=\lambda I+\sum_{s=1}^{t}\sum_{i=1}^{m_{s}}x_{s}^{(i)}\otimes x_{s}^{(i)},\quad w_{t}:=M_{t}^{-1}b_{t}.$

Here, $\lambda>0$ is a parameter of the model, and for a column vector $x\in\mathbb{R}^{d}$, we denote by $x\otimes x\in\mathbb{R}^{d\times d}$ the outer product $xx^{\mathrm{T}}$.

For $e\in\mathcal{N}$ and $S\in 2^{\mathcal{N}}$, we define $\mu_{t}(e\mid S):=w_{t}\cdot x(e\mid S)$ and $\sigma_{t}(e\mid S):=\sqrt{x(e\mid S)^{\mathrm{T}}M_{t}^{-1}x(e\mid S)}$. Then, we define the UCB score of the marginal gain by

$\mathrm{ucb}_{t}(e\mid S)=\mu_{t-1}(e\mid S)+\beta_{t-1}\sigma_{t-1}(e\mid S),$

and the modified UCB score by $\mathrm{ucb}_{t}(e\mid S)/c(e)$. Here,

$\beta_{t}:=B+R\sqrt{\ln\det\left(\lambda^{-1}M_{t}\right)+2+2\ln(1/\delta)},$

and $c(e):=\sum_{j=1}^{l}c_{j}(e)$. It is well known that $\mathrm{ucb}_{t}(e\mid S)$ is an upper confidence bound for $\Delta f(e\mid S)$. More precisely, we have the following result.

Proposition 1.

We assume there exists $m\in\mathbb{Z}_{\geq 1}$ such that $m_{t}\leq m$ for all $1\leq t\leq T$. We also assume that $\lambda\geq L$. Then, with probability at least $1-\delta$, the following inequality holds:

$\left|\mu_{t-1}(e\mid S)-\Delta f(e\mid S)\right|\leq\beta_{t-1}\sigma_{t-1}(e\mid S),$

for any $t$, $S$, and $e$.

Proposition 1 follows from the proof of [Chowdhury and Gopalan, 2017, Theorem 2]. We note that their theorem is a more general result than the statement above (they do not assume that the objective function is linear, only that it belongs to an RKHS). In the linear kernel case, a result equivalent to Proposition 1 was proved in [Abbasi-Yadkori et al., 2011].

We also define the UCB score for a list $S=\{e^{(1)},\dots,e^{(m)}\}$ by $\mathrm{ucb}_{t}(S)=\mu_{t-1}(S)+3\beta_{t-1}\sigma_{t-1}(S)$. Here, $\mu_{t}(S)$ and $\sigma_{t}(S)$ are defined as $\sum_{i=1}^{m}\mu_{t}(e^{(i)}\mid S^{(1:i-1)})$ and $\sum_{i=1}^{m}\sigma_{t}(e^{(i)}\mid S^{(1:i-1)})$, respectively. The factor $3$ in the definition of $\mathrm{ucb}_{t}(S)$ is due to a technical reason, as clarified by the proof of Lemma 11 in the supplemental material.
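The ridge-regression estimate $w_{t}=M_{t}^{-1}b_{t}$ and the UCB score $\mu+\beta\sigma$ above can be sketched as follows; the feature vector and all parameter values are hypothetical.

```python
import numpy as np

# Sketch of the ridge-regression statistics (b_t, M_t) and the UCB score.
# The feature vector x = [Delta f_i(e|S)]_i and all numbers are hypothetical.

def update(M, b, x, y):
    """After observing reward y for feature x: M += x x^T, b += y x."""
    return M + np.outer(x, x), b + y * x

def ucb(x, M, b, beta):
    """ucb = mu + beta * sigma with mu = w.x, sigma = sqrt(x^T M^{-1} x), w = M^{-1} b."""
    w = np.linalg.solve(M, b)
    sigma = float(np.sqrt(x @ np.linalg.solve(M, x)))
    return float(w @ x) + beta * sigma

d, lam = 3, 1.0
M, b = lam * np.eye(d), np.zeros(d)    # M_0 = lambda * I, b_0 = 0
x = np.array([0.5, 0.2, 0.1])
M, b = update(M, b, x, y=0.4)          # one observed (feature, reward) pair
score = ucb(x, M, b, beta=1.0)         # optimistic estimate of Delta f(e|S)
```

The exploration bonus $\beta\sigma$ shrinks along directions that have been observed often, which is what drives the exploitation-exploration balance.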

5.2 AFSM-UCB

Input: threshold $\rho$, round $t$
Output: a list $S$ satisfying the constraints (1)
Set $S=\emptyset$, $i=1$
while True do
    $\mathcal{N}_{S}=\{e\in\mathcal{N}\mid S+e\text{ satisfies the constraints (1)}\}$
    $\mathcal{N}_{S,\geq\rho}=\{e\in\mathcal{N}_{S}\mid\mathrm{ucb}_{t}(e\mid S)/c(e)\geq\rho\text{ and }\mathrm{ucb}_{t}(e\mid\emptyset)/c(e)\geq\rho\}$
    if $\mathcal{N}_{S,\geq\rho}=\emptyset$ then
        break
    $e_{i}=\operatorname{argmax}_{e\in\mathcal{N}_{S,\geq\rho}}\mathrm{ucb}_{t}(e\mid S)$
    Add $e_{i}$ to $S$; set $i\leftarrow i+1$
end while
Return $S$
Algorithm 1 GM-UCB (Sub-algorithm)
Input: parameters $B,R,\lambda,\delta,\nu,\nu',\varepsilon$
Output: in each round, a list $S_{t}$ satisfying the constraints (1)
for $t=1,\dots,T$ do
    $U=\emptyset$, $r=\frac{2}{k+2l+1}$, $\rho=r(1+\varepsilon)^{-1}\nu$
    while $\rho\leq r\nu'|\mathcal{N}|$ do
        $S=\mathrm{Algorithm\ 1}(\rho,t)$
        Add $S$ to $U$
        Set $\rho\leftarrow(1+\varepsilon)\rho$
    end while
    Select $S_{t}=\operatorname{argmax}_{S\in U}\mathrm{ucb}_{t}(S)$
    Receive rewards $y_{t}^{(1)},\dots,y_{t}^{(m_{t})}$
end for
Algorithm 2 AFSM-UCB (Main Algorithm)

In this subsection, we propose a UCB-type algorithm for our problem. We call our proposed method AFSM-UCB; its pseudocode is outlined in Algorithm 2. Algorithm 2 calls a sub-algorithm called GM-UCB (an algorithm that Greedily selects elements with Modified UCB scores larger than a threshold, outlined in Algorithm 1). Algorithm 1 takes a threshold $\rho$ as a parameter and returns a list of elements satisfying the constraints (1). Algorithm 1 greedily selects elements from among those whose modified UCB scores $\mathrm{ucb}_{t}(e\mid S)/c(e)$ and $\mathrm{ucb}_{t}(e\mid\emptyset)/c(e)$ are larger than or equal to the threshold $\rho$. If the threshold $\rho$ is small, this algorithm is almost the same as a greedy algorithm such as LSBGreedy [Yue and Guestrin, 2011]. If the threshold $\rho$ is large, only elements with large modified UCB scores are selected. Thus, the threshold $\rho$ controls the relative importance of the standard and modified scores. The main algorithm (Algorithm 2) calls Algorithm 1 repeatedly, varying the threshold $\rho$, and returns the list with the largest UCB score. We prove that there exists a good list among these candidate lists.

As remarked before, Algorithm 2 is inspired by submodular maximization algorithms in the offline setting [Badanidiyuru and Vondrák, 2014, Mirzasoleiman et al., 2016]. However, we need a nontrivial modification, since the diminishing returns property does not hold for $\mathrm{ucb}_{t}(e\mid S)$, unlike the marginal gain $\Delta f(e\mid S)$. We note that $\mathrm{ucb}_{t}(e\mid S)$ can be large not only when the estimated value of $\Delta f(e\mid S)$ is large but also when the uncertainty in adding $e$ to $S$ is high. Therefore, we need additional filter conditions to ensure that $e$ is a "good" element. Natural candidates for such conditions are $\mathrm{ucb}_{t}(e\mid S^{(1:i)})/c(e)\geq\rho$ for some indices $i$. In Algorithm 1, we require $\mathrm{ucb}_{t}(e\mid\emptyset)/c(e)\geq\rho$ in addition to $\mathrm{ucb}_{t}(e\mid S)/c(e)\geq\rho$.

In the algorithm, we introduce parameters $\nu$ and $\nu'$. The parameter $\nu$ (resp. $\nu'$) is used to define the initial (resp. terminal) value of the threshold $\rho$. In the next section, for the theoretical guarantee, we assume that $\nu\leq\max_{e\in\mathcal{N}}f(\{e\})\leq\nu'$. If an upper bound on the reward is known, we can take $\nu'$ to be that bound. In practice, it is plausible that most users are interested in at least one item in the entire item set $\mathcal{N}$, which implies that $\max_{e\in\mathcal{N}}f(\{e\})$ is not very small. In addition, the number of iterations of the while loop in Algorithm 2 is $O(\ln(\nu'|\mathcal{N}|/\nu))$. Therefore, taking a very small $\nu$ does not increase the number of iterations by much.
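The geometric grid of thresholds that Algorithm 2 sweeps can be sketched as follows; the parameter values in the example are hypothetical.

```python
import math

def threshold_grid(k, l, nu, nu_prime, n, eps):
    """Thresholds swept by the main loop of Algorithm 2 (sketch):
    rho starts at r * nu / (1 + eps) with r = 2 / (k + 2l + 1) and is
    multiplied by (1 + eps) until it exceeds r * nu_prime * n, n = |N|."""
    r = 2.0 / (k + 2 * l + 1)
    rho = r * nu / (1.0 + eps)
    grid = []
    while rho <= r * nu_prime * n:
        grid.append(rho)
        rho *= 1.0 + eps
    return grid

# Hypothetical parameters: k = 1, l = 1, |N| = 1000 items.
grid = threshold_grid(k=1, l=1, nu=0.1, nu_prime=1.0, n=1000, eps=0.2)
# The grid length grows like ln(nu' * n * (1 + eps) / nu) / ln(1 + eps),
# so shrinking nu only adds logarithmically many thresholds.
assert len(grid) <= math.ceil(math.log(1000 * 1.2 / 0.1) / math.log(1.2)) + 1
```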

5.3 COMPUTATIONAL COMPLEXITY

We discuss the computational complexity of Algorithm 2 and that of existing methods. We consider a greedy algorithm obtained by applying LSBGreedy to our problem, i.e., a greedy algorithm that selects the element with the largest UCB score as long as the constraints remain satisfied. By abuse of terminology, we also call this algorithm LSBGreedy. Similarly, when we apply CGreedy (resp. MCSGreedy) to our problem, we also call the resulting algorithm CGreedy (resp. MCSGreedy). In each round, the expected number of computations of $\mathrm{ucb}_{t}(e\mid S)$ in Algorithm 2 is $O(m|\mathcal{N}|\ln(\nu'|\mathcal{N}|/\nu)/\ln(1+\varepsilon))$, while that of LSBGreedy is $O(m|\mathcal{N}|)$. The computational complexity of MCSGreedy and CGreedy is $O(|\mathcal{N}|^{3})$ and $O(m|\mathcal{N}|)$, respectively. Therefore, ignoring unimportant parameters, our algorithm incurs an additional factor of $\ln|\mathcal{N}|/\ln(1+\varepsilon)$ compared to LSBGreedy and CGreedy.

6 MAIN RESULTS

The main challenge of this paper is to provide a strong theoretical result for AFSM-UCB. In this section, under the assumptions stated in the previous section, we provide an upper bound on the approximation regret of AFSM-UCB and give a sketch of the proof. We also show that existing greedy methods incur linear approximation regret in the worst case for our problem.

6.1 STATEMENT OF THE MAIN RESULTS

Theorem 1.

Let the notation and assumptions be as above. We also assume that $\lambda\geq m$. Let $\alpha=\frac{1}{(1+\varepsilon)(k+2l+1)}$. Then, with probability at least $1-\delta$, the proposed algorithm achieves the following $\alpha$-regret bound:

$\mathrm{Reg}_{\alpha}(T)\leq 8A\beta_{T}\sqrt{2mT\ln\det\left(\lambda^{-1}M_{T}\right)}.$

In particular, ignoring $A$, $B$, and $R$, we have $\mathrm{Reg}_{\alpha}(T)=O(d\sqrt{mT}\ln\frac{mT}{\delta})$ with probability at least $1-\delta$.

Remark 1.
1. In the published version at UAI 2020, the assumption "$\lambda\geq m$" was missing. However, we need this assumption, as in the previous work [Yue and Guestrin, 2011, Lemma 5].

2. There is a tradeoff between the approximation ratio and the computational complexity. As discussed in Section 5.3, the computational complexity of the algorithm is $O(m|\mathcal{N}|\ln(|\mathcal{N}|)/\ln(1+\varepsilon))$ in each round, while the approximation ratio of the algorithm is $\frac{1}{(1+\varepsilon)(k+2l+1)}$.

3. We assume the score function $f$ is a linear combination of known submodular functions. We can relax this assumption to the case when the function $(e,S)\mapsto\Delta f(e\mid S)$ belongs to an RKHS and has a bounded norm in that space, as in [Chen et al., 2017]. We discuss this setting in more detail and provide a generalized result in the supplemental material.

In the setting of [Yue and Guestrin, 2011, Yu et al., 2016], greedy methods have good theoretical properties. However, we show that for any α>0\alpha>0, these greedy methods incur linear α\alpha-regret in the worst case for our problem. We denote by Regα,MCS(T)\mathrm{Reg}_{\alpha,\mathrm{MCS}}(T) and Regα,LSB(T)\mathrm{Reg}_{\alpha,\mathrm{LSB}}(T) the α\alpha-regret of MCSGreedy and that of LSBGreedy, respectively. Then the following proposition holds.

Proposition 2.

For any \alpha>0, there exist a cost c_{1}, a k-system \mathcal{I}, a submodular function f, T_{0}>0, and a constant C>0 such that with probability at least 1-\delta,

Regα,MCS(T)>CT,\mathrm{Reg}_{\alpha,\mathrm{MCS}}(T)>CT,

for any T>T0T>T_{0}. Moreover, the same statement holds for Regα,LSB(T)\mathrm{Reg}_{\alpha,\mathrm{LSB}}(T).

We provide the proof in the supplemental material.

6.2 SKETCH OF THE PROOF OF THEOREM 1

We provide a sketch of the proof of Theorem 1 and provide a detailed and generalized proof in the supplemental material. Throughout the proof, we fix the event \mathcal{F} on which the inequality in Proposition 1 holds.

We evaluate the solution StS_{t} by AFSM-UCB in each round tt. The following is a key result for our proof of Theorem 1.

Proposition 3.

Let C𝒩C\subseteq\mathcal{N} be any set satisfying the constraint (1). Let SS be a set returned by GM-UCB at time step tt. Then, on the event \mathcal{F}, we have

f(S)+2βt1σt1(S)min{ρ2,1k+1f(SC)lρk+1}.f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\min\left\{\frac{\rho}{2},\frac{1}{k+1}f(S\cup C)-\frac{l\rho}{k+1}\right\}.
Sketch of proof.

This can be proved in a similar way to the proof of [Badanidiyuru and Vondrák, 2014, Theorem 6.1] or [Mirzasoleiman et al., 2016, Theorem 5.1]. However, because of the uncertainty and the lack of a diminishing-returns property of the UCB score, further analysis is needed. We divide the proof into two cases.

Case One. This is the case when GM-UCB terminates because there exists an element e with \mathrm{ucb}_{t}(e\mid S)\geq\rho c(e) and \mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e), but no element e satisfying \mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e) satisfies the knapsack constraints, i.e., c_{j}(S+e)>1 for some 1\leq j\leq l. We fix an element e^{\prime} satisfying \mathrm{ucb}_{t}(e^{\prime}\mid S),\mathrm{ucb}_{t}(e^{\prime}\mid\emptyset)\geq\rho c(e^{\prime}). Because every element of S has a sufficiently large modified UCB score, Proposition 1 yields f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\rho c(S). By the definition of e^{\prime}, we also have \mathrm{ucb}_{t}(e^{\prime}\mid\emptyset)\geq\rho c(e^{\prime}). Because f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\mathrm{ucb}_{t}(e^{\prime}\mid\emptyset)\geq\rho c(e^{\prime}) and S+e^{\prime} violates a knapsack constraint, we obtain f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\frac{\rho}{2}c(S+e^{\prime})\geq\rho/2.

Case Two. This is the case when GM-UCB terminates because for any element ee satisfying ucbt(eS),ucbt(e)ρc(e)\mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e), ee satisfies the knapsack constraints but S+eS+e does not satisfy the kk-system constraint. We note that this case includes the case when there does not exist an element ee satisfying ucbt(eS),ucbt(e)ρc(e)\mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e).

We define a set C<ρC_{<\rho} as

{eCi such that ucbt(eS(1:i))<ρc(e)},\displaystyle\left\{e\in C\mid\exists i\text{ such that }\mathrm{ucb}_{t}(e\mid S^{(1:i)})<\rho c(e)\right\},

and Cρ=CC<ρC_{\geq\rho}=C\setminus C_{<\rho}. Let eC<ρe\in C_{<\rho}. Then on the event \mathcal{F}, by Proposition 1 and submodularity, we have

Δf(C<ρS)eC<ρΔf(eS)eC<ρρc(e)lρ.\Delta f(C_{<\rho}\mid S)\leq\sum_{e\in C_{<\rho}}\Delta f(e\mid S)\leq\sum_{e\in C_{<\rho}}\rho c(e)\leq l\rho. (2)

Next, we consider CρC_{\geq\rho}. Running the greedy algorithm (with respect to the UCB score) on SCρS\cup C_{\geq\rho} under only the kk-system constraint, we obtain SS by the assumption of this case. Then, it can be proved that f(S)+2βt1σt1(S)1k+1f(SCρ).f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\frac{1}{k+1}f(S\cup C_{\geq\rho}). We note that this is a variant of the result proved in [Calinescu et al., 2011, Appendix B]. By this inequality, inequality (2), and submodularity, we can derive the desired result. ∎

Using Proposition 3, we can bound the approximation regret from above by the sum of the uncertainty terms \beta_{t-1}\sigma_{t-1}(S_{t}). Because the algorithm selects S_{t} and obtains feedback for S_{t}, this sum can be bounded above by a sublinear function of T.

7 EXPERIMENTAL ANALYSIS

Figure 1: Cumulative average rewards on the synthetic news article recommendation dataset
Figure 2: Cumulative average rewards on the MovieLens dataset
Figure 3: Cumulative average rewards on the Million Song Dataset

In this section, we empirically evaluate our methods by a synthetic dataset that simulates an environment for news article recommendation and two real-world datasets (MovieLens100K [Grouplens, 1998] and the Million Song Dataset [Bertin-Mahieux et al., 2011]).

We compare our proposed algorithm to the following baselines:

  • RANDOM. In each round, this algorithm selects elements uniformly at random until no remaining element satisfies the constraints.

  • LSBGreedy. This was proposed in [Yue and Guestrin, 2011] to solve the submodular bandit problem under a cardinality constraint. In the linear kernel case, SM-UCB [Chen et al., 2017] is equivalent to LSBGreedy.

  • CGreedy. This is an algorithm for a submodular bandit problem under a knapsack constraint and was proposed in [Yu et al., 2016]. They also proposed an algorithm called MCSGreedy. However, because MCSGreedy is computationally expensive (in each round, it calls the functions f_{1},\dots,f_{d} O(|\mathcal{N}|^{3}) times) and their experimental results show that both algorithms have similar empirical performance, we do not include MCSGreedy in the baselines.

In Proposition 2, we showed that these greedy algorithms incur linear approximation regret in the worst case. However, even without theoretical guarantees, greedy algorithms are empirically known to achieve good performance. In this section, we demonstrate that our algorithm outperforms these greedy algorithms under various combinations of constraints. These combinations include, as special cases, the setting in which the knapsack budget is sufficiently large and the setting in which the k-system constraint is sufficiently mild; the greedy baselines were designed for such cases. We also show that our proposed method performs no worse than the baselines even in these cases.

As in the preceding work [Yue and Guestrin, 2011], we assume the score function f is a linear combination of known probabilistic coverage functions. We assume there exists a set of topics (or genres) \mathcal{G} with |\mathcal{G}|=d, and for each item e\in\mathcal{N}, there is a feature vector x(e):=(P_{g}(e))_{g\in\mathcal{G}}\in\mathbb{R}^{d} that represents the information coverage on different genres. For each genre g, we define the probabilistic coverage function f_{g}(S)=1-\prod_{e\in S}(1-P_{g}(e)), and we assume f=\sum_{i}w_{i}f_{i} with unknown linear coefficients w_{i}. The vector w:=[w_{1},\dots,w_{d}] represents the user's preference over genres. We assume that the noisy rewards y_{t}^{(i)} are sampled as y_{t}^{(i)}\sim\mathrm{Ber}\left(\Delta f(e_{t}^{(i)}\mid S_{t}^{(1:i-1)})\right). Below, we define the feature vectors x(e), the vector w, and the constraints explicitly. We note that in the experiments, we use an un-normalized knapsack constraint c(S)\leq b. In the following experiments, we compute cumulative average rewards for each algorithm over 100 users (100 vectors w), repeating the experiment 10 times for each user before averaging.
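The reward model above can be sketched in a few lines. This is a minimal sketch under the stated model; the function names and array conventions are our own, and `w @ gain` is assumed to lie in [0,1] so that it is a valid Bernoulli mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def coverage_gain(e_probs, S_probs):
    """Per-genre marginal gain of the probabilistic coverage function:
    Δf_g(e|S) = P_g(e) * Π_{e' in S} (1 - P_g(e'))."""
    if len(S_probs) == 0:
        residual = np.ones_like(e_probs)
    else:
        residual = np.prod(1.0 - np.asarray(S_probs), axis=0)
    return e_probs * residual

def expected_reward(w, e_probs, S_probs):
    """Δf(e|S) = Σ_g w_g Δf_g(e|S) with user preference vector w."""
    return float(w @ coverage_gain(e_probs, S_probs))

def noisy_reward(w, e_probs, S_probs):
    # semi-bandit feedback: Bernoulli with mean Δf(e|S)
    return int(rng.binomial(1, expected_reward(w, e_probs, S_probs)))
```

For example, an item with coverage 0.5 on a genre the user cares about contributes 0.5 to an empty list, but only 0.25 once another item with coverage 0.5 on the same genre is present.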

7.1 NEWS ARTICLE RECOMMENDATION

In this synthetic dataset, we assume d=15 and |\mathcal{N}|=1000. We define x(e) and the costs for a knapsack constraint in a similar manner to [Yu et al., 2016]. We sample each entry of x(e) from one of two uniform distributions. We assume that each item e has high information coverage on at most two genres. More precisely, we randomly select two indices of x(e), sample those entries from U(0.5,0.8), and sample the remaining entries from U(0.0,0.01). We generate 100 user preference vectors w in a similar way to x(e). We also sample the costs of items uniformly at random from U(0.0,1.0). In this dataset, we consider the intersection of a cardinality constraint and a knapsack constraint. The result is shown in Figure 1.
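The data-generation procedure above can be sketched as follows. Variable names are ours and the seed is arbitrary; the authors' exact sampling code is not available.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 15, 1000  # number of genres and items

def sample_item():
    # two randomly chosen genres get high coverage, the rest near zero
    x = rng.uniform(0.0, 0.01, size=d)
    hot = rng.choice(d, size=2, replace=False)
    x[hot] = rng.uniform(0.5, 0.8, size=2)
    return x

items = np.stack([sample_item() for _ in range(n)])
costs = rng.uniform(0.0, 1.0, size=n)  # knapsack costs
```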

7.2 MOVIE RECOMMENDATION

We perform an experiment similar to that in [Mirzasoleiman et al., 2016], but with semi-bandit feedback. In MovieLens100K, there are 943 users and 1682 movies. We take \mathcal{N} to be the set of 1682 movies in the dataset, which has d=18 genres. First, we fill in the ratings for all user-item pairs using matrix factorization [Koren et al., 2009] and normalize the ratings r so that r\in[0,1]. For each movie e\in\mathcal{N}, we denote by r_{e}\in[0,1] the mean rating of the movie over all users. We define P(g\mid e)=r_{e}/|\mathcal{G}_{e}| if g\in\mathcal{G}_{e}, and P(g\mid e)=0 otherwise. We normalize P(g\mid e) in this way because, if w_{i}=1 for all i, we then have f(\{e\})=r_{e}.

We define knapsack, cardinality, and matroid constraints similar to those of [Mirzasoleiman et al., 2016]. For e\in\mathcal{N}, the cost c(e) is defined as c(e)=F_{\mathrm{Beta}(10,2)}(r_{e}), where F_{\mathrm{Beta}(10,2)} is the cumulative distribution function of the \mathrm{Beta}(10,2) distribution. For a budget b\in\mathbb{R}_{>0}, we consider a knapsack constraint c(S)\leq b. The beta distribution lets us differentiate the highly rated movies from those with lower ratings [Mirzasoleiman et al., 2016]. We generate 100 user preference vectors w in a similar way to the news article recommendation example. In this dataset, we consider the following constraints on genres in addition to the knapsack constraint c(S)\leq b and the cardinality constraint |S|\leq m. There are k genres in MovieLens100K, where k=d=18. For each genre g, we fix a non-negative integer a and consider the constraint |\left\{e\in S\mid e\text{ has genre }g\right\}|\leq a for S\subseteq\mathcal{N}. This can be regarded as a partition matroid constraint; therefore, the intersection of the constraints over all genres is a k-system constraint. One can prove that the intersection of this k-system constraint and a cardinality constraint is also a k-system constraint. The results for the matroid limit a=3 are displayed in Figure 2.
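A feasibility check for this intersection of the per-genre limits and the cardinality constraint might look as follows. This is a sketch; `genres_of` is a hypothetical mapping from items to their genre sets, not an identifier from the paper.

```python
def genre_limit_feasible(S, genres_of, a, m):
    """Check the intersection used in the MovieLens experiment:
    at most m items overall (cardinality constraint) and at most
    a items per genre (a partition-matroid constraint per genre)."""
    if len(S) > m:
        return False
    counts = {}
    for e in S:
        for g in genres_of[e]:
            counts[g] = counts.get(g, 0) + 1
            if counts[g] > a:
                return False
    return True
```

Such an independence oracle is all that the greedy and threshold-greedy subroutines need to enforce the k-system constraint.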

7.3 MUSIC RECOMMENDATION

From the Million Song Dataset, we select the 1000 most popular songs and the 30 most popular genres. Thus, we have |\mathcal{N}|=1000 and d=30. For 100 active users, we compute P_{g}(e) and the user preference vector w in almost the same way as \overline{w}(e,g) and \theta^{*} in [Hiranandani et al., 2019], respectively. They assume that a user likes a song e if the user listened to the song at least five times; we instead assume that a user likes the song if the user listened to it at least two times. We consider the intersection of a cardinality constraint and a knapsack constraint c(S)\leq b. We define the cost c for the knapsack constraint as the length (in seconds) of the song in the dataset. The costs represent the length of time users spend before deciding to listen to a song, and we assume it is proportional to the length of the song (we can also assume that users listen to the song and give feedback later). The results are displayed in Figure 3. We do not show the performance of RANDOM in the figure since it achieves only very low rewards.

7.4 RESULTS

In Figures 1a, 2a, and 3a, we plot the cumulative average rewards for each algorithm up to time step T=100. In Figures 1b, 2b, and 3b (resp. 1c, 2c, and 3c), we show the cumulative average rewards at the final round while varying the budget b (resp. the cardinality limit m) and fixing the cardinality limit m (resp. the budget b). These results show that, overall, our proposed method outperforms the baselines. We note that Figure 3 shows a different tendency from the other datasets, since popular items in the Million Song Dataset have high information coverage on multiple genres and about 47% of the items have low information coverage (less than 0.01) on all genres. Figures 1b, 2b, and 3b also show the results for the case when the budget is sufficiently large. This is the case in which LSBGreedy performs well, and our experimental results show that even in this case, our method has performance comparable to the greedy algorithms. Moreover, Figures 1c, 2c, and 3c show the results for the case when the cardinality constraints are sufficiently mild. In this case, CGreedy performs well, since the constraints are almost the same as a knapsack constraint. The experimental results show that our method tends to outperform CGreedy even in this case.

8 CONCLUSIONS

In this study, motivated by diversified retrieval considering the costs of items, we introduced the submodular bandit problem under the intersection of a k-system constraint and knapsack constraints. We then proposed a non-greedy algorithm to solve the problem and provided a strong theoretical guarantee. We demonstrated that our proposed method outperforms the greedy baselines on a synthetic and two real-world datasets.

A possible generalization of this work is to the full bandit setting, in which a learner observes only the value f(S_{t})+\epsilon in each round. Since deriving a theoretical guarantee in this setting requires substantially more work, we leave it for future research.

APPENDIX

In this appendix, we generalize the reward model considered in the main article to the kernelized setting as in [Chen et al., 2017]. We also provide parameters used in the experiments.

9 PROBLEM FORMULATION UNDER A GENERALIZED REWARD MODEL

In this appendix, we consider the same reward model as in [Chen et al., 2017], but subject to the intersection of knapsack and k-system constraints as in the main article. SM-UCB [Chen et al., 2017] is based on CGP-UCB [Krause and Ong, 2011] and GP-UCB [Srinivas et al., 2010]. However, [Chowdhury and Gopalan, 2017] recently improved the assumptions and the regret analysis of GP-UCB [Srinivas et al., 2010]. We follow the setting of [Chowdhury and Gopalan, 2017].

Let \Phi be a compact subset of some Euclidean space, which represents the space of contexts. We consider the following sequential decision-making process for time steps t=1,\dots,T.

  1. 1.

    The algorithm observes context ϕtΦ\phi_{t}\in\Phi and selects a list St={et(1),,et(mt)}𝒩S_{t}=\left\{e^{(1)}_{t},\dots,e^{(m_{t})}_{t}\right\}\subseteq\mathcal{N} satisfying the constraints.

  2. 2.

    The algorithm receives noisy rewards yt(1),,yt(mt)y_{t}^{(1)},\dots,y_{t}^{(m_{t})} as follows:

    yt(i)=Δfϕt(et(i)St(1:i1))+εt(i),y_{t}^{(i)}=\Delta f_{\phi_{t}}\left(e_{t}^{(i)}\mid S_{t}^{(1:i-1)}\right)+\varepsilon_{t}^{(i)},

    for i=1,\dots,m_{t}. Here f_{\phi_{t}} is a non-negative, monotone submodular function unknown to the algorithm, S_{t}^{(1:i-1)}=\left\{e_{t}^{(1)},\dots,e_{t}^{(i-1)}\right\}, and \varepsilon_{t}^{(i)} is a noise term.

Here, we regard ϕt,St(1:i1)\phi_{t},S_{t}^{(1:i-1)}, et(i1)e_{t}^{(i-1)} and εt(i)\varepsilon_{t}^{(i)} as random variables.

9.1 ASSUMPTIONS REGARDING THE SCORE FUNCTION fϕf_{\phi}

The linear model considered in the main article can be generalized to an infinite-dimensional case [Chen et al., 2017]. We let D:=\Phi\times 2^{\mathcal{N}}\times\mathcal{N} and define \chi:D\rightarrow\mathbb{R} by \chi((\phi,S,e))=\Delta f_{\phi}(e\mid S). We assume that there exists an RKHS (reproducing kernel Hilbert space) \mathcal{H} on D with a positive definite (or linear) kernel \kappa, that \chi belongs to \mathcal{H}, and that the norm \|\chi\|_{\mathcal{H}} is bounded by B>0. We assume that \kappa is continuous on D\times D. We also assume that \kappa(x,x)\leq 1 for any x\in D. If \kappa(x,x)\leq c, then our \alpha-regret would increase by a factor of \sqrt{c}.

9.2 ASSUMPTIONS REGARDING NOISE STOCHASTIC PROCESS

As for noises, we consider the same assumption as in the main article.

10 DEFINITION OF UCB SCORES

In this section, following [Chen et al., 2017, Chowdhury and Gopalan, 2017], we generalize UCB scores defined in the main article.

We let xt(i)=(ϕt,St(1:i1),et(i))Dx_{t}^{(i)}=(\phi_{t},\allowbreak S_{t}^{(1:i-1)},\allowbreak e_{t}^{(i)})\in D and 𝐱(1:t)=(x1(1),,x1(m1),,xt(1),xt(mt))DM\mathbf{x}_{(1:t)}=(x_{1}^{(1)},\allowbreak\dots,\allowbreak x_{1}^{(m_{1})},\allowbreak\dots,\allowbreak x_{t}^{(1)},\allowbreak\dots x_{t}^{(m_{t})})\in D^{M}, where M=s=1tmsM=\sum_{s=1}^{t}m_{s}. We also define 𝐲(1:t)M\mathbf{y}_{(1:t)}\in\mathbb{R}^{M} as (y1(1),,y1(m1),,yt(1),yt(mt))(y_{1}^{(1)},\allowbreak\dots,\allowbreak y_{1}^{(m_{1})},\allowbreak\dots,\allowbreak y_{t}^{(1)},\allowbreak\dots y_{t}^{(m_{t})}). For a sequence ξ=(ξ1,,ξs)Ds\xi=(\xi_{1},\dots,\xi_{s})\in D^{s}, we define K(ξ)=(κ(ξi,ξj))1i,jss×sK(\xi)=\left(\kappa(\xi_{i},\xi_{j})\right)_{1\leq i,j\leq s}\in\mathbb{R}^{s\times s} and κ(ξ,x)=[κ(ξ1,x),,κ(ξs,x)]Ts\kappa(\xi,x)=[\kappa(\xi_{1},x),\dots,\kappa(\xi_{s},x)]^{\mathrm{T}}\in\mathbb{R}^{s}. We also let κt(x)=κ(𝐱(1:t),x)M\kappa_{t}(x)=\kappa(\mathbf{x}_{(1:t)},x)\in\mathbb{R}^{M} and Kt=K(𝐱(1:t))M×MK_{t}=K(\mathbf{x}_{(1:t)})\in\mathbb{R}^{M\times M}. Then, we define μt(x)\mu_{t}(x)\in\mathbb{R} and σt2(x)>0\sigma_{t}^{2}(x)\in\mathbb{R}_{>0} as follows:

μt(x)\displaystyle\mu_{t}(x) =κt(x)T(Kt+λI)1𝐲1:t,\displaystyle=\kappa_{t}(x)^{\mathrm{T}}\left(K_{t}+\lambda I\right)^{-1}\mathbf{y}_{1:t},
κt(x,x)\displaystyle\kappa_{t}(x,x^{\prime}) =κ(x,x)κt(x)T(Kt+λI)1κt(x),\displaystyle=\kappa(x,x^{\prime})-\kappa_{t}(x)^{\mathrm{T}}\left(K_{t}+\lambda I\right)^{-1}\kappa_{t}(x^{\prime}),
σt2(x)\displaystyle\sigma_{t}^{2}(x) =κt(x,x).\displaystyle=\kappa_{t}(x,x).

Here, λ>0\lambda\in\mathbb{R}_{>0} is a parameter of the model. If x=(ϕ,S,e)x=(\phi,S,e), we also write μt(eϕ,S)=μt(x)\mu_{t}(e\mid\phi,S)=\mu_{t}(x) and σt(eϕ,S)=σt(x)\sigma_{t}(e\mid\phi,S)=\sigma_{t}(x).
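The posterior mean and variance above reduce to a regularized linear solve against the Gram matrix; a minimal sketch (our own helper function, not the authors' code):

```python
import numpy as np

def posterior(K, k_vec, k_xx, y, lam):
    """Kernel ridge posterior used in the UCB score.

    K:     Gram matrix K_t over the M past points (M x M)
    k_vec: kernel vector κ_t(x) between past points and the query x (M,)
    k_xx:  prior variance κ(x, x) at the query point
    y:     observed rewards y_(1:t) (M,)
    lam:   regularization parameter λ
    Returns (μ_t(x), σ_t²(x)).
    """
    A = K + lam * np.eye(len(y))
    w = np.linalg.solve(A, k_vec)        # (K_t + λI)^{-1} κ_t(x)
    mu = float(w @ y)                    # κ_t(x)ᵀ (K_t + λI)^{-1} y
    var = float(k_xx - w @ k_vec)        # κ(x,x) - κ_t(x)ᵀ (K_t + λI)^{-1} κ_t(x)
    return mu, var
```

With a linear kernel this recovers ridge regression, matching the finite-dimensional model of the main article.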

We define a UCB score of the marginal gain by

ucbt(eϕ,S)=μt1(eϕ,S)+βt1σt1(eϕ,S),\mathrm{ucb}_{t}(e\mid\phi,S)=\mu_{t-1}(e\mid\phi,S)+\beta_{t-1}\sigma_{t-1}(e\mid\phi,S),

and a modified UCB score by ucbt(eϕ,S)/c(e)\mathrm{ucb}_{t}(e\mid\phi,S)/c(e). Here, βt\beta_{t} is defined as B+R2(γMt+1+ln(1/δ))B+R\sqrt{2\left(\gamma_{M_{t}}+1+\ln(1/\delta)\right)} and Mt=s=1tmsM_{t}=\sum_{s=1}^{t}m_{s}. Here γs\gamma_{s} is the maximum information gain [Srinivas et al., 2010, Chowdhury and Gopalan, 2017] after observing ss rewards. We refer to [Chowdhury and Gopalan, 2017] for the definition.

The following proposition is a generalization of Proposition 1 and is a direct consequence of (the proof of) [Chowdhury and Gopalan, 2017, Theorem 2].

Proposition 4.

We assume there exists m1m\in\mathbb{Z}_{\geq 1} such that mtmm_{t}\leq m for all 1tT1\ \leq t\leq T. Then, with probability at least 1δ1-\delta, the following inequality holds:

|μt1(eϕ,S)Δfϕ(eS)|βt1σt1(eϕ,S),\left|\mu_{t-1}(e\mid\phi,S)-\Delta f_{\phi}(e\mid S)\right|\leq\beta_{t-1}\sigma_{t-1}(e\mid\phi,S),

for any t,ϕ,S,t,\phi,S, and ee.

11 STATEMENT OF THE MAIN THEOREM

With generalized UCB scores, we consider the same algorithm in the main article. Then, we provide a statement for the generalized version of Theorem 1.

Theorem 2.

Let the notation and assumptions be as previously mentioned. We also assume that λm\lambda\geq m. We let α=1(1+ε)(k+2l+1)\alpha=\frac{1}{(1+\varepsilon)\left(k+2l+1\right)}, and define α\alpha-regret as

Regα(T)=t=1Tαfϕt(OPTt)fϕt(St),\mathrm{Reg}_{\alpha}\left(T\right)=\sum_{t=1}^{T}\alpha f_{\phi_{t}}(OPT_{t})-f_{\phi_{t}}(S_{t}),

where OPTtOPT_{t} is a feasible optimal solution at round tt. Then, with probability at least 1δ1-\delta, the proposed algorithm achieves the following α\alpha-regret bound:

Regα(T)8βTmTγmT.\mathrm{Reg}_{\alpha}\left(T\right)\leq 8\beta_{T}\sqrt{mT\gamma_{mT}}.

In particular, with at least probability 1δ1-\delta, the α\alpha-regret Regα(T)\mathrm{Reg}_{\alpha}\left(T\right) is given as

O(BTγT+RTγT(γT+1+ln(1/δ))),O\left(B\sqrt{T^{\prime}\gamma_{T^{\prime}}}+R\sqrt{T^{\prime}\gamma_{T^{\prime}}\left(\gamma_{T^{\prime}}+1+\ln(1/\delta)\right)}\right),

where T=mTT^{\prime}=mT.

Remark 2.
  1. 1.

    The maximum information gain \gamma_{T} is O(d\ln T) and O((\ln T)^{d+1}) if the kernel is a d-dimensional linear kernel and a Squared Exponential kernel, respectively [Srinivas et al., 2010]. They also showed a similar result for the Matérn kernel. Thus, if the kernel is a d-dimensional linear kernel, up to a polylogarithmic factor, we obtain Theorem 1 in the main article as a corollary.

  2. 2.

    In the main article, we assume that the norm of vector [Δfi(eS)]i=1d[\Delta f_{i}(e\mid S)]_{i=1}^{d} is bounded above by AA. Moreover, the factor AA appears in the regret bound in Theorem 1. However, since we normalize the kernel so that κ(x,x)1\kappa(x,x)\leq 1, a factor corresponding to AA does not appear in Theorem 2.

12 PROOF OF THE MAIN THEOREM

Throughout the proof, we assume that assumptions of Proposition 4 hold and fix the event \mathcal{F} on which the inequality in Proposition 4 holds.

12.1 GREEDY ALGORITHM UNDER A kk-SYSTEM CONSTRAINT

In this subsection, we fix time step tt and context ϕ\phi and consider the greedy algorithm under only kk-system constraint as shown in Algorithm 3. Here, we drop ϕ\phi from notation. We denote by (𝒩,)(\mathcal{N},\mathcal{I}) the kk-system.

S=S=\emptyset
while True do
       𝒩={e𝒩S+e}\mathcal{N}^{\prime}=\left\{e\in\mathcal{N}\mid S+e\in\mathcal{I}\right\}
       if 𝒩\mathcal{N}^{\prime}\neq\emptyset then
             e=argmaxe𝒩ucbt(eϕ,S)e=\operatorname{argmax}_{e\in\mathcal{N}^{\prime}}\mathrm{ucb}_{t}(e\mid\phi,S)
             Add ee to SS
            
      else
             break
            
       end if
      
end while
Return SS
Algorithm 3 GREEDY UCB
Proposition 5.

Let SS be a set returned by Algorithm 3. Then for any feasible set CC, on the event \mathcal{F}, the following inequality holds:

f(S)+2kβt1k+1σt1(S)1k+1f(SC).f(S)+\frac{2k\beta_{t-1}}{k+1}\sigma_{t-1}(S)\geq\frac{1}{k+1}f(S\cup C).
Proof.

This can be proved in a similar way to the proof in [Calinescu et al., 2011, Appendix B]. We note that the proof given in Appendix B works even if we replace the optimal solution O by any feasible set C. We write S=\left\{e_{1},\dots,e_{m}\right\}, where e_{1},\dots,e_{m} are added by Algorithm 3 in this order. We construct a partition C_{1},\dots,C_{m} of C in the same way as the construction of O_{i} in [Calinescu et al., 2011]. Then |C_{i}|\leq k for any i, and S^{(1:i-1)}+e is feasible for any e\in C_{i}. By the choice of the greedy algorithm and Proposition 4, with probability at least 1-\delta, we have

Δf(eS(1:i1))\displaystyle\Delta f(e\mid S^{(1:i-1)})
μt1(eS(1:i1))+βt1σt1(eS(1:i1))\displaystyle\leq\mu_{t-1}(e\mid S^{(1:i-1)})+\beta_{t-1}\sigma_{t-1}(e\mid S^{(1:i-1)})
μt1(eiS(1:i1))+βt1σt1(eiS(1:i1))\displaystyle\leq\mu_{t-1}(e_{i}\mid S^{(1:i-1)})+\beta_{t-1}\sigma_{t-1}(e_{i}\mid S^{(1:i-1)})
Δf(eiS(1:i1))+2βt1σt1(eiS(1:i1)),\displaystyle\leq\Delta f(e_{i}\mid S^{(1:i-1)})+2\beta_{t-1}\sigma_{t-1}(e_{i}\mid S^{(1:i-1)}),

for any ii and any eCie\in C_{i}. Here we use Proposition 4 in the first and third inequality. The second inequality follows from the choice of the greedy algorithm and the fact that S(1:i1)+eS^{(1:i-1)}+e is feasible. Noting that |Ci|k|C_{i}|\leq k and taking the sum of both sides for eCie\in C_{i}, we have

k(Δf(eiS(1:i1))+2βt1σt1(eiS(1:i1)))\displaystyle k\left(\Delta f(e_{i}\mid S^{(1:i-1)})+2\beta_{t-1}\sigma_{t-1}(e_{i}\mid S^{(1:i-1)})\right)
eCiΔf(eS(1:i1))\displaystyle\geq\sum_{e\in C_{i}}\Delta f(e\mid S^{(1:i-1)})
Δf(CiS(1:i1))Δf(CiS).\displaystyle\geq\Delta f(C_{i}\mid S^{(1:i-1)})\geq\Delta f(C_{i}\mid S).

Here, for subsets A,B\subseteq\mathcal{N}, we define \Delta f(A\mid B)=f(A\cup B)-f(B), and in the second and third inequalities, we use submodularity. By taking the sum of both sides, we have

k(f(S)f()+2βt1i=1mσt1(eiS(1:i1)))\displaystyle k\left(f(S)-f(\emptyset)+2\beta_{t-1}\sum_{i=1}^{m}\sigma_{t-1}(e_{i}\mid S^{(1:i-1)})\right)
i=1mΔf(CiS)Δf(CS)=f(CS)f(S).\displaystyle\geq\sum_{i=1}^{m}\Delta f(C_{i}\mid S)\geq\Delta f(C\mid S)=f(C\cup S)-f(S).

Here the second inequality follows from submodularity of ff. Thus by non-negativity of ff, we have our assertion. ∎

12.2 SOLUTIONS OF AFSM-UCB AT EACH ROUND

In this subsection, we assume that assumptions of Theorem 2 are satisfied. The objective in this subsection is to provide a lower bound of the score of the set returned by AFSM-UCB at each round. In this subsection, we also fix time step tt.

In the next proposition, we consider sets returned by GM-UCB.

Proposition 6.

Let C𝒩C\subseteq\mathcal{N} be any set satisfying knapsack and kk-system constraints. Let SS be a set returned by GM-UCB. Then with probability at least 1δ1-\delta, we have

f(S)+2βt1σt1(S)min{ρ2,1k+1f(SC)lρk+1}.f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\min\left\{\frac{\rho}{2},\frac{1}{k+1}f(S\cup C)-\frac{l\rho}{k+1}\right\}.
Proof.

This can be proved in a similar way to the proof of [Badanidiyuru and Vondrák, 2014, Theorem 6.1] or [Mirzasoleiman et al., 2016, Theorem 5.1]. However, because of uncertainty, we need further analysis. We divide the proof into two cases.

Case One. This is the case when GM-UCB terminates because there exists an element e with \mathrm{ucb}_{t}(e\mid S)\geq\rho c(e) and \mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e), but no element e satisfying \mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e) satisfies the knapsack constraints, i.e., c_{j}(S+e)>1 for some 1\leq j\leq l. We fix an element e^{\prime} satisfying \mathrm{ucb}_{t}(e^{\prime}\mid S),\mathrm{ucb}_{t}(e^{\prime}\mid\emptyset)\geq\rho c(e^{\prime}). We write S=\left\{e_{1},\dots,e_{m}\right\}. Then, by assumption, for any i, we have \mathrm{ucb}_{t}(e_{i}\mid S^{(1:i-1)})\geq\rho c(e_{i}). By Proposition 4, on the event \mathcal{F}, we have \Delta f(e_{i}\mid S^{(1:i-1)})+2\beta_{t-1}\sigma_{t-1}(e_{i}\mid S^{(1:i-1)})\geq\rho c(e_{i}) for any t,i. By summing both sides, we obtain

f(S)+2βt1σt1(S)ρc(S).f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\rho c(S).

On the event \mathcal{F}, we also have

f(S)+2βt1σt1(S)\displaystyle f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq
ucbt(e1)ucbt(e)ρc(e).\displaystyle\quad\mathrm{ucb}_{t}(e_{1}\mid\emptyset)\geq\mathrm{ucb}_{t}(e^{\prime}\mid\emptyset)\geq\rho c(e^{\prime}).

Therefore, we have

f(S)+2βt1σt1(S)ρmax{c(S),c(e)}ρ2(c(S)+c(e))\displaystyle f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\rho\max\left\{c(S),c(e^{\prime})\right\}\geq\frac{\rho}{2}\left(c(S)+c(e^{\prime})\right)
=ρ2j(cj(S)+cj(e))ρ2.\displaystyle=\frac{\rho}{2}\sum_{j}\left(c_{j}(S)+c_{j}(e^{\prime})\right)\geq\frac{\rho}{2}.

Here the last inequality holds because S+eS+e^{\prime} does not satisfy the knapsack constraints.

Case Two. This is the case when GM-UCB terminates because every element e satisfying \mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e) satisfies the knapsack constraints, but S+e does not satisfy the k-system constraint. We note that this case includes the case when no element e satisfies \mathrm{ucb}_{t}(e\mid S),\mathrm{ucb}_{t}(e\mid\emptyset)\geq\rho c(e).

We divide CC into two sets C<ρC_{<\rho} and CρC_{\geq\rho}, that is, we define

C<ρ\displaystyle C_{<\rho} ={eCi such that ucbt(eS(1:i))<ρc(e)},\displaystyle=\left\{e\in C\mid\exists i\text{ such that }\mathrm{ucb}_{t}(e\mid S^{(1:i)})<\rho c(e)\right\},
Cρ\displaystyle C_{\geq\rho} =CC<ρ\displaystyle=C\setminus C_{<\rho}
={eCiucbt(eS(1:i))ρc(e)}.\displaystyle=\left\{e\in C\mid\forall i\quad\mathrm{ucb}_{t}(e\mid S^{(1:i)})\geq\rho c(e)\right\}.

Let eC<ρe\in C_{<\rho}. Then on the event \mathcal{F}, we have Δf(eS)Δf(eS(1:i))ucbt(eS(1:i))<ρc(e)\Delta f(e\mid S)\leq\Delta f(e\mid S^{(1:i)})\leq\mathrm{ucb}_{t}(e\mid S^{(1:i)})<\rho c(e) for some ii. Here the first inequality follows from submodularity and the second inequality follows from Proposition 4. Therefore, the following inequality holds:

Δf(C<ρS)eC<ρΔf(eS)eC<ρρc(e)lρ.\Delta f(C_{<\rho}\mid S)\leq\sum_{e\in C_{<\rho}}\Delta f(e\mid S)\leq\sum_{e\in C_{<\rho}}\rho c(e)\leq l\rho. (3)

Here the first inequality follows from the submodularity of ff and the second inequality follows from the fact that C<ρC_{<\rho} is a feasible set.

Next, we consider CρC_{\geq\rho}. We run Algorithm 3 on SCρS\cup C_{\geq\rho}. By assumption of this case, Algorithm 3 returns SS. Proposition 5 implies that f(S)+2βt1σt1(S)1k+1f(SCρ).f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\frac{1}{k+1}f(S\cup C_{\geq\rho}). Rewriting, we have

Δf(CρS)kf(S)+2(k+1)βt1σt1(S).\Delta f(C_{\geq\rho}\mid S)\leq kf(S)+2(k+1)\beta_{t-1}\sigma_{t-1}(S).

By this, inequality (3), and the submodularity of f, we have

Δf(CS)Δf(C<ρS)+Δf(CρS)\displaystyle\Delta f(C\mid S)\leq\Delta f(C_{<\rho}\mid S)+\Delta f(C_{\geq\rho}\mid S)
lρ+kf(S)+2(k+1)βt1σt1(S).\displaystyle\leq l\rho+kf(S)+2(k+1)\beta_{t-1}\sigma_{t-1}(S).

Noting that Δf(CS)=f(CS)f(S)\Delta f(C\mid S)=f(C\cup S)-f(S), we have

f(S)+2βt1σt1(S)1k+1f(SC)lρk+1.f(S)+2\beta_{t-1}\sigma_{t-1}(S)\geq\frac{1}{k+1}f(S\cup C)-\frac{l\rho}{k+1}.

This completes the proof. ∎

The following is a key lemma for the proof of our main Theorem.

Lemma 1.

Let α=1(1+ε)(k+2l+1)\alpha=\frac{1}{(1+\varepsilon)\left(k+2l+1\right)}. Let SS be the set returned by AFSM-UCB and OPTOPT the optimal feasible set. Then on the event \mathcal{F}, SS satisfies the following:

f(S)+4βt1σt1(S)αf(OPT).f(S)+4\beta_{t-1}\sigma_{t-1}(S)\geq\alpha f(OPT).
Proof.

By monotonicity and assumption, we have νmaxe𝒩f({e})f(OPT)\nu\leq\max_{e\in\mathcal{N}}f(\{e\})\leq f(OPT). Here, we note that by assumptions any singleton {e}\{e\} for e𝒩e\in\mathcal{N} is a feasible set. By submodularity of ff and the assumption of ν\nu^{\prime}, we have

f(OPT)eOPTf({e})|OPT|ν|𝒩|ν.f(OPT)\leq\sum_{e\in OPT}f(\{e\})\leq|OPT|\nu^{\prime}\leq|\mathcal{N}|\nu^{\prime}.

We put r=\frac{2}{k+2l+1} and r_{OPT}=rf(OPT). By the above bounds on f(OPT), there exists a \rho in the while loop in AFSM-UCB such that (1+\varepsilon)^{-1}r_{OPT}\leq\rho\leq r_{OPT}. We denote such a \rho by \rho^{*}. Moreover, if \rho=r_{OPT}, then we have \rho/2=\frac{1}{k+1}f(OPT)-\frac{l\rho}{k+1}. By Proposition 6, on the event \mathcal{F}, we have

μt1(Sρ)+3βt1σt1(Sρ)f(Sρ)+2βt1σt1(Sρ)rOPT2(1+ε)=αf(OPT).\mu_{t-1}(S_{\rho^{*}})+3\beta_{t-1}\sigma_{t-1}(S_{\rho^{*}})\geq f(S_{\rho^{*}})+2\beta_{t-1}\sigma_{t-1}(S_{\rho^{*}})\geq\frac{r_{OPT}}{2(1+\varepsilon)}=\alpha f(OPT).

Here we denote by SρS_{\rho} the returned set of GM-UCB with threshold ρ\rho. Because AFSM-UCB returns a set with the largest UCB score in {Sρ}ρ\{S_{\rho}\}_{\rho}, we have our assertion. ∎

Remark 3.

Suppose that the cardinality of any feasible solution is bounded by m\in\mathbb{Z}_{>0}. Then, by the proof of Lemma 1, the condition \rho\leq r\nu^{\prime}|\mathcal{N}| in the while loop of AFSM-UCB can be replaced by \rho\leq r\nu^{\prime}m, and we obtain the same theoretical guarantee for the modified algorithm. Following FANTOM [Mirzasoleiman et al., 2016], we use the condition \rho\leq r\nu^{\prime}|\mathcal{N}|.
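The benefit of the tighter loop condition is a shorter threshold grid: the number of grid points is logarithmic in the ratio of the loop bound to r\nu (the factor r cancels). A rough count, with hypothetical values, looks as follows:

```python
import math

def n_thresholds(nu, nu_prime, size_bound, eps):
    """Number of while-loop iterations: grid points r*nu*(1+eps)^j with
    (1+eps)^j <= nu_prime*size_bound/nu (the factor r cancels)."""
    return math.floor(math.log(nu_prime * size_bound / nu, 1 + eps)) + 1

nu, nu_prime, eps = 0.01, 1.0, 0.3
full = n_thresholds(nu, nu_prime, size_bound=1000, eps=eps)  # rho <= r*nu'*|N|
tight = n_thresholds(nu, nu_prime, size_bound=10, eps=eps)   # rho <= r*nu'*m
```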

12.3 PROOF OF THE MAIN THEOREM

In this subsection, we first introduce a lemma similar to [Chowdhury and Gopalan, 2017, Lemma 4] and then prove the main theorem.

Lemma 2.

Let S_{t} be the set selected by AFSM-UCB at time step t. We assume that \lambda\geq m. Then, we have

\sum_{t=1}^{T}\sigma_{t-1}(S_{t}\mid\phi_{t})\leq 2\sqrt{mT\gamma_{mT}}.
Proof.

First, suppose the kernel is a linear kernel. Then, by [Yue and Guestrin, 2011, Lemma 5], the definition of \gamma_{T}, and the monotonicity of T\mapsto\gamma_{T} [Krause and Guestrin, 2005], we have \sum_{t=1}^{T}\sigma_{t-1}^{2}(S_{t}\mid\phi_{t})\leq 4\gamma_{mT}. The statement of the lemma then follows from this inequality and the Cauchy-Schwarz inequality. Next, suppose the kernel is positive definite. Let \{\psi_{i}\}_{i=1}^{\infty} be an orthonormal basis of the RKHS. Then, for x\in D, we have \kappa(x,\cdot)=\sum_{i=1}^{\infty}a_{i}(x)\psi_{i}, where a_{i}(x)\in\mathbb{R}. Therefore, for each x,y\in D, we have \kappa(x,y)=\sum_{i=1}^{\infty}a_{i}(x)a_{i}(y). Since, for fixed T, only finitely many values of \kappa appear on both sides of the inequality of the lemma, the statement for positive definite kernels can be proved by taking the limit of the inequality in the linear kernel case (see the proof of [Chowdhury and Gopalan, 2017, Theorem 2]). ∎
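The step from the squared-sum bound to the statement of the lemma is the Cauchy-Schwarz inequality. A numeric sketch with random, hypothetical posterior widths illustrates the chain \sum_{t}\sigma_{t}\leq\sqrt{T\sum_{t}\sigma_{t}^{2}}\leq 2\sqrt{T\gamma} whenever 4\gamma\geq\sum_{t}\sigma_{t}^{2}:

```python
import math
import random

random.seed(1)

# For nonnegative widths sigma_1..sigma_T, Cauchy-Schwarz gives
# sum_t sigma_t <= sqrt(T * sum_t sigma_t^2); combined with the
# squared-sum bound sum_t sigma_t^2 <= 4*gamma, this yields
# sum_t sigma_t <= 2*sqrt(T*gamma).
T = 500
sigmas = [random.uniform(0.0, 1.0) for _ in range(T)]
lhs = sum(sigmas)
sum_sq = sum(s * s for s in sigmas)
cs_bound = math.sqrt(T * sum_sq)
gamma = sum_sq / 4 + 1.0  # any gamma with 4*gamma >= sum_sq works here
final_bound = 2 * math.sqrt(T * gamma)
```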

Proof of the main Theorem.

Because t\mapsto\beta_{t} is non-decreasing, Lemma 1 implies that the \alpha-regret is bounded by 4\beta_{T}\sum_{t=1}^{T}\sigma_{t-1}(S_{t}\mid\phi_{t}) with probability at least 1-\delta. The assertion of the main result then follows from Lemma 2. We also note that the second statement in the main article follows from the proof of [Srinivas et al., 2010, Theorem 8]. ∎
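Combining the two lemmas, the \alpha-regret bound scales as 8\beta_{T}\sqrt{mT\gamma_{mT}}. With the linear-kernel growth \gamma_{t}=O(k\ln t), the per-round regret bound vanishes as T grows; the sketch below (constants suppressed, values hypothetical) illustrates this sublinearity:

```python
import math

def alpha_regret_bound(T, m, k, beta_T):
    """The bound 4*beta_T * 2*sqrt(m*T*gamma_{mT}) with the linear-kernel
    growth gamma_t ~ k*ln(t); constants are suppressed (hypothetical)."""
    gamma_mT = k * math.log(m * T)
    return 8.0 * beta_T * math.sqrt(m * T * gamma_mT)

# Per-round regret bound shrinks as the horizon grows.
m, k, beta_T = 10, 5, 2.0
per_round = [alpha_regret_bound(T, m, k, beta_T) / T for T in (10**3, 10**5, 10**7)]
```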

Remark 4.

For simplicity, we consider only stationary constraints, i.e., the constraints (1) do not depend on the time step t. However, it is clear from the proof that only k and l need to be stationary; otherwise, non-stationary constraints can also be handled.

13 PROOF OF PROPOSITION 2

In this section, we prove that MCSGreedy [Yu et al., 2016] and LSBGreedy [Yue and Guestrin, 2011] perform arbitrarily poorly in some situations. We denote by \mathrm{Reg}_{\alpha,\mathrm{MCS}}(T) and \mathrm{Reg}_{\alpha,\mathrm{LSB}}(T) the \alpha-regrets of MCSGreedy and LSBGreedy, respectively.

Proposition 7.

For any \alpha>0, there exist a cost c, a k-system \mathcal{I}, a submodular function f, T_{0}>0, and a constant C>0 such that with probability at least 1-\delta,

\mathrm{Reg}_{\alpha,\mathrm{MCS}}(T)>CT,

for any T>T_{0}. Moreover, the same statement holds for \mathrm{Reg}_{\alpha,\mathrm{LSB}}(T).

Proof.

We prove the statement only for MCSGreedy; we omit the proof for LSBGreedy because it is similar and simpler. We take as the k-system constraint the cardinality constraint |S|\leq m, where m is chosen later. We consider \mathcal{N} as the disjoint union of \mathcal{N}_{1} and \mathcal{N}_{2}, where |\mathcal{N}_{1}|=|\mathcal{N}_{2}|=m. We define the cost c:\mathcal{N}\rightarrow\mathbb{R} as

c(e)=\begin{cases}\frac{1}{m}&\text{if }e\in\mathcal{N}_{1},\\ \frac{1}{m^{2}}&\text{if }e\in\mathcal{N}_{2}.\end{cases}

For each e\in\mathcal{N}, we define v(e) as follows:

v(e)=\begin{cases}\frac{1}{m}&\text{if }e\in\mathcal{N}_{1},\\ \frac{1+\epsilon}{m^{2}}&\text{if }e\in\mathcal{N}_{2}.\end{cases}

Here \epsilon is a small positive number. We assume that the objective function f:2^{\mathcal{N}}\rightarrow\mathbb{R} is a modular function, i.e., f(A)=\sum_{e\in A}v(e) for A\subseteq\mathcal{N}. For e\in\mathcal{N}, we define a function \chi_{e}:2^{\mathcal{N}}\rightarrow\mathbb{R} as

\chi_{e}(A)=\begin{cases}1&\text{if }e\in A,\\ 0&\text{otherwise.}\end{cases}

Then we have f(A)=\sum_{e\in\mathcal{N}}v(e)\chi_{e}(A). Therefore, this is the linear kernel case. Moreover, because the feature vectors [\chi_{e}(e^{\prime}\mid A)]_{e\in\mathcal{N}} for e^{\prime}\not\in A are orthogonal, this case can be reduced to the MAB setting. Therefore, we can use the UCB in Lemma 6 of [Abbasi-Yadkori et al., 2011].

Denote by S_{t} the set selected by MCSGreedy and by OPT the optimal solution. Because c(e)\leq 1/m for all e\in\mathcal{N}, we have |S_{t}|=m. It is easy to see that OPT=\mathcal{N}_{1}; therefore, f(OPT)=1.

We take m sufficiently large so that

\frac{(m-3)(1+\epsilon)}{m^{2}}+\frac{3}{m}<\alpha.

For e\in\mathcal{N} and 1\leq t\leq T, we denote by N_{e,t} the number of times e is selected in time steps \tau=1,\dots,t. Then, by Lemma 6 of [Abbasi-Yadkori et al., 2011], the UCB of v(e) for e\in\mathcal{N} is given as \mu_{e,t}+\sigma_{e,t}, where \mu_{e,t} is the mean of the feedbacks for v(e) and \sigma_{e,t} is given as follows:

\sigma_{e,t}=\sqrt{\frac{1+N_{e,t}}{N_{e,t}^{2}}\left(1+2\ln\left(d(1+N_{e,t})^{1/2}/\delta\right)\right)},

where d=2m. We note that in each round, MCSGreedy selects elements with the largest modified UCB (\mu_{e,t}+\sigma_{e,t})/c(e), except for the first three elements. Suppose that the modified UCB of an element e_{1}\in\mathcal{N}_{1} is greater than that of an element e_{2}\in\mathcal{N}_{2}, that is,

m(\mu_{e_{1},t}+\sigma_{e_{1},t})>m^{2}(\mu_{e_{2},t}+\sigma_{e_{2},t}).

By Lemma 6 of [Abbasi-Yadkori et al., 2011], with probability at least 1-\delta, we have

m(v(e_{1})+2\sigma_{e_{1},t})\geq m(\mu_{e_{1},t}+\sigma_{e_{1},t})>m^{2}(\mu_{e_{2},t}+\sigma_{e_{2},t})\geq m^{2}v(e_{2}).

Thus we have

2m\sigma_{e_{1},t}>\epsilon.

Therefore, there exists N(\delta,\epsilon,m)\in\mathbb{Z}_{>0} such that

N_{e_{1},t}<N(\delta,\epsilon,m).

We assume that T is sufficiently large compared with N(\delta,\epsilon,m). Then MCSGreedy selects at least (m-3)T-mN(\delta,\epsilon,m) elements from \mathcal{N}_{2} in total. Thus, with probability at least 1-\delta, we have

\sum_{t=1}^{T}f(S_{t})\leq\left((m-3)T-mN(\delta,\epsilon,m)\right)\frac{1+\epsilon}{m^{2}}+\left(3T+mN(\delta,\epsilon,m)\right)\frac{1}{m}.

Because

\alpha\sum_{t=1}^{T}f(OPT)=\alpha T,

we have our assertion. ∎
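The arithmetic behind this construction is easy to reproduce. The sketch below (function names hypothetical; \alpha and \epsilon chosen for illustration) builds the modular instance, finds the smallest m satisfying the displayed inequality, and checks that a set of three elements of \mathcal{N}_{1} plus m-3 elements of \mathcal{N}_{2} is worth less than \alpha f(OPT):

```python
def f_modular(n1_selected, n2_selected, m, eps):
    """Modular objective: elements of N1 are worth 1/m, elements of N2
    are worth (1+eps)/m^2."""
    return n1_selected / m + n2_selected * (1 + eps) / (m * m)

def min_m_below_alpha(alpha, eps):
    """Smallest m with (m-3)*(1+eps)/m^2 + 3/m < alpha, i.e. 3 elements
    of N1 plus m-3 elements of N2 are worth less than alpha*f(OPT)."""
    m = 4
    while (m - 3) * (1 + eps) / m**2 + 3.0 / m >= alpha:
        m += 1
    return m

alpha, eps = 0.1, 0.01  # hypothetical target ratio and value gap
m = min_m_below_alpha(alpha, eps)
greedy_value = f_modular(3, m - 3, m, eps)  # what MCSGreedy ends up with
opt_value = f_modular(m, 0, m, eps)         # f(OPT) = f(N1) = 1
```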

14 PARAMETERS IN THE EXPERIMENTS

Because this is the linear kernel case, by Theorem 5 of [Srinivas et al., 2010], we have \gamma_{t}=O(k\ln t). Thus, we modify the definition of \beta_{t} as \beta_{t}=B+R_{1}\sqrt{R_{2}k\ln m_{t}+1+\ln(1/\delta)}, where B, R_{1}, and R_{2} are parameters. For all datasets, we take \nu=0.01 and \nu^{\prime}=1.0. We take \varepsilon=0.3, \varepsilon=1.0, and \varepsilon=0.1 for the news article recommendation dataset, MovieLens100k, and the Million Song Dataset, respectively. We tune the other parameters using different users. Explicitly, we take B=0.01, R_{1}=0.1, R_{2}=1.0, \lambda=0.1 for the news article recommendation example; B=0.01, R_{1}=0.1, R_{2}=1.0, \lambda=1.0 for MovieLens100k; and B=0.01, R_{1}=0.01, R_{2}=1.0, \lambda=3.0 for the Million Song Dataset.
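For concreteness, the modified confidence width can be computed as below. This is a sketch only: we read \ln m_{t} as \ln(mt) for a list of size m, and the values of k, m, and \delta are hypothetical, while B, R_{1}, and R_{2} are in the spirit of the tuned settings above.

```python
import math

def beta_t(t, B, R1, R2, k, m, delta):
    """beta_t = B + R1*sqrt(R2*k*ln(m*t) + 1 + ln(1/delta)),
    reading ln m_t as ln(m*t) for a list of size m (an assumption)."""
    return B + R1 * math.sqrt(R2 * k * math.log(m * t) + 1 + math.log(1 / delta))

# k, m, delta are hypothetical illustration values.
B, R1, R2, delta, k, m = 0.01, 0.1, 1.0, 0.05, 18, 10
widths = [beta_t(t, B, R1, R2, k, m, delta) for t in (1, 100, 10000)]
```

As expected, the width grows slowly (logarithmically) with the time step t.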

References

  • Abbasi-Yadkori et al. [2011] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312–2320, 2011.
  • Badanidiyuru and Vondrák [2014] Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1497–1514. Society for Industrial and Applied Mathematics, 2014.
  • Badanidiyuru et al. [2014] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 671–680, 2014.
  • Bertin-Mahieux et al. [2011] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
  • Calinescu et al. [2011] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
  • Chekuri et al. [2010] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 575–584. IEEE, 2010.
  • Chekuri et al. [2014] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM Journal on Computing, 43(6):1831–1879, 2014.
  • Chen et al. [2017] Lin Chen, Andreas Krause, and Amin Karbasi. Interactive submodular bandit. In Advances in Neural Information Processing Systems, pages 141–152, 2017.
  • Chen et al. [2013] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. In Proceedings of the 30th International Conference on Machine Learning, pages 151–159, 2013.
  • Chowdhury and Gopalan [2017] Sayak Ray Chowdhury and Aditya Gopalan. On kernelized multi-armed bandits. In Proceedings of the 34th International Conference on Machine Learning, pages 844–853, 2017. supplementary material.
  • Craswell et al. [2008] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An experimental comparison of click position-bias models. In Proceedings of the 2008 international conference on web search and data mining, pages 87–94. ACM, 2008.
  • El-Arini et al. [2009] Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin. Turning down the noise in the blogosphere. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 289–298. ACM, 2009.
  • Grouplens [1998] Grouplens. Movielens100k dataset. 1998. URL https://grouplens.org/datasets/movielens/100k/.
  • Gupta et al. [2010] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics, pages 246–257. Springer, 2010.
  • Hiranandani et al. [2019] Gaurush Hiranandani, Harvineet Singh, Prakhar Gupta, Iftikhar Ahamath Burhanuddin, Zheng Wen, and Branislav Kveton. Cascading linear submodular bandits: Accounting for position bias and diversity in online learning to rank. In Proceedings of Uncertainty in AI., 2019.
  • Koren et al. [2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
  • Krause and Golovin [2014] Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 2014.
  • Krause and Guestrin [2005] Andreas Krause and Carlos E Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proceedings of Uncertainty in AI., 2005.
  • Krause and Ong [2011] Andreas Krause and Cheng S Ong. Contextual gaussian process bandit optimization. In Advances in Neural Information Processing Systems, pages 2447–2455, 2011.
  • Lattimore and Szepesvári [2019] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, forthcoming, 2019. https://tor-lattimore.com/downloads/book/book.pdf.
  • Li and Shroff [2020] Wenxin Li and Ness Shroff. Efficient algorithms and lower bound for submodular maximization. arXiv preprint, arXiv:1804.08178v3, 2020.
  • Lin and Bilmes [2011] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 510–520. Association for Computational Linguistics, 2011.
  • Mestre [2006] Julián Mestre. Greedy in approximation algorithms. In European Symposium on Algorithms, pages 528–539. Springer, 2006.
  • Mestre [2015] Julián Mestre. On the intersection of independence systems. Operations Research Letters, 43(1):7–9, 2015.
  • Mirzasoleiman et al. [2016] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proceedings of The 33rd International Conference on Machine Learning, pages 1358–1367, 2016.
  • Nemhauser and Wolsey [1978] George L Nemhauser and Laurence A Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of operations research, 3(3):177–188, 1978.
  • Radlinski et al. [2008] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning, pages 784–791, 2008.
  • Sarpatwar et al. [2019] Kanthi K Sarpatwar, Baruch Schieber, and Hadas Shachnai. Constrained submodular maximization via greedy local search. Operations Research Letters, 47(1):1–6, 2019.
  • Srinivas et al. [2010] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, pages 1015–1022. Omnipress, 2010.
  • Yu et al. [2016] Baosheng Yu, Meng Fang, and Dacheng Tao. Linear submodular bandits with a knapsack constraint. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • Yue and Guestrin [2011] Yisong Yue and Carlos Guestrin. Linear submodular bandits and their application to diversified retrieval. In Advances in Neural Information Processing Systems, pages 2483–2491, 2011.
  • Ziegler et al. [2005] Cai-Nicolas Ziegler, Sean M McNee, Joseph A Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web, pages 22–32. ACM, 2005.