Stochastic k-Submodular Bandits with Full Bandit Feedback
Abstract
In this paper, we present the first sublinear α-regret bounds for online k-submodular optimization problems with full-bandit feedback, where α is a corresponding offline approximation ratio. Specifically, we propose online algorithms for multiple k-submodular stochastic combinatorial multi-armed bandit problems, including (i) monotone functions under individual size constraints, (ii) monotone functions with matroid constraints, (iii) non-monotone functions with matroid constraints, (iv) non-monotone functions without constraints, and (v) monotone functions without constraints. We transform approximation algorithms for offline k-submodular maximization problems into online algorithms through the offline-to-online framework proposed by Nie et al. (2023a). A key contribution of our work is analyzing the robustness of the offline algorithms.
1 INTRODUCTION
In various sequential decision problems, including sensor placement, influence maximization, and clinical trials, decisions are made by sequentially selecting a subset of elements, making assignments for those elements, and then observing the outcomes. These scenarios often exhibit diminishing returns based on the choice of the subset of elements and their corresponding assignments.
Consider a multi-agent scenario where multiple companies (agents) cooperate to spread different types of content across a social network. Each company can select a subset of users and control how it distributes content to initiate propagation. The collective goal is to maximize the overall spread of all content types across the network. However, each company’s decisions impact not only its own success but also that of others. For instance, if two companies target users with overlapping follower networks, their efforts may lead to redundant influence, limiting the potential additional spread. Conversely, well-coordinated strategies could generate synergistic effects, amplifying the overall reach. This redundancy effect exists even for individual companies, making it crucial to target diverse users to avoid diminishing returns.
Each type of content follows an independent propagation process. Due to privacy restrictions, the companies only know the total number of users influenced by the content, without visibility into the underlying diffusion process. Additionally, because the diffusion is inherently random, the observations may vary, necessitating a balance between exploiting successful configurations (exploitation) and experimenting with different users and assignments to discover better strategies (exploration). See Appendix D in the supplementary material for more motivating applications.
For each company, an offline variant of this sequential decision problem (i.e., where the agent has access to an exact value oracle and is only evaluated on a final output) can be modeled as a (un)constrained k-submodular optimization problem (Huber & Kolmogorov, 2012). The class of k-submodular functions generalizes the class of submodular set functions, an important non-linear function class for numerous optimization problems exhibiting a notion of diminishing returns, by accounting not only for the value of a subset of elements (e.g., which users) but also for the assignment (e.g., which types of content go to which users). Maximizing a k-submodular function is known to be NP-hard even for unconstrained problems (Ward & Živný, 2014). There has been significant progress in developing approximation algorithms for various offline k-submodular maximization problems (Iwata et al., 2015; Ohsaka & Yoshida, 2015; Sakaue, 2017).
The sequential decision problem described above can be modeled as a stochastic combinatorial Multi-Armed Bandit (CMAB) problem with (i) an expected reward that is k-submodular, (ii) an action space formed by the constraints (if any) on elements and their assignments, and (iii) bandit feedback. Multi-Armed Bandits (MAB) is a classical framework for sequential decision-making under uncertainty. Combinatorial MAB is a specialized branch where each action is a combination of base arms, known as a "super arm", and the number of super arms is prohibitively large to try each one individually.
For CMAB problems with non-linear rewards, the nature of the feedback significantly affects the problem complexity. "Semi-bandit" feedback involves observing additional information (such as the value of individual arms) aside from the joint reward of the selected super arm, which greatly simplifies learning. In contrast, "bandit" or "full-bandit" feedback observes only the (joint) reward for the selected action. Estimating arm values for non-linear rewards with bandit feedback requires deliberate sampling of sub-optimal actions (e.g., actions consisting of individual base arms when the function is monotone), which is not typically done in standard MAB methods, since by design they do not take actions identified (with some confidence) to be sub-optimal.
In this paper, we address the problem of stochastic CMAB with k-submodular expected rewards, various constraints, and only bandit feedback.
Our Contributions: We propose and analyze the first CMAB algorithms with sub-linear α-regret for k-submodular rewards using only full-bandit feedback. For each of the following results, we achieve them in part through analyzing the robustness of the respective offline algorithms.
• For non-monotone k-submodular rewards and no constraints, we propose a CMAB algorithm with Õ(T^{2/3}) 1/2-regret.
• For monotone k-submodular rewards and no constraints, we propose a CMAB algorithm with Õ(T^{2/3}) k/(2k−1)-regret.
• For monotone k-submodular rewards under individual size constraints, we propose a CMAB algorithm with Õ(T^{2/3}) 1/3-regret.
• For monotone k-submodular rewards under a matroid constraint, we propose a CMAB algorithm with Õ(T^{2/3}) 1/2-regret. We specialize our CMAB algorithm for monotone functions with a matroid constraint to the case of total size constraints.
• For non-monotone k-submodular rewards under a matroid constraint, we propose a CMAB algorithm with Õ(T^{2/3}) 1/3-regret.
In the offline setting, it is important to highlight that many of the properties of f no longer hold in the presence of noise. For instance, in the case of a monotone function f, the marginal gains of f̂ (the noisy version of f) may no longer be guaranteed to be positive. Similarly, for non-monotone objectives, the property of pairwise monotonicity (see Section 2) for f might not remain valid for f̂. As a result, many proof steps in the original paper(s) that introduced those offline algorithms do not go through directly and, consequently, need novel analysis. To address these challenges, this study incorporates bounded error to establish properties akin to the original ones in their respective contexts.
Analyzing the robustness of the offline algorithms is important, even without transforming them into an online setting. For those algorithms that possess inherent robustness, we refrain from modifications and maintain the original offline algorithm to ensure an efficient offline-to-online transformation. However, substantial modifications may be necessary for the original offline algorithm to maintain robustness in the presence of noise (e.g., Section 4). In such cases, we aim to introduce minimal alterations to the offline algorithm. We note that these adjustments require careful design, which is a part of the novelty of this paper.
Remark 1.1.
The regret guarantees we obtained in this work are all of order Õ(T^{2/3}). It is unknown whether a lower-order (e.g., Õ(√T)) expected cumulative α-regret is possible even in the case k = 1. The only known results with bandit feedback for k = 1 are for submodular bandits (Streeter & Golovin, 2008; Niazadeh et al., 2021). While some lower bounds of order T^{2/3} exist (e.g., in the adversarial setup with a cardinality constraint (Niazadeh et al., 2021), or the stochastic setup with a cardinality constraint where regret is defined as the gap to the greedy algorithm (Tajdini et al., 2023)), they are for specialized setups even for k = 1, and thus a general understanding of lower bounds in these setups is an open problem.
Ref. | Objective | Constraint | α | Our α-regret
---|---|---|---|---
Iwata et al. (2015) | Non-Monotone | Unconstrained | 1/2 | Õ(T^{2/3})
Iwata et al. (2015) | Monotone | Unconstrained | k/(2k−1) | Õ(T^{2/3})
Ohsaka & Yoshida (2015) | Monotone | Total Size | 1/2 | Õ(T^{2/3})
Sun et al. (2022)* | Non-Monotone | Total Size | 1/3 | Õ(T^{2/3})
Ohsaka & Yoshida (2015) | Monotone | Individual Size | 1/3 | Õ(T^{2/3})
Sakaue (2017) | Monotone | Matroid | 1/2 | Õ(T^{2/3})
Sun et al. (2022) | Non-Monotone | Matroid | 1/3 | Õ(T^{2/3})
Related works: We briefly highlight closely related works. See Appendix C for an expanded discussion.
k-submodular CMAB
To the best of our knowledge, the only prior work on k-submodular CMAB for any constraint type and/or feedback model is presented in (Soma, 2019). They considered unconstrained k-submodular maximization under semi-bandit feedback in the adversarial setting. For the non-monotone and monotone cases, they propose algorithms with 1/2- and k/(2k−1)-regrets upper bounded by O(√T), respectively. Their approach is based on Blackwell approachability (Blackwell, 1956). As discussed in the introduction, the availability of semi-bandit feedback in CMAB problems with non-linear rewards significantly simplifies the learning problem (i.e., with semi-bandit feedback, a √T dependence of the α-regret on T is typically easy to achieve). We also note that the adversarial CMAB setting does not strictly generalize the stochastic setting. In the adversarial setting, the reward functions at each time step must exhibit k-submodularity but otherwise can vary widely. In the stochastic setting, the individual realized rewards are not necessarily k-submodular, but the expected reward function f is.
Submodular CMAB
For submodular rewards (i.e., k = 1), Streeter & Golovin (2008) proposed and analyzed an algorithm for adversarial CMAB with submodular rewards, full-bandit feedback, and a knapsack constraint (though only in expectation, taken over randomness in the algorithm). The authors adapted a simpler greedy algorithm (Khuller et al., 1999), using an ε-greedy exploration type framework. Niazadeh et al. (2021) proposed a framework for transforming iterative greedy α-approximation algorithms for offline problems into online methods in an adversarial bandit setting, for both semi-bandit feedback (achieving Õ(√T) α-regret) and full-bandit feedback (achieving Õ(T^{2/3}) α-regret). In the stochastic setting, Nie et al. (2022) proposed an explore-then-commit type algorithm for online monotone submodular maximization under a cardinality constraint. The result is extended with an optimized stochastic-explore-then-commit approach in Fourati et al. (2024b). Fourati et al. (2023) proposed a randomized greedy learning algorithm for online non-monotone submodular maximization. Nie et al. (2023a) recently proposed a general framework for adapting offline algorithms to online algorithms under full bandit feedback. Their framework requires the adapted offline algorithm to satisfy the so-called (α, δ)-robustness property.
There are also a number of works that require additional "semi-bandit" feedback. For combinatorial MAB with submodular rewards, a common type of semi-bandit feedback is marginal gains (Lin et al., 2015; Yue & Guestrin, 2011; Yu et al., 2016; Takemori et al., 2020), which enables the learner to take actions of maximal cardinality or budget, receive a corresponding reward, and gain information not just on the set but on individual elements.
2 BACKGROUND
2.1 k-Submodular Functions
Let k be a positive integer for the number of types (i.e., types of stories) and V be the ground set of n elements (i.e., users in a social network). Let (k+1)^V := {(X_1, …, X_k) : X_i ⊆ V for all i ∈ [k], X_i ∩ X_j = ∅ for i ≠ j}. A function f : (k+1)^V → ℝ is called k-submodular if, for any x = (X_1, …, X_k) and y = (Y_1, …, Y_k) in (k+1)^V, we have

f(x) + f(y) ≥ f(x ⊓ y) + f(x ⊔ y),

where

x ⊓ y := (X_1 ∩ Y_1, …, X_k ∩ Y_k),
x ⊔ y := ((X_1 ∪ Y_1) \ ⋃_{i≠1}(X_i ∪ Y_i), …, (X_k ∪ Y_k) \ ⋃_{i≠k}(X_i ∪ Y_i)).

Note that setting k = 1 in these definitions recovers submodular functions, set intersection, and set union, respectively. We further define a relation ⪯ on (k+1)^V so that x ⪯ y for x = (X_1, …, X_k) and y = (Y_1, …, Y_k) in (k+1)^V if X_i ⊆ Y_i for every i ∈ [k]. We also define the marginal gain of assigning type i ∈ [k] to element e ∈ V given a current solution x ∈ (k+1)^V (provided that e has not been assigned any type in x),

Δ_{e,i} f(x) := f(X_1, …, X_{i−1}, X_i ∪ {e}, X_{i+1}, …, X_k) − f(x),

for x ∈ (k+1)^V, e ∉ ⋃_{j∈[k]} X_j, and i ∈ [k].
Theorem 2.1.
(Ward & Živný, 2014)
A function f : (k+1)^V → ℝ is k-submodular if and only if f satisfies the following two conditions:
Orthant submodularity: Δ_{e,i} f(x) ≥ Δ_{e,i} f(y) for any x, y ∈ (k+1)^V with x ⪯ y, e ∉ supp(y), and i ∈ [k];
Pairwise monotonicity: Δ_{e,i} f(x) + Δ_{e,j} f(x) ≥ 0 for any x ∈ (k+1)^V, e ∉ supp(x), and i, j ∈ [k] with i ≠ j.
We define the support of x = (X_1, …, X_k) as supp(x) := ⋃_{i∈[k]} X_i and define the support of type i as supp_i(x) := X_i. Further, a function f is called monotone if Δ_{e,i} f(x) ≥ 0 for any x ∈ (k+1)^V, e ∉ supp(x), and i ∈ [k].
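To make these definitions concrete, the following is a small sketch (not from the paper) of a k-submodular objective: a toy k-topic coverage function, where each (element, type) pair covers a fixed set of users and f counts the distinct users covered. The coverage data and all names are invented for illustration.

```python
# Toy k-submodular example: k-topic coverage. A solution assigns each chosen
# element at most one of k topics; f counts the distinct users covered.
COVER = {  # (element, topic) -> set of covered users (illustrative data)
    ("a", 0): {1, 2}, ("a", 1): {2, 3},
    ("b", 0): {2, 4}, ("b", 1): {5},
}

def f(assignment):
    """assignment: dict element -> topic (absent elements are unassigned)."""
    covered = set()
    for elem, topic in assignment.items():
        covered |= COVER[(elem, topic)]
    return len(covered)

def marginal_gain(assignment, elem, topic):
    """Delta_{e,i} f(x): gain of assigning `topic` to the unassigned `elem`."""
    assert elem not in assignment
    return f({**assignment, elem: topic}) - f(assignment)

# Orthant submodularity in action: the gain of (b, 0) shrinks as the
# solution grows, because user 2 is already covered by (a, 0).
g_small = marginal_gain({}, "b", 0)        # covers {2, 4} from scratch
g_large = marginal_gain({"a": 0}, "b", 0)  # only user 4 is new
```

Coverage functions of this form are monotone; a non-monotone example would additionally penalize overlapping assignments.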
Matroids
For a finite set V and I ⊆ 2^V, we say the system M = (V, I) is a matroid if the following hold:
• (M1) ∅ ∈ I;
• (M2) if A ⊆ B ∈ I, then A ∈ I;
• (M3) if A, B ∈ I and |A| < |B|, then there exists e ∈ B \ A such that A ∪ {e} ∈ I.
The elements of I are called independent sets, and we say A ∈ I is maximal if no B ∈ I satisfies A ⊊ B. A maximal independent set is called a basis of the matroid. The rank of a matroid is defined as rank(M) := max_{A ∈ I} |A|. Under the matroid constraint with a matroid M = (V, I), a solution x is feasible if supp(x) ∈ I.
A simple but important matroid is the uniform matroid, in which I = {A ⊆ V : |A| ≤ r} for a given r. It is equivalent to the Total Size (TS) constraint in the problem we consider.
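As a minimal sketch, independence oracles for two common matroids can be written as below; the uniform oracle matches the TS constraint, while the partition oracle (our illustrative choice, not used verbatim in the paper) captures per-part limits in the spirit of individual budgets.

```python
def is_independent_uniform(S, r):
    """Uniform matroid: any set of size at most r is independent.
    This matches the Total Size constraint with budget r."""
    return len(S) <= r

def is_independent_partition(S, capacity):
    """Partition matroid (illustrative): elements are (part, item) pairs,
    and at most capacity[p] elements may be taken from each part p."""
    counts = {}
    for part, item in S:
        counts[part] = counts.get(part, 0) + 1
    return all(counts[p] <= capacity[p] for p in counts)
```

Axioms (M1)-(M3) are easy to verify for both; e.g., for the uniform matroid, any smaller set of a feasible set is feasible, giving (M2).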
2.2 CMAB
In the CMAB framework, we consider sequential, combinatorial decision-making problems over a finite time horizon T. Let Ω denote the ground set of base arms and |Ω| the number of arms. Let 𝒮 ⊆ 2^Ω denote the set of feasible actions, for which we presume membership can be efficiently evaluated. We will use the terminologies subset and action interchangeably throughout the paper.
At each time step t, the learner selects a feasible action A_t ∈ 𝒮. After the subset A_t is selected, the learner receives a reward r_t. We assume the reward is stochastic, bounded in [0, 1], and i.i.d. conditioned on the action A_t. Define the expected reward function as f(A) := E[r_t | A_t = A]. The goal of the learner is to maximize the cumulative reward Σ_{t=1}^T r_t. To measure the performance of the algorithm, one common metric is to compare the learner to an agent with access to a value oracle for f. However, if optimizing f over 𝒮 is NP-hard, such a comparison would not be meaningful unless the horizon is exponentially large in the problem parameters. Alternatively, if there exists a known approximation algorithm 𝒜 with an approximation ratio α for optimizing f over 𝒮, it is more natural to evaluate the performance of a CMAB algorithm against what 𝒜 could achieve. In this scenario, we consider the expected cumulative α-regret R_α, which quantifies the difference between α times the cumulative expected value of the optimal subset and the cumulative received reward (we write R when α is understood from context),
R_α = α T f(OPT) − E[Σ_{t=1}^T f(A_t)],  (1)

where OPT is the optimal solution, i.e., OPT ∈ argmax_{A ∈ 𝒮} f(A), and the expectations are with respect to both the rewards and the actions (if random).
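As a concrete reading of Eq. (1), a hypothetical helper (names are ours) that computes empirical α-regret from a reward trace, assuming the optimum value is known:

```python
def alpha_regret(rewards, opt_value, alpha):
    """Empirical cumulative alpha-regret in the spirit of Eq. (1):
    alpha * T * f(OPT) minus the sum of rewards obtained over T steps.
    `opt_value` stands in for f(OPT); all names here are illustrative."""
    T = len(rewards)
    return alpha * opt_value * T - sum(rewards)
```

A learner that plays an action worth exactly α·f(OPT) in expectation every round has zero α-regret, even though its 1-regret may grow linearly.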
2.3 Problem Statement
We consider the sequential decision making problem under the stochastic CMAB framework where the expected reward function f is k-submodular. Each arm consists of a tuple (we call it an item-type pair) (e, i) with e ∈ V and i ∈ [k], and a super arm is a set of such pairs in which each element appears with at most one type, i.e., a combination of base arms corresponding to a solution x ∈ (k+1)^V. We consider full-bandit feedback, where after an action x_t is selected at time t, the learner can only observe the reward r_t. We aim to transform various offline algorithms from the k-submodular optimization literature into online algorithms; thus we consider α-regret as our performance metric, where α is defined to be the approximation ratio of the corresponding offline algorithm.
2.4 Offline-to-Online Framework
As we adopt the offline-to-online transformation framework proposed in (Nie et al., 2023a), in this section we briefly introduce the framework. Nie et al. (2023a) introduced a criterion for the robustness of an offline approximation algorithm. They showed that this property alone is sufficient to guarantee that the offline algorithm can be adapted to solve CMAB problems in the corresponding online setting with just bandit feedback while achieving sub-linear regret. More importantly, the CMAB adaptation does not rely on any special structure of the algorithm design, instead employing it as a black box. We restate the robustness definition in the following. This definition of robustness indicates that the algorithm will maintain reasonably good performance when function evaluations have errors.
Definition 2.2 ((α, δ)-Robust Approximation (Nie et al., 2023a)).
An algorithm 𝒜 (possibly random) is an (α, δ)-robust approximation algorithm for the combinatorial optimization problem of maximizing a function f over a finite domain 𝒮 if its output S, computed using a value oracle for a surrogate f̂, satisfies the relation below with the optimal solution OPT under f, provided that |f̂(A) − f(A)| ≤ ε for all A ∈ 𝒮:

E[f(S)] ≥ α f(OPT) − δε,

where Ω is the ground set, the expectation is over the randomness of the algorithm 𝒜, and algorithm 𝒜 uses at most N value oracle queries.
Note that when we have access to the exact oracle (ε = 0), the definition gives the same guarantee as the original offline algorithm; if the offline algorithm is exact, α = 1. Equipped with a robustness assurance, the authors introduced a stochastic CMAB algorithm named "Combinatorial Explore-Then-Commit" (C-ETC). See Algorithm 1 for the pseudocode. C-ETC interfaces with an offline (α, δ)-robust algorithm denoted as 𝒜. In the exploration phase, when 𝒜 requests information from the value oracle pertaining to action A, C-ETC adopts a strategy of executing action A multiple times, specifically m times, where m is an optimization parameter. Subsequently, C-ETC computes the empirical mean f̄(A) of the rewards associated with action A and communicates this computed value back to the offline algorithm 𝒜. In the exploitation phase, C-ETC continuously deploys the solution generated by the algorithm. They showed the following theorem:
Theorem 2.3.
(Nie et al., 2023a)¹ The expected cumulative α-regret of C-ETC using an (α, δ)-robust approximation algorithm 𝒜 as a subroutine is at most O(δ^{2/3} N^{1/3} T^{2/3} log(T)^{1/3}), with the number of repetitions m per query chosen appropriately. ¹Nie et al. (2023a)'s results were for deterministic offline approximation algorithms. See https://arxiv.org/pdf/2301.13326.pdf for an extension to randomized offline approximation algorithms.
This approach allows for a CMAB procedure that does not require any specialized structural characteristics from 𝒜. It necessitates no intricate construction beyond the execution of 𝒜, provided the criteria of robustness (Definition 2.2) are met.
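The C-ETC pattern described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the repetition count m is an illustrative choice (the framework tunes m up to logarithmic factors), and all names are ours.

```python
def c_etc(offline_alg, play, T, N):
    """Sketch of Combinatorial Explore-Then-Commit (C-ETC).
    offline_alg: callable taking a value oracle and returning a solution.
    play: executes an action once, returning a stochastic reward in [0, 1].
    T: horizon; N: max number of value-oracle queries offline_alg makes.
    """
    # Illustrative repetition count; the analyzed choice balances the
    # exploration cost m*N against the error of the committed solution.
    m = max(1, int((T / max(N, 1)) ** (2 / 3)))
    t = 0  # time steps consumed by exploration

    def noisy_oracle(action):
        nonlocal t
        rewards = [play(action) for _ in range(m)]
        t += m
        return sum(rewards) / m  # empirical mean fed to offline_alg

    solution = offline_alg(noisy_oracle)  # exploration phase (black box)
    total = 0.0
    for _ in range(T - t):                # exploitation: commit to solution
        total += play(solution)
    return solution, total
```

By Hoeffding-type concentration, each empirical mean is within ε of the true value with high probability, which is exactly the oracle-error model of Definition 2.2.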
Remark 2.4.
We note that this result can be extended to multiple agents, where the exploration (the m plays of each queried action) happens in a distributed manner over the agents, as shown in Fourati et al. (2024a).
In the following sections, we transform various offline algorithms for k-submodular maximization problems under different constraints to the online bandit setting by analyzing the robustness of the offline algorithms. In analyzing robustness, we assume the offline algorithm is evaluated with a surrogate function f̂ satisfying |f̂(x) − f(x)| ≤ ε for all x, instead of the exact function f. We will highlight some parts of the proof for the non-monotone unconstrained problem, and defer all missing proofs to the appendix.
Remark 2.5 (Offline vs. Online Algorithms).
We note the following major differences between offline algorithms and online algorithms. First, an offline algorithm assumes exact oracle access to the objective function f. However, in certain scenarios, the objective might not be predetermined. Nonetheless, if we engage in recurrent tasks while encountering objectives f_1, f_2, …, potentially sampled from a distribution, there is a possibility of acquiring the ability to perform satisfactorily on average over time. Such algorithms are referred to as online algorithms. Second, offline algorithms are designed to optimize an objective in the sense of finding a "final" solution; the quality of intermediate solutions does not matter. Offline algorithms care about computational and sample complexity as a function of problem size, but there is no explicit notion of a horizon. In contrast, the aim of online algorithms is not to output a single solution (there is "simple regret", but we consider the more common cumulative regret) but to maximize the sum of achieved values, so intermediate solutions matter and the horizon T plays an explicit role.
3 Non-monotone Functions without Constraints
In this section, we consider the case where the expected objective function f is non-monotone k-submodular and there is no constraint in selecting actions. We adopt the offline algorithm proposed in Iwata et al. (2015).
3.1 Algorithm
We first remark the following key fact for (monotone or non-monotone) k-submodular maximization without constraints (see Iwata et al. (2013) for the proof):
Proposition 3.1.
For any k-submodular function f, there exists a partition of V, i.e., a solution x with supp(x) = V, that attains the maximum value of f.
This says that there is an optimal solution with all elements included (a combinatorial search is needed to determine their types). Proposition 3.1 remains valid for non-monotone functions due to the special property of k-submodular functions (pairwise monotonicity).
Armed with this property, they proposed a randomized offline algorithm for maximizing a non-monotone k-submodular function without constraints. The algorithm is presented in Algorithm 2 in the appendix. This algorithm iterates through all e ∈ V (in an arbitrary but fixed order) and selects the type randomly according to some carefully designed probabilities for each e. The authors showed that this algorithm achieves a 1/2-approximation ratio for non-monotone k-submodular maximization without constraints.
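The template of Algorithm 2, one pass over the elements with a randomized type choice per element, can be sketched as below. The type distribution used here (proportional to the clipped marginal gains) is a placeholder of ours: the actual probabilities in Iwata et al. (2015) are designed more carefully so as to yield the 1/2 guarantee.

```python
import random

def randomized_type_selection(elements, k, gain, rng=random.Random(0)):
    """Sketch of the one-pass randomized template for unconstrained
    k-submodular maximization: visit each element once (fixed order)
    and draw its type at random; every element ends up assigned, as
    Proposition 3.1 suggests it should be.
    gain(solution, e, i): marginal gain of assigning type i to e."""
    solution = {}  # element -> type
    for e in elements:
        gains = [max(0.0, gain(solution, e, i)) for i in range(k)]
        total = sum(gains)
        if total == 0:
            choice = rng.randrange(k)  # no positive gain: pick any type
        else:
            # Sample a type proportionally to the clipped gains
            # (illustrative distribution, not the paper's exact one).
            r, acc, choice = rng.random() * total, 0.0, k - 1
            for i, g in enumerate(gains):
                acc += g
                if r <= acc:
                    choice = i
                    break
        solution[e] = choice
    return solution
```

In the online adaptation, `gain` would be computed from empirical means of repeated plays rather than from an exact oracle.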
3.2 Robustness
We show that Algorithm 2 is robust for certain α and δ. Due to that robustness, we can use the framework proposed in Nie et al. (2023a) to transform Algorithm 2 into an online algorithm under bandit feedback achieving sublinear α-regret.
Let o be an optimal solution with supp(o) = V (by Proposition 3.1 such a solution exists). Let s be the output of the algorithm when run with the surrogate function f̂. We consider the j-th iteration of the algorithm, and let e^{(j)} be the element of V considered in the j-th iteration, p_i^{(j)} be the probability that the i-th type is chosen in the j-th iteration, and s^{(j)} be the solution after the j-th iteration, where s^{(0)} = 0. Also, for 0 ≤ j ≤ n, let o^{(j)} be the element in (k+1)^V obtained from o by replacing the coordinates on supp(s^{(j)}) with those of s^{(j)}, and for 1 ≤ j ≤ n let t^{(j)} be the element in (k+1)^V obtained from o^{(j−1)} by replacing the coordinate of e^{(j)} with 0.
For i ∈ [k], let y_i := Δ_{e^{(j)},i} f(s^{(j−1)}) and a_i := Δ_{e^{(j)},i} f(t^{(j)}), and let ŷ_i and â_i denote the corresponding quantities evaluated with f̂. Due to pairwise monotonicity, we have y_i + y_{i′} ≥ 0 and a_i + a_{i′} ≥ 0 for all i, i′ ∈ [k] with i ≠ i′, and thus, while the surrogate function does not necessarily have pairwise monotonicity, we have

ŷ_i + ŷ_{i′} ≥ −4ε and â_i + â_{i′} ≥ −4ε  (2)

for all i ≠ i′, since both inequalities involve a sum of four function values of f̂, each of which can be off by at most ε with respect to f. Also, from s^{(j−1)} ⪯ t^{(j)}, which holds by construction, orthant submodularity implies a_i ≤ y_i for all i ∈ [k], and thus

â_i ≤ ŷ_i + 4ε  (3)

for all i ∈ [k]. W.l.o.g., we assume ŷ_1 ≥ ŷ_2 ≥ ⋯ ≥ ŷ_k.
Before showing the main result, we first show the following lemma, which is a generalization of Lemma 2.2 in Iwata et al. (2015) in the presence of noise.
Lemma 3.2.
Let c > 0. Conditioning on s^{(j−1)}, suppose that

E[f(o^{(j−1)}) − f(o^{(j)})] ≤ c · E[f(s^{(j)}) − f(s^{(j−1)})] + δ_j  (4)

holds for each j with 1 ≤ j ≤ n, where δ_j ≥ 0 and the expectation is over the random choice in the j-th iteration. Then E[f(s)] ≥ (f(o) − Σ_{j=1}^n δ_j)/(1 + c).
We defer the proof of this lemma to Section A.1. Then, we show the following proposition:
Proposition 3.3.
Algorithm 2 for maximizing a non-monotone k-submodular function is a (1/2, δ)-robust approximation algorithm.
To show Proposition 3.3, by Definition 2.2, we aim to show that the output s of Algorithm 2 satisfies E[f(s)] ≥ (1/2) f(o) − δε when the function evaluation is based on the surrogate function f̂. By Lemma 3.2, it suffices to prove (4) for every j with c = 1 and suitable δ_j. For simplicity of the description, we shall omit the superscript (j) if it is clear from the context. Due to limited space, here we only consider the case that needs the most technical effort. The proof for the other cases is in supplementary material Section A.2.
Part of proof of Proposition 3.3.
Our goal is to show

E[f(o^{(j−1)}) − f(o^{(j)})] ≤ E[f(s^{(j)}) − f(s^{(j−1)})] + δ_j,  (5)

which is equivalent to (4) with c = 1 and our choice of δ_j. By the ordering of the ŷ_i and (3), for all i ∈ [k] we have

â_i ≤ ŷ_i + 4ε ≤ ŷ_1 + 4ε.  (6)
Denote .
Case 1: If , we have . Since and since from Line 11 in Algorithm 2, a positive probability is only assigned to positive types with positive values,
Thus (5) follows.
Case 2: If and , since from Line 11 in Algorithm 2, a positive probability is only assigned to positive values, we have that by (6). When , since for any by definition of , we have . When ,
(by (2)) | ||||
where the last inequality follows from , and , and follows from Line 11 in Algorithm 2 that the largest single probability assigned is .
Case 3: If and , we have
(using (6) and (3)) | |||
(7) |
The first term equals by simple calculations. Hence it suffices to show that the second term of (7) is greater than or equal to .
Sub-case 3.a: First, we consider . By definition of , for all , . The second term of (7) can then be bounded
(8) |
Sub-case 3.b: Second we consider . Since , we have
(by construction of ’s) | |||
(geometric sum) | |||
(9) |
Remark 3.4.
The assumption of bounded noise plays an essential role in the analysis. Although in the presence of noise the original properties (e.g., pairwise monotonicity) of the function do not hold, with bounded noise bridging f and f̂ we can still obtain a relaxed version of the desired property, leading to a similar approximation ratio with only a small error. In many cases, showing the relaxed version requires completely new steps.
3.3 Regret Bound
Once we have analyzed the robustness of Algorithm 2 and identified that the number of function evaluations of Algorithm 2 is exactly N = nk, we can employ the framework proposed in Nie et al. (2023a) and obtain the following result:
Corollary 3.5.
For an online non-monotone unconstrained k-submodular maximization problem, the expected cumulative 1/2-regret of C-ETC using Algorithm 2 as a sub-routine is at most O(δ^{2/3} (nk)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T.
4 Monotone Functions without Constraints
In this section, we continue to explore the unconstrained case, but now the objective is monotone. We adopt the offline k/(2k−1)-approximation algorithm proposed in (Iwata et al., 2015). The pseudo-code is presented in Algorithm 3 in the appendix. Similar to Algorithm 2 for the non-monotone case, the algorithm iterates through all e ∈ V (in an arbitrary but fixed order) and selects the type randomly according to some carefully designed probabilities for each e.
One important note to highlight is the modification made on line 6 of the algorithm. In the original algorithm from (Iwata et al., 2015), this step uses the raw marginal gains. It is not robust to noise since, later in line 9, probabilities proportional to these quantities are assigned to the types. The probabilities become meaningless if a quantity is negative, and this is possible due to noise. We identified that a sufficient modification is to apply [·]_+ to the noisy marginal gains in this step. Here, [x]_+ = x if x > 0, and 0 otherwise. Note that when there is no noise, Algorithm 3 reduces to the original algorithm in Iwata et al. (2015) since, in the monotone case, all marginal gains are non-negative.
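The role of the clipping modification can be shown in a few lines; `type_probabilities` is a hypothetical helper of ours that illustrates only why clipping keeps the sampling distribution well defined under noise, not the algorithm's actual weighting.

```python
def positive_part(x):
    """[x]_+ = max(x, 0): the clipping applied to noisy marginal gains."""
    return max(x, 0.0)

def type_probabilities(noisy_gains):
    """With a noisy oracle, raw gains can be negative even when the true
    function is monotone; normalizing raw gains could then produce
    negative "probabilities". Clipping first keeps the distribution valid
    (illustrative helper; the real algorithm weights gains further)."""
    clipped = [positive_part(g) for g in noisy_gains]
    total = sum(clipped)
    if total == 0:
        # All noisy gains nonpositive: fall back to a uniform choice.
        return [1.0 / len(noisy_gains)] * len(noisy_gains)
    return [g / total for g in clipped]
```

Without clipping, an input like [0.3, -0.1] would normalize to weights outside [0, 1]; with clipping it yields a proper distribution.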
We show the following robustness guarantee of Algorithm 3:
Proposition 4.1.
Algorithm 3 for maximizing a monotone k-submodular function is a (k/(2k−1), δ)-robust approximation.
By Definition 2.2, we aim to show that the output s of Algorithm 3 satisfies E[f(s)] ≥ (k/(2k−1)) f(o) − δε when the function evaluation is based on the surrogate function f̂. Note that in this setting, Lemma 3.2 still holds. Thus, it is sufficient to show

E[f(o^{(j−1)}) − f(o^{(j)})] ≤ ((k−1)/k) · E[f(s^{(j)}) − f(s^{(j−1)})] + δ_j,  (11)

which is equivalent to (4) with c = (k−1)/k and our choice of δ_j. The detailed proof is in Section A.3.
Remark 4.2.
This modification using [·]_+ becomes significant when dealing with a noisy function oracle f̂, where it is possible that noisy marginal gains are negative. Our analysis demonstrates that this modification is crucial in handling the case where all clipped gains vanish, which is a trivial case in the noiseless analysis of Iwata et al. (2015). In the presence of noise, this case becomes non-trivial. It also plays a pivotal role in another case, where we utilize the non-negativity of the clipped gains; in the original algorithm, this non-negativity does not necessarily hold due to noise.
With the robustness guarantee for Algorithm 3, we can transfer it to the online bandit setting using the framework proposed in Nie et al. (2023a):
Corollary 4.3.
For an online monotone unconstrained k-submodular maximization problem, the expected cumulative k/(2k−1)-regret of C-ETC is at most O(δ^{2/3} (nk)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T.
5 Monotone Functions with Individual Size Constraints
In this section, we consider the Individual Size (IS) constraint. In IS, each type i has a limit B_i on the maximum number of pairs of that type, with B = Σ_{i∈[k]} B_i as the total budget. We consider the offline greedy algorithm proposed in Ohsaka & Yoshida (2015).
The pseudo-code is presented in Algorithm 4 in the appendix. The algorithm builds the solution greedily by iteratively including the element-type pair that yields the largest marginal gain, as long as that pair is still available and the individual budget is not exhausted. We show the following robustness guarantee of Algorithm 4:
Proposition 5.1.
Algorithm 4 for maximizing a monotone k-submodular function under individual size constraints is a (1/3, O(B))-robust approximation, where B is the total size limit.
The proof is in Section A.4. With the robustness guarantee for Algorithm 4, we can transfer it to the online bandit setting using the framework proposed in Nie et al. (2023a):
Corollary 5.2.
For an online monotone k-submodular maximization problem under individual size constraints, the expected cumulative 1/3-regret of C-ETC is at most O(δ^{2/3} (nkB)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T.
6 Monotone Functions with Matroid Constraints
In this section, we consider a matroid constraint with monotone objective functions. We adapt the offline algorithm proposed in Sakaue (2017), which is depicted as Algorithm 5 in the appendix. The algorithm still builds the solution in a greedy manner. At each iteration, the algorithm constructs the set of available elements given the current solution using an assumed independence oracle, and includes the element-type pair that yields the largest marginal gain. We show the following robustness guarantee of Algorithm 5:
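The matroid variant of the greedy differs from the individual-size version only in how availability is computed, namely via the independence oracle; a minimal sketch (names and structure are ours):

```python
def greedy_matroid(elements, k, is_independent, gain):
    """Sketch of the matroid greedy: at each step, restrict attention to
    elements that keep the support independent, then add the best
    (element, type) pair by (possibly noisy) marginal gain.
    is_independent(S): independence oracle on sets of elements."""
    solution = {}  # element -> type
    while True:
        support = set(solution)
        avail = [e for e in elements
                 if e not in support and is_independent(support | {e})]
        if not avail:
            return solution  # support is a basis (or no feasible element)
        _, e, i = max((gain(solution, e, i), e, i)
                      for e in avail for i in range(k))
        solution[e] = i
```

With the uniform-matroid oracle (|S| ≤ B), this reduces to greedy under the Total Size constraint, as in Corollary 6.3.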
Proposition 6.1.
Algorithm 5 for maximizing a monotone k-submodular function under a matroid constraint is a (1/2, O(r))-robust approximation, where r is the rank of the matroid.
The proof is in Section A.5. With the robustness guarantee for Algorithm 5, we transform it into a CMAB algorithm using the framework proposed in (Nie et al., 2023a):
Corollary 6.2.
For an online monotone k-submodular maximization problem under a matroid constraint, the expected cumulative 1/2-regret of C-ETC is at most O(δ^{2/3} (nkr)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T.
One special case of the matroid constraint is the Total Size (TS) constraint (Ohsaka & Yoshida, 2015). In TS, there is a limited budget B on the total number of element-type pairs that can be selected. This is also called the uniform matroid in the literature. In this case, the rank of the matroid is the total size budget B. We immediately obtain an online algorithm as a result of Corollary 6.2:
Corollary 6.3.
For an online monotone k-submodular maximization problem under a TS constraint, the expected cumulative 1/2-regret of C-ETC is at most O(δ^{2/3} (nkB)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T, using nkB as an upper bound on the number of function evaluations for Algorithm 5.
Remark 6.4.
In Sections 5 and 6, although marginal gains are not guaranteed to be non-negative under f̂, we prove that there is no need to modify the algorithms as in Section 4. This is due to the inherent design of the algorithms. In Algorithm 3, probabilities are assigned to the types in proportion to their marginal gains (Line 9), so it is necessary to ensure that these gains are non-negative. However, Algorithms 4 and 5 select item-type pairs by directly comparing marginal gains, regardless of whether they are positive or negative. In cases where monotonicity does not hold due to noise, we use bounded-error analysis to derive analogous properties.
7 Non-Monotone Functions with Matroid Constraints
In Sun et al. (2022), it is shown that Algorithm 5 in the appendix achieves a 1/3 approximation ratio even when the objective function is non-monotone. We show the following robustness result when Algorithm 5 is fed with the surrogate function f̂.
Proposition 7.1.
Algorithm 5 for maximizing a non-monotone k-submodular function under a matroid constraint is a (1/3, O(r))-robust approximation, where r is the rank of the matroid.
The proof is in Section A.6. We use the offline-to-online transformation framework proposed in Nie et al. (2023a) to adapt Algorithm 5:
Corollary 7.2.
For an online non-monotone k-submodular maximization problem under a matroid constraint, the expected cumulative 1/3-regret of C-ETC is at most O(δ^{2/3} (nkr)^{1/3} T^{2/3} log(T)^{1/3}) for sufficiently large T.
8 Evaluations
In this section, we empirically evaluate the performance of our proposed methods in the context of online influence maximization with topics. This problem is formulated as a monotone k-submodular bandit problem (detailed below). As there are no directly comparable baselines in the literature, we select naive UCB (NaiveUCB) and random selection (Random) as benchmarks, with UCB implemented using the standard UCB1 algorithm.
Online Influence Maximization with Topics: The problem involves a social network represented as a directed graph G = (V, E), where V is the set of nodes and E is the set of edges. Each edge (u, v) ∈ E has an associated weight p_{u,v}^i for each topic i, where p_{u,v}^i denotes the influence strength from user u to user v on topic i. The goal is to maximize the number of users influenced by one or more topics. We adopt the k-topic independent cascade (k-IC) model from Ohsaka & Yoshida (2015), which generalizes the independent cascade model (Kempe et al., 2003).
Specifically, the influence spread in the k-IC model is defined as the expected number of nodes influenced by at least one topic, where the set of nodes influenced in the diffusion process of topic i is a random variable. As shown in Ohsaka & Yoshida (2015), this influence spread is monotone k-submodular. Given a directed graph G and edge probabilities p_{u,v}^i, the task is to select a seed set that maximizes the influence spread, subject to some constraints (e.g., total size constraints).
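A minimal simulation of the k-IC diffusion just described might look as follows. The graph encoding (per-topic edge-probability dictionaries) and the seed format (each seed node assigned one topic) are illustrative assumptions, not the experimental code.

```python
import random

def simulate_k_ic(edges, seeds, rng):
    """One sample of k-IC spread.

    edges: {topic: {(u, v): prob}}; seeds: {node: topic}.
    Each topic diffuses independently; returns the number of nodes
    activated by at least one topic.
    """
    influenced = set()
    for topic, probs in edges.items():
        active = {u for u, t in seeds.items() if t == topic}
        frontier = set(active)
        while frontier:
            next_frontier = set()
            for (u, v), p in probs.items():
                # each newly active node gets one activation attempt per edge
                if u in frontier and v not in active and rng.random() < p:
                    next_frontier.add(v)
            active |= next_frontier
            frontier = next_frontier
        influenced |= active
    return len(influenced)

rng = random.Random(0)
edges = {1: {("a", "b"): 1.0}, 2: {("a", "c"): 0.0}}
spread = simulate_k_ic(edges, {"a": 1}, rng)  # seed "a" with topic 1
```

Averaging `simulate_k_ic` over repeated runs estimates the influence spread, which is how the experiments below approximate the true function value with 100 diffusion simulations.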
Experiment settings: We use the ego-facebook network (Leskovec & Mcauley, 2012). After applying a community detection algorithm (Blondel et al., 2008) to extract a subgraph, we convert it into a directed graph consisting of 350 users and 2,845 edges. Three sets of edge weights, corresponding to three topics (k = 3), are generated: Uniform(0, 0.2), Normal(0.1, 0.05), and Exponential(10), with the Normal and Exponential distributions truncated to [0, 1]. We evaluate our algorithms under both TS and IS constraints. The TS budget is set to , while for IS constraints, the budgets for the three topics are all 2. As greedy algorithms offer 1/2 and 1/3 approximations for TS and IS constraints respectively (Ohsaka & Yoshida, 2015), we use an offline greedy algorithm to compute the α-optimal value, with 100 diffusion simulations approximating the true function value. Since all objectives are monotone, NaiveUCB and Random restrict their search space to actions that exhaust the budget. Means and standard deviations are calculated over 10 independent runs.
Results: The instantaneous rewards over time are shown in Figure 1. Under both TS and IS constraints, the results can be summarized as follows. In the early stages (before roughly 4,000 steps), ETCG explores suboptimal actions that do not use the full budget, leading to worse initial instantaneous rewards. However, ETCG catches up in later stages and achieves lower cumulative regret over the entire time horizon. We note that NaiveUCB runs UCB on all actions that exhaust the budget, which takes approximately for TS and for IS, respectively, just to explore each action once, resulting in poor performance.
9 Conclusion
In this paper, we investigate the problem of online k-submodular maximization under bandit feedback. Utilizing the framework proposed by Nie et al. (2023a), we propose CMAB algorithms by adapting various offline algorithms and analyzing their robustness. We obtain sublinear α-regret for online k-submodular maximization under bandit feedback in various settings, including monotone and non-monotone objectives without constraints, monotone objectives with individual size constraints, and monotone and non-monotone objectives with matroid constraints. Numerical experiments on online influence maximization with topics were conducted to evaluate the effectiveness of our methods.
References
- Blackwell (1956) Blackwell, D. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956.
- Blondel et al. (2008) Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008:P10008, 2008.
- Chen et al. (2022) Chen, J., Tang, Z., and Wang, C. Monotone k-submodular knapsack maximization: An analysis of the greedy+singleton algorithm. In Ni, Q. and Wu, W. (eds.), Algorithmic Aspects in Information and Management, pp. 144–155, Cham, 2022. Springer International Publishing. ISBN 978-3-031-16081-3.
- Ene & Nguyen (2022) Ene, A. and Nguyen, H. Streaming algorithm for monotone k-submodular maximization with cardinality constraints. In International Conference on Machine Learning, pp. 5944–5967. PMLR, 2022.
- Fourati et al. (2023) Fourati, F., Aggarwal, V., Quinn, C., and Alouini, M.-S. Randomized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback. In International Conference on Artificial Intelligence and Statistics, pp. 7455–7471. PMLR, 2023.
- Fourati et al. (2024a) Fourati, F., Alouini, M.-S., and Aggarwal, V. Federated combinatorial multi-agent multi-armed bandits. In Forty-first International Conference on Machine Learning, 2024a.
- Fourati et al. (2024b) Fourati, F., Quinn, C. J., Alouini, M.-S., and Aggarwal, V. Combinatorial stochastic-greedy bandit. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 12052–12060, 2024b.
- Huber & Kolmogorov (2012) Huber, A. and Kolmogorov, V. Towards minimizing k-submodular functions. In International Symposium on Combinatorial Optimization, 2012.
- Iwata et al. (2013) Iwata, S., Tanigawa, S.-i., and Yoshida, Y. Bisubmodular function maximization and extensions. Technical Report METR 2013–16, the University of Tokyo, September 2013.
- Iwata et al. (2015) Iwata, S., Tanigawa, S.-i., and Yoshida, Y. Improved approximation algorithms for k-submodular function maximization. In ACM-SIAM Symposium on Discrete Algorithms, 2015.
- Kempe et al. (2003) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146, 2003.
- Khuller et al. (1999) Khuller, S., Moss, A., and Naor, J. S. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.
- Leskovec & Mcauley (2012) Leskovec, J. and Mcauley, J. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, volume 25, 2012.
- Lin et al. (2015) Lin, T., Li, J., and Chen, W. Stochastic online greedy learning with semi-bandit feedbacks. In Proceedings of the 29th International Conference on Neural Information Processing Systems, pp. 352–360, 2015.
- Matsuoka & Ohsaka (2021) Matsuoka, T. and Ohsaka, N. Maximization of monotone k-submodular functions with bounded curvature and non-k-submodular functions. In Asian Conference on Machine Learning, pp. 1707–1722. PMLR, 2021.
- Niazadeh et al. (2021) Niazadeh, R., Golrezaei, N., Wang, J. R., Susan, F., and Badanidiyuru, A. Online learning via offline greedy algorithms: Applications in market design and optimization. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 737–738, 2021.
- Nie et al. (2022) Nie, G., Agarwal, M., Umrawal, A. K., Aggarwal, V., and Quinn, C. J. An explore-then-commit algorithm for submodular maximization under full-bandit feedback. In Uncertainty in Artificial Intelligence, pp. 1541–1551. PMLR, 2022.
- Nie et al. (2023a) Nie, G., Nadew, Y. Y., Zhu, Y., Aggarwal, V., and Quinn, C. J. A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 26166–26198. PMLR, 23–29 Jul 2023a.
- Nie et al. (2023b) Nie, G., Zhu, Y., Nadew, Y. Y., Basu, S., Pavan, A., and Quinn, C. J. Size-constrained k-submodular maximization in near-linear time. In Evans, R. J. and Shpitser, I. (eds.), Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine Learning Research, pp. 1545–1554. PMLR, 31 Jul–04 Aug 2023b.
- Ohsaka & Yoshida (2015) Ohsaka, N. and Yoshida, Y. Monotone k-submodular function maximization with size constraints. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
- Oshima (2021) Oshima, H. Improved randomized algorithm for k-submodular function maximization. SIAM Journal on Discrete Mathematics, 35(1):1–22, 2021.
- Pham et al. (2021) Pham, C. V., Vu, Q. C., Ha, D. K., and Nguyen, T. T. Streaming algorithms for budgeted k-submodular maximization problem. In Computational Data and Social Networks: 10th International Conference, Proceedings, pp. 27–38. Springer, 2021.
- Sakaue (2017) Sakaue, S. On maximizing a monotone k-submodular function subject to a matroid constraint. Discrete Optimization, 23:105–113, 2017. ISSN 1572-5286.
- Soma (2019) Soma, T. No-regret algorithms for online k-submodular maximization. In Chaudhuri, K. and Sugiyama, M. (eds.), Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. 1205–1214. PMLR, 16–18 Apr 2019.
- Streeter & Golovin (2008) Streeter, M. and Golovin, D. An online algorithm for maximizing submodular functions. In Proceedings of the 21st International Conference on Neural Information Processing Systems, pp. 1577–1584, 2008.
- Sun et al. (2022) Sun, Y., Liu, Y., and Li, M. Maximization of -submodular function with a matroid constraint. In Du, D.-Z., Du, D., Wu, C., and Xu, D. (eds.), Theory and Applications of Models of Computation, pp. 1–10, Cham, 2022. Springer International Publishing.
- Sviridenko (2004) Sviridenko, M. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32:41–43, 2004.
- Tajdini et al. (2023) Tajdini, A., Jain, L., and Jamieson, K. Minimax optimal submodular optimization with bandit feedback, 2023.
- Takemori et al. (2020) Takemori, S., Sato, M., Sonoda, T., Singh, J., and Ohkuma, T. Submodular bandit problem under multiple constraints. In Conference on Uncertainty in Artificial Intelligence, pp. 191–200. PMLR, 2020.
- Tang et al. (2022) Tang, Z., Wang, C., and Chan, H. On maximizing a monotone k-submodular function under a knapsack constraint. Operations Research Letters, 50:28–31, 2022.
- Ward & Živný (2014) Ward, J. and Živný, S. Maximizing k-submodular functions and beyond. ACM Transactions on Algorithms, 12:1–26, 2014.
- Xiao et al. (2023) Xiao, H., Liu, Q., Zhou, Y., and Li, M. Approximation algorithms for -submodular maximization subject to a knapsack constraint, 2023.
- Yu et al. (2016) Yu, B., Fang, M., and Tao, D. Linear submodular bandits with a knapsack constraint. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
- Yu et al. (2023) Yu, K., Li, M., Zhou, Y., and Liu, Q. On maximizing monotone or non-monotone k-submodular functions with the intersection of knapsack and matroid constraints. Journal of Combinatorial Optimization, 45(3), April 2023. ISSN 1382-6905.
- Yue & Guestrin (2011) Yue, Y. and Guestrin, C. Linear submodular bandits and their application to diversified retrieval. Advances in Neural Information Processing Systems, 24, 2011.
Appendix A Missing Proofs
A.1 Proof of Lemma 3.2
We restate the lemma here:
Lemma A.1.
Let . Conditioning on , suppose that
(12) |
holds for each with , where . Then .
Proof.
Fix . Conditioned on , element will be randomly assigned a type following the probabilities described in Lines 9-11 of the pseudocode. We first calculate the conditional expected difference between and . Let denote the type of in (and thus in ). By construction, since is obtained from by replacing (which is the same as and thus is with probability ) with 0, we have
(13) |
Also, since can be obtained from by replacing (which is 0) with (which is ), we have
(14) |
Thus
(using (13) and (14)) | ||||
(definition of ) | ||||
(15) |
where the last equality is due to . Since can be obtained from by replacing (which is 0) with (which is with probability ), we have
(by construction) | ||||
(definition of ) | ||||
(16) |
where both expectations are taken over the randomness of the algorithm. Combining (15), (16) and (12), we have
(17) |
Note that and by construction. Hence
(by construction) | ||||
(using (17)) | ||||
() |
and we get the statement. ∎
A.2 Proof of Proposition 3.3
We recall some notation that will be used in our analysis and that has already been introduced in the main text.
Let be an optimal solution with (by Proposition 3.1 such a solution exists). Let be the output of the algorithm with using surrogate function . We consider the -th iteration of the algorithm, and let be the element of considered in the -th iteration, be the probability that -th type is chosen in the -th iteration, and be the solution after the -th iteration, where . Also for , let be the element in obtained from by replacing the coordinates on with those of , and for let be the element in obtained from by changing with 0.
The subsequent analyses will involve comparing different marginal gains, which we next introduce. For , let , and let , . Due to pairwise monotonicity, we have and for all with , and thus, while the surrogate function does not necessarily have pairwise monotonicity, we have
, | (18) | |||
. | (19) |
for all with since both inequalities involve a sum of four function values of which can be off by at most with respect to . Also from which holds by construction, orthant submodularity implies for all and thus,
(20) |
for all . Without loss of generality, we assume .
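For the reader's convenience, the two structural properties used throughout this setup can be stated in generic form. The symbols below are generic placeholders (not the paper's exact notation): Δ_{e,i} f(x) denotes the marginal gain of assigning type i to an element e that is unassigned in x.

```latex
% Orthant submodularity: marginal gains shrink as the solution grows.
\[
  \Delta_{e,i} f(x) \;\ge\; \Delta_{e,i} f(y)
  \quad \text{whenever } x \preceq y \text{ and } x(e) = y(e) = 0 .
\]
% Pairwise monotonicity: the gains of two distinct types cannot both be
% negative.
\[
  \Delta_{e,i} f(x) + \Delta_{e,j} f(x) \;\ge\; 0
  \quad \text{for all } i \ne j .
\]
```

A function is k-submodular precisely when it satisfies both properties, which is why the analysis tracks how each survives the perturbation to the surrogate function.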
Now we are ready to prove Proposition 3.3.
Proof of Proposition 3.3.
It is trivial to see that the number of value oracle queries required is , as the algorithm iterates through all item-type pairs. The remainder of the proof is dedicated to showing the and terms in Definition 2.2.
By Definition 2.2, we aim to show that the output of Algorithm 2 satisfies when the function evaluation is based on surrogate function . By Lemma 3.2, our goal is to show
(21) |
which is equivalent to (4) in the main text with and . For simplicity, we omit the superscript when it is clear from the context. Recall that and for with , and for (cf. (18), (19) and (20)). We break up the analysis into three cases: , and .
Case 1:
If , then by Line 9 in Algorithm 2 and for . So (21) specializes to . Since for and are ranked in decreasing order, we have and for . Summing these two inequalities we have .
Sub-case 1.a:
If ,
showing the statement.
Sub-case 1.b:
Case 2:
Sub-case 2.a:
Sub-case 2.b:
Case 3:
When , our goal is to show
(24) |
which is equivalent to (4) with and . By the ordering of and (3), for we have
(25) |
Denote .
Sub-case 3.a: If , we have . Since and since, from Line 11 in Algorithm 2, a positive probability is only assigned to types with positive values,
Thus (24) follows.
Sub-case 3.b: If and , since from Line 11 in Algorithm 2, a positive probability is only assigned to positive values, we have that by (25). When , since for any by definition of , we have . When ,
(by (2)) | ||||
where the last inequality follows from , and , and the fact, from Line 11 in Algorithm 2, that the largest single probability assigned is . Thus,
Therefore (24) holds.
Sub-case 3.c: If and , we have
(using (25) and (3)) | ||||
(26) |
The first term equals by construction of ’s and since . Hence it suffices to show that the second term of (26) is greater than or equal to .
Subsub-case 3.c.I: First we consider . By definition of , for all , . The second term of (26) can then be bounded as
(27) |
Subsub-case 3.c.II: Second we consider . Since , we have
(by construction of ’s) | ||||
(geometric sum) | ||||
(28) |
Therefore, if , we get
(using (2) ) | |||
(using (28)) | |||
( and ) |
If , then by Line 11 in Algorithm 2, since and . Hence, by , the second term of (26) can be bounded as
(using (2)) | |||
(using (28)) | |||
( and ) |
Thus we conclude that when , we have
(29) |
A.3 Proof of Proposition 4.1
We use the same setup and constructions as in Section A.2; further, since we are considering the monotone case, we have and for all and . Thus, for the surrogate function , we have
(30) |
We now prove Proposition 4.1. Again, we omit the superscript when it is clear from the context.
Proof of Proposition 4.1.
It is trivial to see that the number of value oracle queries required is , as the algorithm iterates through all item-type pairs. The remainder of the proof is dedicated to showing the and terms in Definition 2.2.
By Definition 2.2, we aim to show that the output of Algorithm 3 satisfies when the function evaluation is based on surrogate function . Note that in this setting, Lemma 3.2 still holds. Thus, our goal is to show
(31) |
which is equivalent to (4) in the main text with and . Recall that and for with , and for (cf. (18), (19) and (20)). We break up the analysis into two cases: and .
Case 1:
When , we have for all due to the definition of . In this case, (31) specializes to
(32) |
We have
(using ) | ||||
(using ) | ||||
(using ) | ||||
(using ) | ||||
(using ) |
showing the statement.
Case 2:
When , we have for some . In this case, (31) specializes to
(33) |
Sub-case 2.a:
Sub-case 2.b:
If , let . We have
(using ) | ||||
(using ) | ||||
(definition of ) | ||||
(34) |
Using the weighted AM-GM inequality, for all , and letting and , we deduce
(35) |
From Hölder’s inequality, holds for any non-negative ’s. By setting , we have
(36) |
Plugging (36) back into (34), we get
(37) |
which is desired.
∎
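For reference, the two inequalities invoked in Sub-case 2.b above can be stated in generic form (the symbols below are generic placeholders, not the paper's notation).

```latex
% Weighted AM--GM: for $x, y \ge 0$ and $\lambda \in [0,1]$,
\[
  \lambda x + (1-\lambda)\, y \;\ge\; x^{\lambda}\, y^{\,1-\lambda}.
\]
% A standard consequence of H\"older's inequality: for non-negative
% $a_1, \dots, a_n$ and $s \ge 1$,
\[
  \Bigl(\sum_{i=1}^{n} a_i\Bigr)^{\!s} \;\le\; n^{\,s-1} \sum_{i=1}^{n} a_i^{\,s}.
\]
```

The first inequality converts a convex combination of marginal gains into a product bound, and the second controls the sum of gains by the sum of their powers, which together yield the bound plugged back into (34).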
A.4 Proof of Proposition 5.1
Proof.
Let be the pair greedily chosen in this iteration using the surrogate function , and let be the solution after this iteration. Let be an optimal solution. For each , we define . Following the construction of Ohsaka & Yoshida (2015), we iteratively define as follows. We have two cases to consider:
Case 1: Suppose that for some . In this case, let be an arbitrary element in . Then, we define as the resulting vector obtained from by assigning 0 to the -th element and the -th element, and then define as the resulting vector obtained from by assigning to the -th element and to the -th element.
Case 2: Suppose that for any . In this case, we set if , and we set to be an arbitrary element in otherwise. Then, we define as the resulting vector obtained from by assigning 0 to the -th element, and then define as the resulting vector obtained from by assigning to the -th element.
Intuitively, we transition from to by swapping the item-type pairs of into one by one, in the order in which the pairs are chosen by the greedy algorithm. In Case 1, if the considered item is already in but with another type , we obtain from by assigning type 0 to item , and obtain from by assigning type to item . In Case 2, if no type is assigned to item in , there are two sub-cases. In the first sub-case, if is already in , we set and obtain from by assigning type 0 to item . In the second sub-case, if is not assigned any type in , we obtain from by assigning type 0 to an arbitrary item with type in , and obtain from by assigning type to item . Note that holds for every and , and . Moreover, we have for every . We further denote as the analogue of under the surrogate function .
We consider in these two cases.
Case 1: In this case, for some . By construction, we have
and
Thus,
(by construction) | ||||
(monotonicity) | ||||
( and orthant submodularity) | ||||
(38) | ||||
(39) | ||||
(40) |
where the last inequality follows from the greedy rule and the fact that both pairs and are available after we selected , which we verify as follows: since , is still available; is the type of the next greedy pair, and thus available; is the item of the next greedy pair, and thus available; indicates that is still available.
Case 2: In this case, for any .
(by construction) | ||||
(monotonicity) | ||||
( and orthant submodularity) | ||||
(41) | ||||
(greedy rule and the pair is available) | ||||
(42) |
Finally,
(44) | ||||
(using (43)) | ||||
Rearranging, we get .
∎
A.5 Proof of Proposition 6.1
Denote as the set of all bases associated with a matroid . We first state two important lemmas that will be used in the proof. We refer to Sakaue (2017) for the detailed proofs.
Lemma A.2.
(Sakaue, 2017) The size of any maximal optimal solution for maximizing a monotone -submodular function under a matroid constraint is .
Lemma A.3.
(Sakaue, 2017) Suppose and satisfy . Then, for any satisfying , there exists such that .
Now we prove Proposition 6.1.
Proof.
Let be the pair chosen greedily at the th iteration using the surrogate function , and be the solution after the th iteration. Let and , the output of Algorithm 5. Define for each . Let be a maximal optimal solution and let for each . Now we will construct a sequence of vectors satisfying the following:
(45) | ||||
(46) |
More specifically, we see how to obtain from satisfying (45) and (46). Note that and satisfy (45) and (46). We now describe how to obtain from , assuming that satisfies
Since means , and is chosen to satisfy , we see from Lemma A.3 that there exists satisfying . We let and define as the vector obtained by assigning 0 to the th element of . We then define as the vector obtained from by assigning type to element . The vector thus constructed, , satisfies
Furthermore, since satisfies
we have the following property for :
where the strictness of the inclusion for can be easily confirmed from . Thus, applying the above discussion iteratively for , we see that the obtained sequence of vectors satisfies (45) and (46). We now prove the following inequality for :
(47) |
From and , we get for each . Thus we obtain the following inclusion from (M2) for each :
Hence, from , we get , where is constructed by the independence oracle and contains the available elements given the current solution . Therefore, for the pair , which is chosen greedily, we have
(48) |
Furthermore, since holds, orthant submodularity implies
(49) |
(by construction) | ||||
(monotonicity) | ||||
(using (49)) | ||||
(using (48)) | ||||
(50) |
and we get (47). Finally,
(51) | ||||
(using (47)) | ||||
Rearranging, we get .
∎
A.6 Proof of Proposition 7.1
The following lemma guarantees that there exists an optimal solution with all items included (so we just need to decide on the types).
Lemma A.4.
(Sun et al., 2022) The size of any maximal optimal solution for maximizing a non-monotone -submodular function under a matroid constraint is still .
Now we prove Proposition 7.1.
Proof.
We construct the sequence in the same way as in the proof of Proposition 6.1. By the same construction, we have that
(52) | ||||
(53) |
and we still have (48) and (49). Since we no longer have monotonicity, we exploit the pairwise monotonicity property instead. Besides the chosen type for the item at each iteration of Algorithm 5, we consider another type . From pairwise monotonicity we have
(54) |
We perform a calculation similar to that in the proof of Proposition 6.1:
(using (48)) | ||||
(using (49)) | ||||
(using (54)) | ||||
( as shown in the previous section) | ||||
(greedy rule) | ||||
Adding to both sides, we get
(55) |
Finally,
(56) | ||||
(using (55)) | ||||
Rearranging, we get . ∎
Appendix B Missing Offline Algorithms
We present the pseudocode that is missing from the main sections. Algorithm 3 is adapted from Iwata et al. (2015) for the problem of maximizing an unconstrained monotone k-submodular function; Algorithm 4 is proposed in Ohsaka & Yoshida (2015) for the problem of maximizing a monotone k-submodular function under IS constraints; and Algorithm 5 is proposed in Sakaue (2017) for the problem of maximizing a monotone k-submodular function under a matroid constraint. Later, Sun et al. (2022) showed that Algorithm 5 can also handle non-monotone objective functions. Note that in Line 4, Algorithm 5 requires the value of , the rank of the matroid. In practice, however, we need not calculate the value of beforehand. Instead, we continue the iteration while is nonempty, which we check in Line 5. We can confirm that this modification does not change the output as follows: as long as , exactly one element is added to at each iteration due to (M3), and, if , the iteration stops since is a maximal independent set.
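The rank-free stopping rule just described can be sketched as follows. The uniform-matroid oracle and the toy gain function are illustrative assumptions; in Algorithm 5 the oracle would be the matroid's independence oracle and the gain would come from the (surrogate) objective.

```python
def greedy_until_maximal(ground_set, is_independent, marginal_gain, k):
    """Greedy over (item, type) pairs, stopping when no element can be added
    while keeping the support independent (i.e., the support is a maximal
    independent set), instead of counting to the rank r."""
    solution = {}  # item -> chosen type
    while True:
        support = set(solution)
        available = [e for e in ground_set
                     if e not in support and is_independent(support | {e})]
        if not available:  # Line-5-style check: maximal independent set reached
            break
        item, typ = max(((e, t) for e in available for t in range(1, k + 1)),
                        key=lambda p: marginal_gain(solution, p[0], p[1]))
        solution[item] = typ
    return solution

# Example with a uniform matroid of rank 2 (any set of size <= 2 is independent).
sol = greedy_until_maximal(
    ground_set=["a", "b", "c"],
    is_independent=lambda s: len(s) <= 2,
    marginal_gain=lambda sol, e, t: {"a": 3, "b": 2, "c": 1}[e] + 0.1 * t,
    k=2,
)
```

Because exactly one element joins the support per iteration, the loop runs r times and then the availability check fails, matching the argument above that the modification does not change the output.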
Appendix C More Related Works
Offline -submodular function maximization:
k-Submodular functions were first introduced by Huber & Kolmogorov (2012) as a generalization of bisubmodular functions, which correspond to the case k = 2. For unconstrained k-submodular maximization, Iwata et al. (2015) showed that even achieving a (k+1)/(2k) approximation ratio is NP-hard. If the objective is monotone, there exists a deterministic algorithm achieving a 1/2 approximation (Ward & Živný, 2014) and a randomized algorithm achieving a k/(2k−1) approximation guarantee (Iwata et al., 2015). When the objective is non-monotone, Ward & Živný (2014) achieved a max{1/3, 1/(1+a)} approximation guarantee using a polynomial number of function evaluations, where a = max{1, √((k−1)/4)}. This has been further improved to 1/2 (Iwata et al., 2015) and (k²+1)/(2k²+1) (Oshima, 2021).
For monotone objectives with size constraints, two types of constraints have been considered in the literature: total size (TS) constraints, where the items selected across all types share a common budget, and individual size (IS) constraints, where each of the k types has its own budget. Ohsaka & Yoshida (2015) analyzed the greedy algorithm and obtained 1/2 and 1/3 approximation guarantees for TS and IS constraints, respectively. Nie et al. (2023b) proposed threshold greedy algorithms for monotone TS and IS constraints with improved query complexity.
For maximizing a monotone k-submodular function under a matroid constraint, Sakaue (2017) showed that the greedy algorithm achieves a 1/2 approximation. When the objective function is non-monotone, a 1/3 approximation algorithm is presented in Sun et al. (2022). When the curvature is known, Matsuoka & Ohsaka (2021) proposed algorithms that achieve a -approximation for matroid constraints and a -approximation for individual size constraints.
For other constraints, Tang et al. (2022) and Chen et al. (2022) proposed algorithms inspired by Khuller et al. (1999) and Sviridenko (2004) for knapsack constraints. Xiao et al. (2023) considered knapsack constraints in both the monotone and non-monotone cases. Yu et al. (2023) considered the intersection of knapsack and matroid constraints. Pham et al. (2021) proposed an algorithm for the streaming setting under knapsack constraints. While the term “streaming” often implies an online context, the setup is distinct from CMAB problems: in streaming, an exact value oracle is available, the action space is restricted to (subsets of) the elements revealed so far, and evaluation occurs after the stream completes. There, the term “online” refers only to the data arriving in some order.
Appendix D Applications
In this section, we give some real-world applications that motivate our problems of interest.
Application 1 (Influence maximization with topics): Let be a social network with an edge probability for each edge , representing the probability of influence on the -th topic. Given a seed set , the diffusion process of the rumor about the -th topic starts by activating the vertices in , and propagates independently of other topics. The goal is to maximize the total influence (the number of users influenced by at least one topic). In the well-studied models of influence propagation — the independent cascade and linear threshold models — the influence function is monotone and k-submodular (Ohsaka & Yoshida, 2015). In the online version of the problem, the underlying graph structure is not known to the agent. At each time step, the agent selects a seed set and observes a reward representing the number of influenced users.
Application 2 (Sensor placement): We are given k types of sensors, each of which provides different measurements, and a limited number of sensors of each type. The ground set is a set of possible locations for the sensors. The goal is to equip each location with at most one sensor so as to maximize the total information gained from the sensors. This problem can be modeled as k-submodular maximization with an individual size constraint (Ohsaka & Yoshida, 2015). In an online version of this problem, at each time step we select locations at which to place the sensors, then collect data to calculate the entropy, and at the next time step we repeat the process by re-allocating sensors to different locations.
Application 3 (Ad allocation): We have advertisers that are known in advance and ad impressions that arrive online one at a time. Each advertiser has a contract for a number of impressions. For each impression and each advertiser , there is a non-negative weight that captures how much value advertiser accrues from being allocated impression . When impression arrives, the values are revealed, and the algorithm must allocate impression to at most one advertiser. Letting denote the set of impressions allocated to advertiser , the total revenue is . This problem can be formulated as a special case of k-submodular maximization with a partition matroid constraint (Ene & Nguyen, 2022).
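A hedged sketch of the ad-allocation revenue just described: the top-C truncation below is a standard way to make each advertiser's value depend only on its most valuable contracted impressions (the paper's exact expression is in Ene & Nguyen (2022)), and the data encoding is an illustrative assumption.

```python
def total_revenue(allocation, weights, contracts):
    """Revenue of an allocation.

    allocation: {impression: advertiser}
    weights:    {(impression, advertiser): value}
    contracts:  {advertiser: contracted number of impressions C}
    Each advertiser is credited only for its C most valuable impressions.
    """
    revenue = 0.0
    for adv, cap in contracts.items():
        allocated = sorted(
            (weights[(i, adv)] for i, a in allocation.items() if a == adv),
            reverse=True,
        )
        revenue += sum(allocated[:cap])  # only the top-C weights count
    return revenue

weights = {("i1", "a"): 5.0, ("i2", "a"): 3.0, ("i3", "a"): 1.0}
rev = total_revenue({"i1": "a", "i2": "a", "i3": "a"}, weights, {"a": 2})
```

Assigning each impression to at most one advertiser is exactly the "one type per element" structure of a k-submodular solution, with the contract caps enforced through the partition matroid.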