
Online Learning for Min Sum Set Cover and Pandora’s Box

Evangelia Gergatsouli
UW-Madison
[email protected]
   Christos Tzamos
UW-Madison
[email protected]
  

Two central problems in Stochastic Optimization are Min Sum Set Cover and Pandora’s Box. In Pandora’s Box, we are presented with $n$ boxes, each containing an unknown value, and the goal is to open the boxes in some order to minimize the sum of the search cost and the smallest value found. Given a distribution of value vectors, we are asked to identify a near-optimal search order. Min Sum Set Cover corresponds to the case where values are either $0$ or infinity.

In this work, we study the case where the value vectors are not drawn from a distribution but are presented to a learner in an online fashion. We present a computationally efficient algorithm that is constant-competitive against the cost of the optimal search order. We extend our results to a bandit setting where only the values of the boxes opened are revealed to the learner after every round. We also generalize our results to other commonly studied variants of Pandora’s Box and Min Sum Set Cover that involve selecting more than a single value subject to a matroid constraint.

1 Introduction

One of the fundamental problems in stochastic optimization is the Pandora’s Box problem, first introduced by Weitzman in [Wei79]. The problem asks to select, among $n$ alternatives, called boxes, one with a low value. In the stochastic version of the problem, it is assumed that the values in the boxes are drawn from a known distribution, and the actual realization of any box can be revealed at a cost after inspection.

The goal is to design an algorithm that efficiently searches among the $n$ alternatives to find a low value while also paying a low inspection cost. We thus aim to minimize the sum of the search costs the algorithm pays and the value of the alternative(s) it chooses in the end. In the standard version of Pandora’s Box a single value must be chosen, but we also consider common generalizations that require $k$ distinct alternatives to be chosen, or alternatives that form a matroid basis.

While most of the literature has focused on the stochastic case, where there is a known distribution of values given in advance, we instead consider an online version of the problem played over $T$ rounds, where in each round a different realization of values in the boxes is adversarially chosen. The goal of the learner is to pick a good strategy of opening boxes in every round that guarantees low regret compared to choosing in hindsight the optimal policy for the $T$ rounds from a restricted family of policies.

In this work, we mainly consider policies that fix the order in which boxes are explored but have free arbitrary stopping rules. Such policies are called partially-adaptive and are known to be optimal in many cases, most notably in the stochastic version where the values of the boxes are drawn independently. Such policies are also optimal in the special case of the Pandora’s Box problem where the values in the boxes are either $0$ or $\infty$. This case is known as the Min Sum Set Cover problem (MSSC) and is a commonly studied problem in the area of approximation algorithms.

1.1 Our Results

Our work presents a simple but powerful framework for designing online learning algorithms for Pandora’s Box, MSSC and other related problems. Our framework yields approximately low-regret algorithms for these problems through a three-step process:

  1. We first obtain convex relaxations of the instances of every round.

  2. We then apply online convex optimization to obtain good fractional solutions to the relaxed instances, achieving low regret.

  3. We finally round the fractional solutions to integral solutions for the original instances at a small multiplicative loss.

Through this framework, we obtain a

  • $9.22$-approximate no-regret algorithm for the problem of selecting $1$ box.

  • $O(1)$-approximate no-regret algorithm for the problem of selecting $k$ boxes.

  • $O(\log k)$-approximate no-regret algorithm for the problem of selecting a rank-$k$ matroid basis.

We start by presenting these results in the full information setting (section 3) where the values of all boxes are revealed after each round, once the algorithm has made its choices.

A key contribution of our work is to further extend these results to a more realistic bandit setting (section 4). In this setting, the algorithm only observes the values of the boxes it explored in each round and can only use this information to update its strategy for future rounds. In each round there is also the option of obtaining the full information by paying a price. We show that even under this more pessimistic setting we can obtain approximately no-regret algorithms with the same approximation guarantees as above.

We also provide stronger regret guarantees against more restricted classes of algorithms for the Pandora’s Box and MSSC problems that are non-adaptive (section 5).

All the algorithms we develop in this paper are computationally efficient. As such, the approximation guarantees given above are approximately tight since it is NP-hard to improve on these beyond small constants even when competing with the simpler non-adaptive benchmark. In particular, it was shown in [FLT04] that even the special case of MSSC is APX-hard and cannot be approximated within a factor smaller than $4$. It is an interesting open question to what extent these bounds can be improved with unlimited computational power. While in the stochastic version this would trivialize the problem, in the online setting the obtained approximation factors may still be necessary information theoretically.

1.2 Comparison with Previous Work

Our work is closely related to the work of [CGT+20]. In that work, the authors study a stochastic version of Pandora’s Box with an arbitrarily correlated distribution and aim to approximate the optimal partially-adaptive strategies. We directly extend all the results of [CGT+20] to the online non-stochastic setting, where we are required to solve an instance of the problem at each round.

Another very related paper is the work of [FLPS20] that also studies the online learning problem but focuses specifically on the Min Sum Set Cover problem and its generalization (GMSSC) that asks to select $k$ alternatives instead of one. Our work significantly improves their results in several ways.

  • We provide a simpler algorithm based on online convex optimization that does not rely on calculating gradients. We immediately obtain all our results through the powerful framework that we develop.

  • This allows us to study more complex constraints like matroid rank constraints as well as study the more general Pandora’s Box. It is challenging to extend the results of [FLPS20] to such settings while keeping the required gradient computation task computationally tractable.

  • Finally, we extend their results to a more natural bandit setting, where after each round we only have information about the alternatives that we explored rather than the whole instance.

In another recent work similar to ours, Esfandiari et al. [EHLM19] consider a multi-armed bandit version of the Pandora’s Box problem, which however greatly differs from ours in the following ways.

  • In their setting each box has a type, and the algorithm is required to pick one box per type, while in our case the game is independent in each round.

  • Their benchmark is a “prophet” who can choose the maximum reward per type of box, at the end of $T$ rounds.

  • The decision to pick a box is irrevocable (the algorithm decides upon seeing a box whether to select it or not, and cannot “go back” and select the maximum value seen), and they only consider threshold policies, as they relate the problem to prophet inequalities (see the surveys [HK92, Luc17, CFH+18] for more details on prophet inequalities).

1.3 Related Work

We model our search problem using Pandora’s Box, which was first introduced by Weitzman in the Economics literature [Wei79]. Since then, there has been a long line of research studying Pandora’s Box and its variants, e.g. where boxes can be selected without inspection [Dov18, BK19], there is correlation between the boxes [CGT+20], the boxes have to be inspected in a specific order [BFLL20], or boxes are inspected in an online manner [EHLM19]. Some work has also been done in the generalized setting where more information can be obtained for a price [CFG+00, GK01, CJK+15, CHKK15]. Finally, a long line of research considers more complex combinatorial constraints like budget constraints [GGM06], packing constraints [GN13], matroid constraints [ASW16], maximizing a submodular function [GNS16, GNS17], an approach via Markov chains [GJSS19], and various packing and covering constraints for both minimization and maximization problems [Sin18]. In the special case of MSSC, the line of work was initiated by [FLT04], and continued with improvements and generalizations to more complex constraints [AGY09, MBMW05, BGK10, SW11].

On the other hand, our work advances a recent line of research on the foundations of data-driven algorithm design, started by Gupta and Roughgarden [GR17], and continued by [BNVW17, BDSV18, BDV18, KLL17, WGS18, AKL+19], where they study parameterized families of algorithms in order to learn parameters to optimize the expected runtime or performance of the algorithm with respect to the underlying distribution. Similar work was done before [GR17] on self-improving algorithms [ACCL06, CMS10].

Furthermore, our results directly simplify and generalize the results of [FLPS20] in the case of partial feedback. Related to the partial feedback setting, [FKM05] consider single bandit feedback and [ADX10] consider multi-point bandit feedback. Both these works focus on finding good estimators for the gradient in order to run a gradient descent-like algorithm. For more pointers to the online convex optimization literature, we refer the reader to the survey by Shalev-Shwartz [Sha12] and the initial primal-dual analysis of the Follow the Regularized Leader family of algorithms by [SS07].

2 Preliminaries

We evaluate the performance of our algorithms using average regret. We define the average regret of an algorithm $\mathcal{A}$ against a benchmark OPT, over a time horizon $T$, as

$\text{Regret}_{\text{OPT}}(\mathcal{A},T)=\frac{1}{T}\sum_{t=1}^{T}\left(\mathcal{A}(t)-\text{OPT}(t)\right)$ (1)

where $\mathcal{A}(t)$ and $\text{OPT}(t)$ are the costs at round $t$ of $\mathcal{A}$ and OPT respectively. We similarly define the average $\alpha$-approximate regret against a benchmark OPT as

$\alpha\text{-Regret}_{\text{OPT}}(\mathcal{A},T)=\frac{1}{T}\sum_{t=1}^{T}\left(\mathcal{A}(t)-\alpha\,\text{OPT}(t)\right).$ (2)

We say that an algorithm $\mathcal{A}$ is no regret if $\text{Regret}_{\text{OPT}}(\mathcal{A},T)=o(1)$. Similarly, we say that $\mathcal{A}$ is $\alpha$-approximate no regret if $\alpha\text{-Regret}_{\text{OPT}}(\mathcal{A},T)=o(1)$. Observe that we are always competing with an oblivious adversary, which selects the one option that minimizes the total loss over all rounds.
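To make the definitions concrete, here is a minimal sketch (in Python, our own illustration; the function and variable names are not from the paper) that computes the average $\alpha$-approximate regret of Equation (2) from two per-round cost sequences; $\alpha=1$ recovers Equation (1).

```python
import numpy as np

def avg_alpha_regret(alg_costs, opt_costs, alpha=1.0):
    """Average alpha-approximate regret over T rounds, as in Eq. (2).

    With alpha = 1 this is the plain average regret of Eq. (1).
    """
    alg_costs = np.asarray(alg_costs, dtype=float)
    opt_costs = np.asarray(opt_costs, dtype=float)
    return (alg_costs - alpha * opt_costs).sum() / len(alg_costs)

# Toy usage: an algorithm paying 3 per round vs. a benchmark paying 2.
print(avg_alpha_regret([3, 3, 3], [2, 2, 2]))             # 1.0
print(avg_alpha_regret([3, 3, 3], [2, 2, 2], alpha=1.5))  # 0.0
```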

2.1 Problem Definitions

In Pandora’s Box we are given a set $\mathcal{B}$ of $n$ boxes with unknown costs and a set of possible scenarios that determine these costs. In each round $t\in[T]$, an adversary chooses the instantiation of the costs in the boxes, called a scenario. Formally, a scenario at time $t$ is a vector $\bm{c}(t)\in\mathbb{R}^{n}$ for any $t\in[T]$, where $c_{i}^{s}$ denotes the cost of box $i$ when scenario $s$ is instantiated. Note that without loss of generality we can assume that $c_{i}\leq n$: if some costs are more than $n$ we can ignore them, and if all are above $n$ we automatically get a $2$-approximation (since opening all boxes to find the minimum value costs at most $n+\min_{i\in\mathcal{B}}c_{i}$, while the optimal also pays at least $n$).

The goal of the algorithm at every round is to choose a box of small cost while spending as little time as possible gathering information. The algorithm cannot directly observe the instantiated scenario; however, it is allowed to “open” boxes one at a time. When opening a box, the algorithm observes the cost inside the box. In total, we want to minimize the regret over $T$ rounds, relative to the optimal algorithm.

Formally, let $\mathcal{P}_{t}$ be the set of boxes opened by the algorithm at round $t\in[T]$ and $c_{i}^{t}$ the cost of box $i$ in that round. The cost of the algorithm $\mathcal{A}$ at round $t$ is $\mathcal{A}(t)=\min_{i\in\mathcal{P}_{t}}c_{i}^{t}+|\mathcal{P}_{t}|$ and the goal is to minimize the regret $\text{Regret}_{\text{OPT}}(\mathcal{A},T)$.

Any algorithm can be described by a pair $(\sigma,\tau)$, where $\sigma$ is a permutation of the boxes representing the order in which they are opened, and $\tau$ is a stopping rule: the time at which the algorithm stops opening and returns the minimum cost it has seen so far. Observe that, in its full generality, an algorithm may choose the next box to open and the stopping time as a function of the identities and costs of the previously opened boxes.

Different Benchmarks.

As observed in [CGT+20], optimizing over the class of all such algorithms is intractable; therefore, simpler benchmarks are considered.

  • The Non-adaptive Benchmark (NA): in this case the adversary chooses all the $T$ scenarios about to come, and a fixed set of boxes to open is selected, which is the same in every round. In this case, the $\text{OPT}(t)$ term in the regret does not depend on $t$.

  • The Partially-adaptive Benchmark (PA): in this case, a different set of boxes may be opened in each round, which can depend on the algorithm’s choices in rounds $1,\ldots,t-1$.

An important special case.

A special case of Pandora’s Box is the Min Sum Set Cover problem (MSSC). In this problem, the costs inside the boxes are either $0$ or $\infty$. We say a scenario is covered or satisfied if, in our solution, we have opened a box that has value $0$ for this scenario.
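As a small illustration of the MSSC objective, the following sketch (our own; representing a scenario by its set of zero-value boxes is an assumption of this example) charges each scenario the position of the first box in a fixed opening order that covers it.

```python
def mssc_cost(order, scenarios):
    """Min Sum Set Cover objective for a fixed opening order.

    Each scenario is the set of boxes containing value 0 under it; a
    scenario pays the position (1-indexed) of the first box in `order`
    that covers it.
    """
    total = 0
    for zeros in scenarios:
        for t, box in enumerate(order):
            if box in zeros:
                total += t + 1
                break
    return total

# Toy usage: 3 boxes, 3 scenarios; order (0, 1, 2) pays 1 + 2 + 3 = 6.
print(mssc_cost([0, 1, 2], [{0, 2}, {1}, {2}]))
```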

General feasibility constraints.

We also study two more complex extensions of the problem. In the first one we are required to select exactly $k$ boxes for some $k\geq 1$, and in the second, the algorithm is required to select a basis of a given matroid. We design partially-adaptive strategies that are approximately no-regret, for the different constraints and benchmarks described in this section.

2.2 Relaxations

2.2.1 Scenario-aware Relaxation

Observe that the class of partially-adaptive strategies is still too large and complex, since the stopping rule can arbitrarily depend on the costs observed in boxes upon opening. One of the main contributions of [CGT+20], which we also use in this work, is showing that it is enough to design a strategy that chooses an ordering of the boxes and performs well assuming that we know when to stop. This relaxation of partially-adaptive, called scenario-aware partially-adaptive (SPA), greatly diminishes the space of strategies to consider, and makes it possible to design competitive algorithms, at the cost of an extra constant factor. This is formally stated in the lemma below; the proof can be found in [CGT+20] and is based on a generalization of ski-rental [KMMO90].

Lemma 2.1 (Simplification of Theorem 3.4 from [CGT+20]).

Given an $\alpha$-approximate algorithm, polynomial in the number of boxes, over scenario-aware partially-adaptive strategies, there exists a polynomial-time algorithm that gives a $\frac{e}{e-1}\alpha$-approximate partially-adaptive strategy.

2.2.2 Fractional Relaxation and Rounding

This first relaxation allows us to focus only on designing efficient SPA strategies, which only require optimizing over the permutation of boxes. However, both MSSC and Pandora’s Box are non-convex problems. We tackle this issue by using a convex relaxation of the problems, given by their linear programming formulation.

Definition 1 (Convex Relaxation).

Let $\Pi$ be a minimization problem over a domain $X$ with objective function $g:X\rightarrow\mathbb{R}$. We say that a function $\overline{g}:\overline{X}\rightarrow\mathbb{R}$ is a convex relaxation of $g$ if

  1. The function $\overline{g}$ and its domain $\overline{X}$ are convex.

  2. $X\subseteq\overline{X}$ and for any $x\in X$, $\overline{g}(x)\leq g(x)$.

Using this definition, for our partially-adaptive benchmark we relax the domain to $\overline{X}=\{x\in[0,1]^{n\times n}:\sum_{i}x_{it}=1\text{ and }\sum_{t}x_{it}=1\}$, the set of doubly stochastic $n\times n$ matrices. We use a convex relaxation $\overline{g}^{s}$ similar to the one for the generalized min-sum set cover problem in [BGK10] and [SW11], but scenario dependent: for a given scenario $s$, the relaxation $\overline{g}^{s}$ changes. We denote by $\mathcal{T}$ the set of $n$ time steps, by $x_{it}$ the indicator variable for whether box $i$ is opened at time $t$, and by $z_{it}^{s}$ the indicator of whether box $i$ is selected for scenario $s$ at time $t$. We define the relaxation $\overline{g}^{s}(\bm{x})$ as

$\min_{z\geq 0}\quad \sum_{i\in\mathcal{B},t\in\mathcal{T}}(t+c_{i}^{s})z_{it}^{s}$   (Relaxation-SPA)
s.t. $\sum_{t\in\mathcal{T},i\in\mathcal{B}}z_{it}^{s}=1,$
     $z_{it}^{s}\leq x_{it},\qquad i\in\mathcal{B},\ t\in\mathcal{T}.$

Similarly, we also relax the problem when we are required to pick $k$ boxes (Relaxation-SPA-k) and when we are required to pick a matroid basis (Relaxation-SPA-matroid).
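For concreteness, here is a small sketch (our own, assuming SciPy is available; the names are illustrative) that evaluates $\overline{g}^{s}(\bm{x})$ by solving the inner LP of Relaxation-SPA over $z$ for a fixed fractional opening schedule $\bm{x}$ and one scenario $s$.

```python
import numpy as np
from scipy.optimize import linprog

def relaxation_spa_value(x, c):
    """Evaluate g^s(x): the inner LP of Relaxation-SPA for one scenario.

    x: (n, n) doubly stochastic matrix, x[i, t] = mass of box i opened at
       time t (times run 1..n); c: costs c_i^s under the realized scenario.
    """
    n = len(c)
    # Objective coefficient of z_{it} is (t + c_i^s).
    obj = np.array([[t + 1 + c[i] for t in range(n)] for i in range(n)]).ravel()
    A_eq = np.ones((1, n * n))            # sum_{i,t} z_{it} = 1
    A_ub = np.eye(n * n)                  # z_{it} <= x_{it}
    res = linprog(obj, A_ub=A_ub, b_ub=x.ravel(), A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    return res.fun

# Toy usage: open uniformly over times; scenario with a single cheap box.
x = np.full((3, 3), 1 / 3)
print(relaxation_spa_value(x, np.array([0.0, 5.0, 5.0])))  # 2.0
```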

Leveraging the results of [CGT+20], in sections C.1, C.2 and C.3 of the appendix we show how to use a rounding that does not depend on the chosen scenario in order to get an approximately optimal integer solution, given one for the relaxation. Specifically, we define the notion of $\alpha$-approximate rounding.

Definition 2 ($\alpha$-approximate rounding).

Let $\Pi$ be a minimization problem over a domain $X$ with objective function $f:X\rightarrow\mathbb{R}$ and a convex relaxation $\overline{f}:\overline{X}\rightarrow\mathbb{R}$. Let $\overline{x}\in\overline{X}$ be a solution to $\Pi$ with cost $\overline{f}(\overline{x})$. Then an $\alpha$-approximate rounding is an algorithm that, given $\overline{x}$, produces a solution $x\in X$ with cost

$f(x)\leq\alpha\overline{f}(\overline{x}).$

3 Full Information Setting

We begin by presenting a general technique for approaching Pandora’s Box type of problems via Online Convex Optimization (OCO). Initially we observe, in the following theorem, that we can combine

  1. a rounding algorithm with good approximation guarantees, and

  2. an online minimization algorithm with good regret guarantees

to obtain an algorithm with good regret guarantees.

Theorem 3.1.

Let $\Pi$ be a minimization problem over a domain $X$ and $\overline{\Pi}$ be the convex relaxation of $\Pi$ over a convex domain $\overline{X}\supseteq X$.

If there exists an $\alpha$-approximate rounding algorithm $\mathcal{A}:\overline{X}\rightarrow X$ mapping any feasible solution $\overline{x}\in\overline{X}$ to a feasible solution $x\in X$, then any online minimization algorithm for $\overline{\Pi}$ that achieves regret $\text{Regret}(T)$ against a benchmark OPT gives $\alpha$-approximate regret $\alpha\cdot\text{Regret}(T)$ for $\Pi$.

Proof of Theorem 3.1.

Let $f_{1},\ldots,f_{T}$ be the online sequence of functions presented in problem $\Pi$, in each round $t\in[T]$, and let $\overline{f}_{1},\ldots,\overline{f}_{T}$ be their convex relaxations in $\overline{\Pi}$.

Let $\overline{x}_{t}\in\overline{X}$ be the solution the online convex optimization algorithm gives at each round $t\in[T]$ for problem $\overline{\Pi}$. Calculating the total expected cost for $\Pi$, over all time steps $t\in[T]$, we have that

$\mathbb{E}\left[\sum_{t=1}^{T}f_{t}\big(\mathcal{A}(\overline{x}_{t})\big)\right] \leq\alpha\sum_{t=1}^{T}\overline{f}_{t}(\overline{x}_{t})$
$\leq\alpha\left(\text{Regret}(T)+\min_{x\in\overline{X}}\sum_{t=1}^{T}\overline{f}_{t}(x)\right)$
$\leq\alpha\left(\text{Regret}(T)+\min_{x\in X}\sum_{t=1}^{T}f_{t}(x)\right).$

By rearranging the terms, we get the theorem. ∎

Given this theorem, in the following sections we show (1) how to design an algorithm with a low regret guarantee for Pandora’s Box (Theorem 3.3) and (2) how to obtain rounding algorithms with good approximation guarantees, using the results of [CGT+20].

3.1 Applications to Pandora’s Box and MSSC

To apply Theorem 3.1 to our problems, note that in their initial non-convex form we are required to pick an integer permutation of boxes. The relaxations, for the different benchmarks and constraints, are shown in Relaxation-SPA, Relaxation-SPA-k and Relaxation-SPA-matroid.

We denote by $\overline{g}^{s}(\bm{x})$ the objective function of the scenario-aware relaxation of the setting we are trying to solve, e.g. for selecting $1$ box we have Relaxation-SPA. Denote by $\overline{X}=[0,1]^{n\times n}$ the solution space. We can view this problem as an online convex optimization one as follows.

  1. At every time step $t$ we pick a vector $\bm{x}_{t}\in\overline{X}$, where $\overline{X}$ is a convex set.

  2. The adversary picks a scenario $s\in\mathcal{S}$ and therefore a function $f^{s}:\overline{X}\rightarrow\mathbb{R}$ where $f^{s}=\overline{g}^{s}$, and we incur loss $f^{s}(\bm{x}_{t})=\overline{g}^{s}(\bm{x}_{t})$. Note that $f^{s}$ is convex in all cases (Relaxation-SPA, Relaxation-SPA-k, Relaxation-SPA-matroid).

  3. We observe the function $f^{s}$ at all points $\bm{x}\in\overline{X}$.

A family of algorithms that can be applied to solve this problem is called Follow The Regularized Leader (FTRL). These algorithms work by picking, at every step, the solution that would have performed best so far while also adding a regularization term for stability. For the FTRL family of algorithms we have the following guarantees.

Theorem 3.2 (Theorem 2.11 from [Sha12]).

Let $f_{1},\ldots,f_{T}$ be a sequence of convex functions such that each $f_{t}$ is $L$-Lipschitz with respect to some norm. Assume that FTRL is run on the sequence with a regularization function $U$ which is $\eta$-strongly-convex with respect to the same norm. Then, for all $u\in C$ we have that $\text{Regret}(\text{FTRL},T)\cdot T\leq U_{\max}-U_{\min}+TL^{2}\eta$.

Our algorithm works similarly to FTRL, while additionally rounding the fractional solution, in each step, to an integer one. The algorithm is formally described in Algorithm 1, and we show how to choose the regularizer $U(\bm{x})$ in Theorem 3.3.

Input: $\Pi=(\mathcal{F},\text{OPT})$: the problem to solve; $\mathcal{A}_{\Pi}$: the rounding algorithm for $\Pi$
Denote by $f^{s}(\bm{x})$ the fractional objective function
Select regularizer $U(\bm{x})$ according to Theorem 3.3
$\overline{X}\leftarrow$ space of fractional solutions
for each round $t\in[T]$ do
      Set $\bm{x}_{t}=\arg\min_{\bm{x}\in\overline{X}}\sum_{\tau=1}^{t-1}f^{s_{\tau}}(\bm{x})+U(\bm{x})$
      Round $\bm{x}_{t}$ to $\bm{x}_{t}^{\text{int}}$ according to $\mathcal{A}_{\Pi}$
      Receive loss $f^{s_{t}}(\bm{x}_{t}^{\text{int}})$
end for
Algorithm 1: Algorithm $\mathcal{A}$ for the full information case.


We show the guarantees of our algorithm above using Theorem 3.2 which provides regret guarantees for FTRL. The proof of Theorem 3.3 is deferred to section A of the appendix.

Theorem 3.3.

The average regret of Algorithm 1 is

$\text{Regret}_{PA}(\mathcal{A},T)\leq 2n\sqrt{\frac{\log n}{T}}$

achieved by setting $U(\bm{x})=\left(\sum_{i=1}^{n}\sum_{t=1}^{n}x_{it}\log x_{it}\right)/\eta$ as the regularization function, and $\eta=\sqrt{\frac{\log n}{T}}$.
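To make the FTRL step of Algorithm 1 concrete, here is a minimal sketch (our own, in Python) for the simplified case of linear per-round losses $\langle G_{\tau},\bm{x}\rangle$: with the entropic regularizer above, the minimizer over doubly stochastic matrices is a Sinkhorn rescaling of $\exp(-\eta\sum_{\tau}G_{\tau})$. The relaxations in this paper are convex rather than linear, so a full implementation would feed in subgradients instead of fixed matrices.

```python
import numpy as np

def sinkhorn(M, iters=500):
    """Rescale a positive matrix to be (approximately) doubly stochastic."""
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # normalize rows
        M = M / M.sum(axis=0, keepdims=True)  # normalize columns
    return M

def ftrl_entropic_step(grads, eta):
    """FTRL iterate for linear losses <G_tau, x> with entropic regularizer.

    Minimizing sum_tau <G_tau, x> + (1/eta) * sum x log x over doubly
    stochastic x gives x = D1 exp(-eta * sum G) D2 for diagonal scalings
    D1, D2, which Sinkhorn iteration recovers.
    """
    return sinkhorn(np.exp(-eta * np.sum(grads, axis=0)))

# Toy usage: n = 3, eta as in Theorem 3.3, two rounds of (sub)gradients.
n, T = 3, 100
rng = np.random.default_rng(0)
x = ftrl_entropic_step([rng.random((n, n)) for _ in range(2)],
                       eta=np.sqrt(np.log(n) / T))
print(np.round(x, 3), x.sum(axis=0).round(3), x.sum(axis=1).round(3))
```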

Finally, using Theorem 3.1 we get Corollary 3.3.1 for competing with the partially-adaptive benchmark for all different feasibility constraints (choose $1$, choose $k$, or choose a matroid basis), where we use the results of Corollary C.0.1 to obtain the guarantees for the rounding algorithms.

Corollary 3.3.1 (Competing against PA, full information).

In the full information setting, Algorithm 1 is

  • $9.22$-approximate no regret for choosing $1$ box

  • $O(1)$-approximate no regret for choosing $k$ boxes

  • $O(\log k)$-approximate no regret for choosing a matroid basis

Remark 3.3.1.

In the special case of MSSC, our approach obtains the tight $4$-approximation of the offline case [FLT04]. The details are deferred to section C.1 of the Appendix. This result improves on the previous work of [FLPS20], which obtains an $11.713$-approximation.

4 Bandit Setting

We now move on to a bandit setting for our problem, where we do not observe the whole function after each step. Specifically, after choosing $\bm{x}_{t}\in\overline{X}$ in each round $t$, we only observe the loss $f^{s}(\bm{x}_{t})$ at the point $\bm{x}_{t}$ we chose to play, and not at every $\bm{x}\in\overline{X}$. This difference prevents us from directly using any online convex optimization algorithm, as in the full information setting of section 3. However, observe that if we decide to open all $n$ boxes, this is equivalent to observing the function $f^{s}$ at all $\bm{x}\in\overline{X}$, since we learn the cost of all permutations.

We exploit this similarity by randomizing between running FTRL and paying $n$ to open all boxes. Specifically, we split $[T]$ into $T/k$ intervals and choose a time uniformly at random in each one at which we open all $n$ boxes and thus observe the function on all inputs. This process is formally described in Algorithm 2, for which we show the following guarantees.

Theorem 4.1.

The average regret for Algorithm 2, for $k=\left(\frac{n}{2L+\sqrt{\log n}}\right)^{2/3}T^{1/3}$ and loss functions that are $L$-Lipschitz, is

$\mathbb{E}\left[\text{Regret}_{PA}(\mathcal{A}_{\text{PA}},T)\right]\leq 2\left(2L\log n+n\right)^{2/3}\cdot n^{1/3}\cdot T^{-1/3}.$
Get parameter $k$ from Theorem 4.1
Select regularizer $U(\bm{x})$ according to Theorem 4.1
Split the times $[T]$ into $T/k$ intervals $\mathcal{I}_{1},\ldots,\mathcal{I}_{T/k}$
$\mathcal{R}\leftarrow\emptyset$ // Random exploration times, one per $\mathcal{I}_{i}$
for every interval $\mathcal{I}_{i}$ do
      Pick $t_{p}$ uniformly at random in $\mathcal{I}_{i}$
      for all times $t\in\mathcal{I}_{i}$ do
            if $t=t_{p}$ then
                  $\mathcal{R}\leftarrow\mathcal{R}\cup\{t_{p}\}$
                  Open all boxes
                  Get feedback $f^{s_{t_{p}}}$
            else
                  $\bm{x}_{t}\leftarrow\arg\min_{\bm{x}\in\overline{X}}\sum_{\tau\in\mathcal{R}}f^{s_{\tau}}(\bm{x})+U(\bm{x})$
            end if
      end for
end for
Algorithm 2: $\mathcal{A}_{PA}$ minimizing regret against PA
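The exploration pattern of Algorithm 2 is simple to implement; the sketch below (our own; the names are illustrative) picks the set $\mathcal{R}$ of one uniformly random full-information round per interval.

```python
import random

def exploration_schedule(T, k, seed=0):
    """One uniformly random exploration round in each length-k interval of [T].

    On rounds in the returned set R the learner pays n to open all boxes
    (full feedback); on every other round it plays the FTRL iterate
    computed from the feedback gathered at the rounds in R so far.
    """
    rng = random.Random(seed)
    R = set()
    for start in range(0, T, k):
        R.add(rng.randrange(start, min(start + k, T)))
    return R

# Toy usage: T = 20 rounds split into intervals of length k = 5.
print(sorted(exploration_schedule(20, 5)))
```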


To analyze the regret of Algorithm 2 and prove Theorem 4.1, we consider the regret of two related settings.

  1. In the first setting, we consider a full-information online learner that observes at each round $t$ a single function sampled uniformly among the $k$ functions of the corresponding interval $\mathcal{I}_{t}$. We call this setting random costs.

  2. In the second setting, we again consider a full-information online learner that observes at each round $t$ a single function which is the average of the $k$ functions in the corresponding interval $\mathcal{I}_{t}$. We call this setting average costs.

The following lemma shows that any online algorithm for the random costs setting yields low regret even for the average costs setting.

Lemma 4.2.

Any online strategy for the random costs setting with expected average regret $R(T)$ gives expected average regret at most $R(T)+n/\sqrt{kT}$ for the equivalent average costs setting.

Proof of Lemma 4.2.

Denote by $\overline{f}_{t}=\frac{1}{k}\sum_{i=1}^{k}f_{t_{i}}$ the cost function corresponding to the average costs setting and by $f^{r}_{t}=f_{t_{i}}$, where $i\sim U([k])$, the corresponding cost function for the random costs setting. Let $x^{*}=\text{argmin}_{\bm{x}\in\overline{X}}\sum_{t=1}^{T/k}\overline{f}_{t}(\bm{x})$ be the minimizer of the $\overline{f}_{t}$ over the $T/k$ rounds.

We also use $X_{t}=\overline{f}_{t}(\bm{x}_{t})-f^{r}_{t}(\bm{x}_{t})$ to denote the difference in costs between the two settings for each interval (where $\bm{x}_{t}$ is the action taken at each interval $t$ by the random costs strategy). Observe that this is a random variable depending on the random choice of time in each interval. We have that

$\mathbb{E}\left[\left|\sum_{t=1}^{T/k}X_{t}\right|\right] \leq\left(\mathbb{E}\left[\left(\sum_{t=1}^{T/k}X_{t}\right)^{2}\right]\right)^{1/2} =\left(\mathbb{E}\left[\sum_{t=1}^{T/k}X_{t}^{2}\right]\right)^{1/2} \leq n\sqrt{\frac{T}{k}}.$

The two inequalities follow from Jensen’s inequality and the fact that the $X_{t}$ are bounded by $n$. The equality holds because the $X_{t}$ form a martingale difference sequence, i.e. $\mathbb{E}\left[X_{t}|X_{1},\ldots,X_{t-1}\right]=0$, as the choice of the function at time $t$ is independent of the chosen point $\bm{x}_{t}$.

We now look at the average regret of the strategy $\bm{x}_{t}$ for the average cost setting. We have that

$\frac{1}{T}\mathbb{E}\left[\sum_{t}\overline{f}_{t}(\bm{x}_{t})\right]-R(T)-\frac{n}{\sqrt{kT}} \leq\frac{1}{T}\mathbb{E}\left[\sum_{t}f^{r}_{t}(\bm{x}_{t})\right]-R(T)$
$\leq\frac{1}{T}\mathbb{E}\left[\min_{\bm{x}}\sum_{t}f^{r}_{t}(\bm{x})\right]$
$\leq\frac{1}{T}\mathbb{E}\left[\sum_{t}f^{r}_{t}(\bm{x}^{*})\right]$
$=\frac{1}{T}\sum_{t}\overline{f}_{t}(\bm{x}^{*})$

which implies the required regret bound. ∎

Given this lemma, we are now ready to show Theorem 4.1.

Proof of Theorem 4.1.

To establish the result, we note that the regret of our algorithm is equal to the regret achievable in the average cost setting multiplied by $k$, plus $nT/k$ since we pay $n$ for opening all boxes once in each of the $T/k$ intervals. Using Lemma 4.2, it suffices to bound the regret in the random costs setting. Let $U(\bm{x}):[0,1]^{n\times n}\rightarrow\mathbb{R}$ be an $\eta/n$-strongly convex regularizer used in the FTRL algorithm. We are using $U(\bm{x})=\left(\sum_{i=1}^{n}\sum_{t=1}^{n}x_{it}\log x_{it}\right)/\eta$, which is $\eta/n$-strongly convex from Lemma A.1 and is at most $(n\log n)/\eta$ as we observed in the proof of Theorem 3.3. Then from Theorem 3.2, we get that the average regret for the corresponding random costs setting is $2L\sqrt{\frac{\log n}{kT}}$.

Using Lemma 4.2, we get that the total average regret R(T)R(T) of our algorithm is

$R(T)\leq k\cdot 2L\sqrt{\frac{\log n}{kT}}+k\cdot\frac{n}{\sqrt{kT}}+\frac{n}{k}.$

Setting $k=\left(\frac{n}{2L\log n+n}\right)^{2/3}T^{1/3}$, the theorem follows. ∎

Finally, using Theorem 4.1 we can get the same guarantees as in the full-information setting, using the $\alpha$-approximate rounding for each case (Corollary C.0.1).

Corollary 4.2.1 (Competing against PA, bandit).

In the bandit setting, Algorithm 2 is

  • $9.22$-approximate no regret for choosing $1$ box

  • $O(1)$-approximate no regret for choosing $k$ boxes

  • $O(\log k)$-approximate no regret for choosing a matroid basis

5 Competing with the Non-adaptive Benchmark

We switch gears towards a different benchmark, that of non-adaptive strategies. Similarly to the partially-adaptive benchmark, we first present the linear programming relaxation for the non-adaptive benchmark as a function $\overline{f}:[0,1]^{n}\rightarrow\mathbb{R}$ with $\overline{f}(\bm{x})$ equal to

$\min_{z\geq 0}\quad \sum_{i\in\mathcal{B}}x_{i}+\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{B},s\in\mathcal{S}}c_{i}^{s}z_{i}^{s}$   (LP-NA)
s.t. $\sum_{i\in\mathcal{B}}z_{i}^{s}=1,\qquad\forall s\in\mathcal{S}$
     $z_{i}^{s}\leq x_{i},\qquad\forall i\in\mathcal{B},\ s\in\mathcal{S}$

where $x_{i}$ is an indicator variable for whether box $i$ is opened and $z_{i}^{s}$ indicates whether box $i$ is assigned to scenario $s$.

Note that the algorithms we provided for the partially-adaptive case cannot be directly applied, since the objective functions of LP-NA, LP-NA-k and LP-NA-matroid are not $n$-Lipschitz. To achieve good regret bounds in this case, we design an algorithm that randomizes over an “explore” and an “exploit” step, similarly to [AKL+19], while remembering the LP structure of the problem given constraints $\mathcal{F}$. Observe that there is a “global” optimal linear program (which is either LP-NA, LP-NA-k or LP-NA-matroid depending on the constraints $\mathcal{F}$) defined over all $T$ rounds. Getting a new instance in each round is equivalent to receiving a new (hidden) set of constraints. We first describe two functions utilised by the algorithm in order to find a feasible fractional solution to the LP and to round it.

  1. Ellipsoid($k,\mathcal{LP}$): finds and returns a feasible solution to $\mathcal{LP}$ of cost at most $k$. By starting from a low value of $k$ and doubling it at each step, lines 10–13 result in us finding a fractional solution within a factor of $2$ every time.

  2. Round($S_{t},\mathcal{F}$): rounds the fractional feasible solution $S_{t}$ using the algorithm corresponding to $\mathcal{F}$. The rounding algorithms are presented in section D of the appendix: for selecting $1$ box we have Algorithm 7, for selecting $k$ boxes Algorithm 8, and for selecting a matroid basis Algorithm 9.

The algorithm works in two steps: in the “explore” step (line 7), opening all boxes results in us exactly learning the hidden constraint of the current round, by paying $n$. The “exploit” step uses the information learned from the previous rounds to open boxes and choose one.

1 Input: set of constraints $\mathcal{F}$
2 $\mathcal{LP}\leftarrow$ LP-NA or LP-NA-k or LP-NA-matroid (according to $\mathcal{F}$)
3 $\mathcal{C}_{1}\leftarrow\emptyset$ // Constraints of LP
4 for round $t\in[T]$ do
5       draw $c\sim U[0,1]$
6       if $c\leq p_{t}$ then
7             Open all $n$ boxes, inducing new constraint $c_{new}$
8             $\mathcal{C}_{t+1}\leftarrow\mathcal{C}_{t}\cup\{c_{new}\}$
9             $k\leftarrow 1$
10            repeat
11                  $(\bm{x},\bm{z})\leftarrow$ Ellipsoid($k$, $\mathcal{LP}$)
12                  $k\leftarrow 2k$
13            until $(\bm{x},\bm{z})$ is feasible
14      else
15            $S_{t}\leftarrow S_{t-1}$
16            $\pi\leftarrow$ Round($S_{t},\mathcal{F}$)
17            Open boxes according to order $\pi$
18      end if
19 end for
Algorithm 3: Algorithm $\mathcal{A}_{\mathcal{F}}$ for minimizing regret vs NA
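As a sketch of how the pieces of Algorithm 3 fit together, the following toy implementation (our own, for the MSSC special case; it replaces the Ellipsoid subroutine with an off-the-shelf LP feasibility check from SciPy, which loses the mistake-bound accounting but keeps the explore/exploit and cost-doubling structure) maintains the observed constraints and rebuilds the fractional solution on exploration rounds.

```python
import random
import numpy as np
from scipy.optimize import linprog

def feasible_within(kcap, scenarios, n):
    """Stand-in for Ellipsoid(k, LP): find x in [0,1]^n with sum(x) <= kcap
    fractionally covering every observed scenario (MSSC case of LP-NA,
    where z_i^s <= x_i forces one unit of x-mass on each scenario's
    zero-cost boxes)."""
    A_ub = [np.ones(n)]                    # opening-cost cap
    b_ub = [kcap]
    for zeros in scenarios:                # coverage: sum_{i in zeros} x_i >= 1
        row = np.zeros(n)
        row[list(zeros)] = -1.0
        A_ub.append(row)
        b_ub.append(-1.0)
    res = linprog(np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(0, 1), method="highs")
    return res.x if res.status == 0 else None

def explore_exploit(scenario_stream, n, p, seed=0):
    """Skeleton of Algorithm 3: explore with probability p in each round,
    doubling the cost cap until the LP over the seen constraints is feasible."""
    rng = random.Random(seed)
    seen, x = [], np.ones(n)
    for zeros in scenario_stream:
        if rng.random() <= p:              # explore: open all boxes, learn zeros
            seen.append(zeros)
            kcap, sol = 1.0, feasible_within(1.0, seen, n)
            while sol is None:
                kcap *= 2
                sol = feasible_within(kcap, seen, n)
            x = sol
        # else: exploit -- round x to an opening order and play it
    return x

print(explore_exploit([{0, 1}, {2}, {0}], n=3, p=0.5))
```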

Observe that the cost of Algorithm 3 comes from three different cases, depending on the outcome of the coin flip $c$ in each round.

  1. If $c\leq p_{t}$, we pay $n$ for opening all boxes.

  2. If $c>p_{t}$ and we have a feasible solution, we pay cost proportional to the LP.

  3. If $c>p_{t}$ and we did not have a feasible solution, we pay cost proportional to $n$.

We bound term 3 using a mistake bound, and then continue by bounding terms 1 and 2 to get the bound on the total regret.

5.1 Bounding the mistakes

We start by formally defining the mistake bound of an algorithm.

Definition 3 (Mistake Bound of Algorithm $\mathcal{A}$).

Let $\mathcal{A}$ be an algorithm that solves problem $\Pi$ and runs in $t\in[T]$ rounds with input $x_{t}$ in each one. Then we define $\mathcal{A}$’s mistake bound as

$\mathcal{M}(\mathcal{A},T)=\mathbb{E}\left[\sum_{t=1}^{T}\mathbbm{1}\{x_{t}\text{ not feasible for }\Pi\}\right]$

where the expectation is taken over the algorithm’s choices.

The main contribution of this section is the following lemma, which bounds the number of mistakes.

Lemma 5.1.

Algorithm 3 has mistake bound

$\mathcal{M}(\mathcal{A}_{\mathcal{F}},T)\leq O(n^{2}\sqrt{T}).$

The mistake bound applies to all the different constraints $\mathcal{F}$ we consider. To achieve this, we leverage the fact that the ellipsoid algorithm, running on the optimal LP corresponding to the constraints $\mathcal{F}$, needs time polynomial in $n$ to find a solution. The proof works by showing that every time, with probability $p_{t}$, we make progress towards the solution, and since the ellipsoid makes polynomially many steps in $n$ in total, we also cannot make too many mistakes. The proof of Lemma 5.1 is deferred to section B of the appendix.

5.2 Regret for different constraints

We move on to show the regret guarantees of Algorithm 3 for the different types of constraints. We start off with the special case where we are required to pick one box, but all the costs inside the boxes are either $0$ or $\infty$, and then generalize this to arbitrary costs and more complex constraints.

Theorem 5.2 (Regret for $0/\infty$).

Algorithm 3, with $p_{t}=1/\sqrt{T}$, has the following average regret when $\mathcal{F}=\{\text{Select 1 box}\}$ and $c_{i}\in\{0,\infty\}$:

$\mathbb{E}\left[\text{Regret}_{NA}(\mathcal{A}_{\mathcal{F}},T)\right]\leq\text{OPT}+O\left(\frac{n^{2}}{\sqrt{T}}\right).$
Proof of Theorem 5.2.

Denote by $M$ the mistake bound term, bounded above in Lemma 5.1. We calculate the total average regret

$\mathbb{E}\left[\text{Regret}_{NA}(\mathcal{A}_{\mathcal{F}},T)\right]+\text{OPT} =\frac{1}{T}\left(M+\sum_{t=1}^{T}\mathbb{E}\left[|S_{t}|\right]\right)$
$=\frac{1}{T}\left(M+\sum_{t=1}^{T}p_{t}n+(1-p_{t})\mathbb{E}\left[|S_{t}|\right]\right)$
$\leq\frac{1}{T}\left(M+n\sum_{t=1}^{T}p_{t}+2T\,\text{OPT}\right)$
$\leq 2\,\text{OPT}+\frac{M}{T}+\frac{n}{\sqrt{T}}$
$\leq 2\,\text{OPT}+O\left(\frac{n^{2}}{\sqrt{T}}\right)$

where initially we summed up the total cost of Algorithm 3, whose first term is the mistake bound from Lemma 5.1. Then we used the facts that $\text{OPT}_{t}\leq\text{OPT}$ and that the solution found by the ellipsoid is within a factor of $2$, and in the last two lines we used that $\sum_{t=1}^{T}p_{t}\leq\sqrt{T}$, from [AKL+19], together with Lemma 5.1. Finally, subtracting OPT from both sides we get the theorem. ∎

Generalizing this to arbitrary values $c_{i}\in\mathbb{R}$, we show that when we are given a $\beta$-approximation algorithm we get the following guarantees, depending on the approximation factor.

Theorem 5.3.

If there exists a partially-adaptive algorithm $\mathcal{A}_{\mathcal{F}}$ that is $\beta$-competitive against the non-adaptive optimal, for the problem with constraints $\mathcal{F}$, then Algorithm 3 with $p_{t}=1/\sqrt{T}$ has the following regret:

$\mathbb{E}\left[\text{Regret}_{NA}(\mathcal{A}_{\mathcal{F}},T)\right]\leq 2\beta\,\text{OPT}+O\left(\frac{n^{2}}{\sqrt{T}}\right).$

The proof follows similarly to the $0/\infty$ case, and is deferred to section B of the appendix. Combining the different guarantees against the non-adaptive benchmark with Theorem 5.3, we get the following corollary.

Corollary 5.3.1 (Competing against NA, bandit setting).

In the bandit setting, when competing with the non-adaptive benchmark, Algorithm 3 is

  • $3.16$-approximate no regret for choosing $1$ box (using Theorem D.1)

  • $12.64$-approximate no regret for choosing $k$ boxes (using Theorem D.3)

  • $O(\log k)$-approximate no regret for choosing a matroid basis (using Theorem D.5)

References

  • [ACCL06] Nir Ailon, Bernard Chazelle, Seshadhri Comandur, and Ding Liu. Self-improving algorithms. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 261–270, 2006.
  • [ADX10] Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Adam Tauman Kalai and Mehryar Mohri, editors, COLT 2010 - The 23rd Conference on Learning Theory, Haifa, Israel, June 27-29, 2010, pages 28–40. Omnipress, 2010.
  • [AGY09] Yossi Azar, Iftah Gamzu, and Xiaoxin Yin. Multiple intents re-ranking. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 669–678. ACM, 2009.
  • [AKL+19] Daniel Alabi, Adam Tauman Kalai, Katrina Ligett, Cameron Musco, Christos Tzamos, and Ellen Vitercik. Learning to prune: Speeding up repeated computations. In Alina Beygelzimer and Daniel Hsu, editors, Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 30–33. PMLR, 2019.
  • [ASW16] Marek Adamczyk, Maxim Sviridenko, and Justin Ward. Submodular stochastic probing on matroids. Math. Oper. Res., 41(3):1022–1038, 2016.
  • [BDSV18] Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 353–362, 2018.
  • [BDV18] Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 603–614, 2018.
  • [BFLL20] Shant Boodaghians, Federico Fusco, Philip Lazos, and Stefano Leonardi. Pandora’s box problem with order constraints. In Péter Biró, Jason D. Hartline, Michael Ostrovsky, and Ariel D. Procaccia, editors, EC ’20: The 21st ACM Conference on Economics and Computation, Virtual Event, Hungary, July 13-17, 2020, pages 439–458. ACM, 2020.
  • [BGK10] Nikhil Bansal, Anupam Gupta, and Ravishankar Krishnaswamy. A constant factor approximation algorithm for generalized min-sum set cover. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1539–1545, 2010.
  • [BK19] Hedyeh Beyhaghi and Robert Kleinberg. Pandora’s problem with nonobligatory inspection. In Anna Karlin, Nicole Immorlica, and Ramesh Johari, editors, Proceedings of the 2019 ACM Conference on Economics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019, pages 131–132. ACM, 2019.
  • [BNVW17] Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017, pages 213–274, 2017.
  • [CFG+00] Moses Charikar, Ronald Fagin, Venkatesan Guruswami, Jon M. Kleinberg, Prabhakar Raghavan, and Amit Sahai. Query strategies for priced information (extended abstract). In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, May 21-23, 2000, Portland, OR, USA, pages 582–591, 2000.
  • [CFH+18] José R. Correa, Patricio Foncea, Ruben Hoeksma, Tim Oosterwijk, and Tjark Vredeveld. Recent developments in prophet inequalities. SIGecom Exch., 17(1):61–70, 2018.
  • [CGT+20] Shuchi Chawla, Evangelia Gergatsouli, Yifeng Teng, Christos Tzamos, and Ruimin Zhang. Pandora’s box with correlations: Learning and approximation. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 1214–1225. IEEE, 2020.
  • [CHKK15] Yuxin Chen, S. Hamed Hassani, Amin Karbasi, and Andreas Krause. Sequential information maximization: When is greedy near-optimal? In Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, pages 338–363, 2015.
  • [CJK+15] Yuxin Chen, Shervin Javdani, Amin Karbasi, J. Andrew Bagnell, Siddhartha S. Srinivasa, and Andreas Krause. Submodular surrogates for value of information. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA., pages 3511–3518, 2015.
  • [CMS10] Kenneth L. Clarkson, Wolfgang Mulzer, and C. Seshadhri. Self-improving algorithms for convex hulls. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1546–1565, 2010.
  • [Dov18] Laura Doval. Whether or not to open pandora’s box. J. Econ. Theory, 175:127–158, 2018.
  • [EHLM19] Hossein Esfandiari, Mohammad Taghi Hajiaghayi, Brendan Lucier, and Michael Mitzenmacher. Online pandora’s boxes and bandits. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 1885–1892. AAAI Press, 2019.
  • [FKM05] Abraham Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, Vancouver, British Columbia, Canada, January 23-25, 2005, pages 385–394. SIAM, 2005.
  • [FLPS20] Dimitris Fotakis, Thanasis Lianeas, Georgios Piliouras, and Stratis Skoulakis. Efficient online learning of optimal rankings: Dimensionality reduction via gradient descent. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  • [FLT04] Uriel Feige, László Lovász, and Prasad Tetali. Approximating min sum set cover. Algorithmica, 40(4):219–234, 2004.
  • [GGM06] Ashish Goel, Sudipto Guha, and Kamesh Munagala. Asking the right questions: model-driven optimization using probes. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, USA, pages 203–212, 2006.
  • [GJSS19] Anupam Gupta, Haotian Jiang, Ziv Scully, and Sahil Singla. The markovian price of information. In Integer Programming and Combinatorial Optimization - 20th International Conference, IPCO 2019, Ann Arbor, MI, USA, May 22-24, 2019, Proceedings, pages 233–246, 2019.
  • [GK01] Anupam Gupta and Amit Kumar. Sorting and selection with structured costs. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 416–425, 2001.
  • [GLS88] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, 1988.
  • [GN13] Anupam Gupta and Viswanath Nagarajan. A stochastic probing problem with applications. In Integer Programming and Combinatorial Optimization - 16th International Conference, IPCO 2013, Valparaíso, Chile, March 18-20, 2013. Proceedings, pages 205–216, 2013.
  • [GNS16] Anupam Gupta, Viswanath Nagarajan, and Sahil Singla. Algorithms and adaptivity gaps for stochastic probing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1731–1747, 2016.
  • [GNS17] Anupam Gupta, Viswanath Nagarajan, and Sahil Singla. Adaptivity gaps for stochastic probing: Submodular and XOS functions. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 1688–1702, 2017.
  • [GR17] Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM J. Comput., 46(3):992–1017, 2017.
  • [HK92] Theodore P. Hill and Robert P. Kertz. A survey of prophet inequalities in optimal stopping theory. CONTEMPORARY MATHEMATICS, 125, 1992.
  • [KLL17] Robert Kleinberg, Kevin Leyton-Brown, and Brendan Lucier. Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 2023–2031, 2017.
  • [KMMO90] Anna R. Karlin, Mark S. Manasse, Lyle A. McGeoch, and Susan S. Owicki. Competitive randomized algorithms for non-uniform problems. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, 22-24 January 1990, San Francisco, California, USA., pages 301–309, 1990.
  • [Luc17] Brendan Lucier. An economic view of prophet inequalities. SIGecom Exch., 16(1):24–47, 2017.
  • [MBMW05] Kamesh Munagala, Shivnath Babu, Rajeev Motwani, and Jennifer Widom. The pipelined set cover problem. In Thomas Eiter and Leonid Libkin, editors, Database Theory - ICDT 2005, 10th International Conference, Edinburgh, UK, January 5-7, 2005, Proceedings, volume 3363 of Lecture Notes in Computer Science, pages 83–98. Springer, 2005.
  • [Sha12] Shai Shalev-Shwartz. Online learning and online convex optimization. Found. Trends Mach. Learn., 4(2):107–194, 2012.
  • [Sin18] Sahil Singla. The price of information in combinatorial optimization. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2523–2532, 2018.
  • [SS07] Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithms. Mach. Learn., 69(2-3):115–142, 2007.
  • [SW11] Martin Skutella and David P. Williamson. A note on the generalized min-sum set cover problem. Oper. Res. Lett., 39(6):433–436, 2011.
  • [Wei79] Martin L Weitzman. Optimal Search for the Best Alternative. Econometrica, 47(3):641–654, May 1979.
  • [WGS18] Gellert Weisz, Andras Gyorgy, and Csaba Szepesvari. Leapsandbounds: A method for approximately optimal algorithm configuration. In International Conference on Machine Learning, pages 5257–5265, 2018.

Appendix A Proofs from section 3

The following lemma shows the strong convexity of the regularizer used in our FTRL algorithms.

Lemma A.1 (Convexity of Regularizer).

The following function is $1/n$-strongly convex with respect to the $\ell_{1}$-norm:

$U(\bm{x})=\sum_{i=1}^{n}\sum_{t=1}^{n}x_{it}\log x_{it}$

for a doubly stochastic matrix $x\in[0,1]^{n\times n}$.

Proof.

Since $U(\bm{x})$ is twice continuously differentiable, we calculate $\nabla^{2}U(\bm{x})$, which is an $n^{2}\times n^{2}$ diagonal matrix, since

$\frac{\partial^{2}U(\bm{x})}{\partial x_{kt}\,\partial x_{ij}}=\begin{cases}1/x_{ij}&\text{if }i=k\text{ and }j=t\\ 0&\text{otherwise}\end{cases}$

We show that $\bm{z}^{\top}\nabla^{2}U(\bm{x})\bm{z}\geq\frac{1}{n}||\bm{z}||_{1}^{2}$ for all $\bm{z}\in\mathbb{R}^{n^{2}}$. We use the following mapping of the variables: each $x_{ij}$ is mapped to $p_{k}$ where $k=(i-1)n+j$. We have that

$\bm{z}^{\top}\nabla^{2}U(\bm{x})\bm{z} =\sum_{i=1}^{n^{2}}\frac{z_{i}^{2}}{p_{i}} =\frac{1}{n}\left(\sum_{i=1}^{n^{2}}p_{i}\right)\sum_{i=1}^{n^{2}}\frac{z_{i}^{2}}{p_{i}} \geq\frac{1}{n}\left(\sum_{i=1}^{n^{2}}\sqrt{p_{i}}\frac{|z_{i}|}{\sqrt{p_{i}}}\right)^{2} =\frac{1}{n}||\bm{z}||_{1}^{2}.$

where in the second line we used that the $x_{ij}$ form a doubly stochastic matrix (so $\sum_{i=1}^{n^{2}}p_{i}=n$), and then the Cauchy-Schwarz inequality. ∎

See Theorem 3.3.

Proof.

Initially observe that by setting $x_{ij}=1/n$ we get $U_{\max}-U_{\min}=(n\log n)/\eta$, since we get the maximum entropy when the values are all equal. Additionally, from Lemma A.1 we have that $U(\bm{x})$ is $\eta/n$-strongly convex. Observing also that the functions in all cases are $n$-Lipschitz and using Theorem 3.2, we obtain the guarantee of the theorem by setting $\eta=\sqrt{\frac{\log n}{T}}$. ∎

Appendix B Proofs from section 5

Before moving to the formal proof of Lemma 5.1, we recall the following lemma about the ellipsoid algorithm, bounding the number of steps it takes to find a feasible solution.

Lemma B.1 (Lemma 3.1.36 from [GLS88]).

Given a full dimensional polytope P={x:Cxd}P=\{x:Cx\leq d\}, for xnx\in\mathbb{R}^{n}, and let C,d\langle C,d\rangle be the encoding length of CC and dd. If the initial ellipsoid is E0=E(R2I,0)E_{0}=E(R^{2}I,0)333E(R,Z)E(R,Z) indicates a ball of radius RR and center ZZ. where R=n2C,dn2R=\sqrt{n}2^{\langle C,d\rangle-n^{2}} the ellipsoid algorithm finds a feasible solution after O(n2C,d)O(n^{2}\langle C,d\rangle) steps.

Using the lemma above, we can now prove Lemma 5.1, which we also restate below.

See Lemma 5.1.

Proof.

Our analysis follows similarly to Theorem 3.2 of [AKL+19]. Initially observe that the only time we make a mistake is, with probability $(1-p_{t})$, when the LP solution is not feasible. Denote by $\mathcal{C}^{*}$ the set of the constraints of $\mathcal{LP}$ as defined in Algorithm 3, and by $\mathcal{C}_{1}\subseteq\mathcal{C}_{2}\subseteq\ldots\subseteq\mathcal{C}_{t}$ the constraint sets for every round of the algorithm, for all $t\in[T]$. We also denote by $N_{T}(c)$ the number of times a constraint $c$ was not in $\mathcal{C}_{t}$ for some time $t$ but was part of $\mathcal{LP}$; formally, $N_{T}(c)=|\{t\in[T]:c\in\mathcal{C}^{*},c\not\in\mathcal{C}_{t}\}|$ for a constraint $c\in\mathcal{C}^{*}$. We can bound the mistake bound of Algorithm 3 as follows:

$\mathcal{M}(\mathcal{A},T)\leq\sum_{c\in\mathcal{C}^{*}}\mathbb{E}\left[N_{T}(c)\right].$

Let $t_{c}\in[T]$ be the round in which constraint $c$ is added to the algorithm’s constraint set for the first time, and let $S_{c}$ be the set of $\ell$ rounds in which we made a mistake because of this constraint. Observe that $\{S_{1},S_{2},\ldots,S_{\ell}\}=S_{c}\subseteq\{\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{t_{c}}\}$. We calculate the probability that $N_{T}(c)$ is incremented on round $k$ of $S_{c}$:

$\textbf{Pr}\left[N_{T}(c)\text{ incremented on round }k\right]=\prod_{i=1}^{k}(1-p_{i})\leq(1-p)^{k},$

since in order to make a mistake, we must have ended up in the exploit branch of the algorithm in every one of these rounds. Therefore

$\mathbb{E}\left[N_{T}(c)\right]\leq\sum_{i=1}^{T}(1-p)^{i}=\frac{(1-p)\left(1-(1-p)^{T}\right)}{p}\leq\frac{1}{p}.$

However, in our case, every time a constraint is added to $\mathcal{C}_{t}$, one step of the ellipsoid algorithm is run for the $\mathcal{LP}$. Using Lemma B.1 and observing that in our case $\langle C,d\rangle=O(1)$, the total number of times this step can happen is at most $O(n^{2})$, giving us the result of the lemma by setting $p=1/\sqrt{T}$. ∎

See Theorem 5.3.

Proof of Theorem 5.3.

Denote by $M$ the mistake bound term, bounded in Lemma 5.1. Calculating the total average regret we get

$\mathbb{E}\left[\text{Regret}_{NA}(\mathcal{A}_{\mathcal{F}},T)\right]+\text{OPT} =\frac{1}{T}\left(M+\sum_{t=1}^{T}\mathbb{E}\left[S_{t}\right]\right)$ (definition)
$=\frac{1}{T}\left(M+\sum_{t=1}^{T}p_{t}(n+\min_{i\in\mathcal{F}}c_{i})+(1-p_{t})\mathbb{E}\left[S_{t}\right]\right)$ (Algorithm 3)
$\leq\frac{1}{T}\left(M+\sum_{t=1}^{T}p_{t}n+p_{t}\min_{i\in\mathcal{F}}c_{i}+2\beta\,\text{OPT}_{t}\right)$ ($\mathcal{A}_{\mathcal{F}}$ and the ellipsoid’s loss)
$\leq(2\beta+1)\text{OPT}+\frac{1}{T}\left(M+n\sum_{t=1}^{T}p_{t}\right)$ ($\min_{i\in\mathcal{F}}c_{i}\leq\text{OPT}_{t}\leq\text{OPT}$)
$\leq(2\beta+1)\text{OPT}+\frac{M}{T}+\frac{n}{\sqrt{T}}$ ($\sum_{t=1}^{T}p_{t}\leq\sqrt{T}$ from [AKL+19])
$\leq(2\beta+1)\text{OPT}+O\left(\frac{n^{2}}{\sqrt{T}}\right)$ (from Lemma 5.1)

Therefore, subtracting OPT from both sides we get the theorem. ∎

Appendix C Rounding against Partially-adaptive

In this section we show how, using the rounding algorithms presented in [CGT+20], we obtain an \alpha-approximate rounding for our convex relaxations. We emphasize that these algorithms convert a fractional solution of the relaxed problem to an integral solution of comparable cost, without needing to know the scenario. Formally, we show the following guarantees.

Corollary C.0.1 (Rounding against PA).

Given a fractional solution \overline{x} of cost \overline{f}(\overline{x}), there exists an \alpha-approximate rounding where

  • Selecting 11 box: α=9.22\alpha=9.22 (Using Lemma C.1)

  • Selecting kk boxes: α=O(1)\alpha=O(1) (Using Lemma C.2)

  • Selecting a matroid basis: α=O(logk)\alpha=O(\log k) (Using Lemma C.4)

In order to obtain the results of this corollary, we combine the ski-rental Lemma 2.1 with Lemmas C.1, C.2 and C.4, one for each case of constraints. Observe that, as we show in the following sections, the step where the given fractional solution is rounded to an integral permutation does not depend on the realized scenario. In summary, the rounding framework of the algorithms is as follows (a small sketch of the stopping step is given after the list).

  1. Receive fractional solution \overline{x},\overline{z}.

  2. Use Algorithm 4, 5 or 6, depending on the constraints, to obtain an (integral) permutation \pi of the boxes.

  3. Start opening boxes in the order of \pi.

  4. Use ski-rental to decide the stopping time.
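Since Lemma 2.1 is not restated in this appendix, the following is only a minimal sketch of one natural ski-rental-style stopping rule, under the assumption of unit opening costs: keep opening boxes in the order \pi while the opening cost paid so far is below the smallest value seen.

```python
def open_with_ski_rental(pi, values, opening_cost=1.0):
    """Open boxes in the order pi; stop once the total opening cost paid
    reaches the best value found so far (a ski-rental-style rule)."""
    paid = 0.0
    best = float("inf")
    for i in pi:
        paid += opening_cost          # "rent": inspect one more box
        best = min(best, values[i])   # best "buy" option seen so far
        if paid >= best:              # renting further cannot pay off
            break
    return paid + best                # total cost: search plus chosen value
```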

C.1 Rounding against Partially-adaptive for Choosing 11 Box

The convex relaxation for this case (Relaxation-SPA) was also given in Section 2.2.2, but we repeat it here for convenience.

minz0\displaystyle\text{min}_{z\geq 0}\quad i,t𝒯(t+cis)zits\displaystyle\sum_{i\in\mathcal{B},t\in\mathcal{T}}(t+c_{i}^{s})z_{it}^{s} (Relaxation-SPA)
s.t. t𝒯,izits=1,\displaystyle\sum_{t\in\mathcal{T},i\in\mathcal{B}}z_{it}^{s}=1,
zitsxit,\displaystyle\hskip 19.91684ptz_{it}^{s}\leq x_{it}, i,t𝒯.\displaystyle i\in\mathcal{B},t\in\mathcal{T}.

Our main lemma in this case shows how to obtain a constant-competitive partially-adaptive strategy when given a scenario-aware solution.

Lemma C.1.

Given a scenario-aware fractional solution \overline{x} of cost \overline{f}(\overline{x}), there exists an efficient partially-adaptive strategy x with cost at most 9.22\overline{f}(\overline{x}).

Proof.

We explicitly describe the rounding procedure in order to highlight its independence of the realized scenario. For the rest of the proof we fix an (unknown) realized scenario s. Starting from our (fractional) solution \overline{\bm{x}} of cost \overline{f}^{s}=\overline{f}_{o}^{s}+\overline{f}_{c}^{s}, where \overline{f}_{o}^{s} is the opening cost and \overline{f}_{c}^{s} is the value cost (the cost incurred by the value found inside the box), we use the reduction in Theorem 5.2 of [CGT+20] to obtain a transformed fractional solution \overline{\bm{x}}^{\prime} of cost \overline{f^{\prime}}^{s}=\overline{f^{\prime}}^{s}_{o}+\overline{f^{\prime}}^{s}_{c}. For this transformed solution, Lemma 5.1 of [CGT+20] shows that

f¯os(αα1)2f¯os\overline{f^{\prime}}^{s}_{o}\leq\left(\frac{\alpha}{\alpha-1}\right)^{2}\overline{f}_{o}^{s} (3)

for the opening cost and

f¯csαf¯cs\overline{f^{\prime}}^{s}_{c}\leq\alpha\overline{f}^{s}_{c} (4)

for the cost incurred by the value inside the chosen box. To achieve this, the initial variables \overline{x}_{it} are scaled by a factor depending on \alpha to obtain \overline{x}_{it}^{\prime}. For the remainder of the proof, we assume this scaling happened at the beginning, and abusing notation we denote by \overline{\bm{x}} the scaled variables. This is without loss of generality since, at the end of the proof, we take into account the loss in cost incurred by the scaling (Inequalities 3 and 4). The rounding process is shown in Algorithm 4.

Data: Fractional solution 𝒙\bm{x} with cost f¯\overline{f}, set α=3+22\alpha=3+2\sqrt{2}
/* Part 1: Scenario-independent rounding */
1 σ:=\sigma:= for every t=1,,nt=1,\ldots,n, repeat twice: open each box ii w.p. qit=ttx¯ittq_{it}=\frac{\sum_{t^{\prime}\leq t}\overline{x}_{it^{\prime}}}{t}.
2
/* Part 2: Scenario-dependent stopping time */
3 Given scenario ss, calculate 𝒛s\bm{z}^{s} and f¯cs\overline{f}_{c}^{s}
τs:=\tau_{s}:= If box ii is opened and has value cisαf¯csc_{i}^{s}\leq\alpha\overline{f}_{c}^{s} then select it.
Algorithm 4 Scenario-aware, \alpha-approximate rounding for 1 box
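For concreteness, the following Python sketch mirrors Algorithm 4 under simplifying assumptions that are not part of the original: unit opening costs, 0-indexed arrays, x_bar already scaled as described above, and f_c_bar_s standing for the scenario's fractional value cost \overline{f}_{c}^{s}.

```python
import math
import random

def one_box_order(x_bar):
    """Part 1 of Algorithm 4 (sketch): a random, scenario-independent order.
    x_bar[i][t] is the (already scaled) fraction with which box i is opened
    at time t+1.  For every t = 1..n, repeated twice, each not-yet-opened
    box i is opened w.p. q_it = sum(x_bar[i][:t]) / t."""
    n = len(x_bar)
    sigma, opened = [], set()
    for t in range(1, n + 1):
        for _ in range(2):                       # "repeat twice"
            for i in range(n):
                if i in opened:
                    continue
                q_it = min(sum(x_bar[i][:t]) / t, 1.0)
                if random.random() < q_it:
                    sigma.append(i)
                    opened.add(i)
    sigma.extend(i for i in range(n) if i not in opened)  # stragglers last
    return sigma

def one_box_cost(sigma, costs_s, f_c_bar_s, alpha=3 + 2 * math.sqrt(2)):
    """Part 2 (sketch): open boxes in order sigma, paying unit opening cost,
    and select the first box whose value is at most alpha * f_c_bar_s."""
    opening_cost = 0
    for i in sigma:
        opening_cost += 1
        if costs_s[i] <= alpha * f_c_bar_s:
            return opening_cost + costs_s[i]
    return float("inf")
```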


The ratio of the opening cost of the integer to the fractional solution is bounded by

fosf¯os\displaystyle\frac{f_{o}^{s}}{\overline{f^{\prime}}^{s}_{o}} 2t=1k=1t1(1iA,tkzitsk)2itzits\displaystyle\leq\frac{2\sum_{t=1}^{\infty}\prod_{k=1}^{t-1}\left(1-\frac{\sum_{i\in A,t^{\prime}\leq k}z_{it^{\prime}}^{s}}{k}\right)^{2}}{\sum_{i}t\cdot z_{it}^{s}} Since zitsxit\displaystyle\text{Since }z_{it}^{s}\leq x_{it}
2t=1exp(2k=1t1tkzitsk)itzits\displaystyle\leq\frac{2\sum_{t=1}^{\infty}\exp\left(-2\sum_{k=1}^{t-1}\frac{\sum_{t^{\prime}\leq k}z_{it^{\prime}}^{s}}{k}\right)}{\sum_{i}t\cdot z_{it}^{s}} Using that 1+xex\displaystyle\text{Using that }1+x\leq e^{x}

Observe that h(\bm{z})=\log\frac{f^{s}_{o}}{\overline{f^{\prime}}^{s}_{o}}=\log f_{o}^{s}-\log\overline{f^{\prime}}^{s}_{o} is a convex function, since the first term is a LogSumExp and the second is the negation of a concave function. This means that h(\bm{z}) attains its maximum value on the boundary of the domain, i.e., at the integral points where z_{it}^{s}=1 iff t=\ell for some \ell\in[n], and z_{it}^{s}=0 otherwise. Using this fact we obtain

fosf¯os\displaystyle\frac{f_{o}^{s}}{\overline{f^{\prime}}_{o}^{s}} 2+2t=+1exp(2k=t11k)\displaystyle\leq\frac{2\ell+2\sum_{t=\ell+1}^{\infty}\exp\left(-2\sum_{k=\ell}^{t-1}\frac{1}{k}\right)}{\ell} Using that zits=1 iff t=\displaystyle\text{Using that }z_{it}^{s}=1\text{ iff }t=\ell
=\frac{2\ell+2\sum_{t=\ell+1}^{\infty}\exp\left(-2\left(H_{t-1}-H_{\ell-1}\right)\right)}{\ell} \qquad H_{t}\text{ is the }t\text{'th harmonic number}
2+2t=+1(t)2\displaystyle\leq\frac{2\ell+2\sum_{t=\ell+1}^{\infty}\left(\frac{\ell}{t}\right)^{2}}{\ell} Since Ht1H1t1x𝑑x=logtlog\displaystyle\text{Since }H_{t-1}-H_{\ell-1}\geq\int_{\ell}^{t}\frac{1}{x}dx=\log t-\log\ell
2+221t2𝑑t\displaystyle\leq\frac{2\ell+2\ell^{2}\int_{\ell}^{\infty}\frac{1}{t^{2}}dt}{\ell} Since t2x2 for x[t1,t]\displaystyle\text{Since }t^{-2}\leq x^{-2}\text{ for }x\in[t-1,t]
=4.\displaystyle=4.

Combining with inequality (3), we get that f^{s}_{o}\leq 4\left(\frac{\alpha}{\alpha-1}\right)^{2}\overline{f}^{s}_{o}. Recall that for the value cost, inequality (4) holds; therefore, requiring that 4\left(\frac{\alpha}{\alpha-1}\right)^{2}=\alpha, we obtain the lemma for \alpha=3+2\sqrt{2}. ∎
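As a sanity check, one can verify directly that \alpha=3+2\sqrt{2} satisfies 4\left(\frac{\alpha}{\alpha-1}\right)^{2}=\alpha:

\frac{\alpha}{\alpha-1}=\frac{3+2\sqrt{2}}{2+2\sqrt{2}}=\frac{(3+2\sqrt{2})(\sqrt{2}-1)}{2(\sqrt{2}+1)(\sqrt{2}-1)}=\frac{\sqrt{2}+1}{2},\qquad\text{so}\qquad 4\left(\frac{\sqrt{2}+1}{2}\right)^{2}=(\sqrt{2}+1)^{2}=3+2\sqrt{2}=\alpha.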

Corollary C.1.1.

For the case of MSSC, where the costs inside the boxes are either 0 or \infty, the rounding of Lemma C.1 obtains a 4-approximation, improving on the 11.473 of [FLPS20].

C.2 Rounding against Partially-adaptive for Choosing kk Boxes

In this case we are required to pick k boxes instead of one. Similarly to the one-box case, we relax the domain to the set \overline{X}=\{x\in[0,1]^{n\times n}:\sum_{i}x_{it}=1\text{ and }\sum_{t}x_{it}=1\} of doubly stochastic matrices. We define the relaxation \overline{g}^{s}(\bm{x}) as

miny0,z0\displaystyle\text{min}_{y\geq 0,z\geq 0} t𝒯(1yts)\displaystyle\quad\quad\quad\sum_{t\in\mathcal{T}}(1-y_{t}^{s}) +\displaystyle+ i,t𝒯ciszits\displaystyle\quad\sum_{i\in\mathcal{B},t\in\mathcal{T}}c_{i}^{s}z_{it}^{s} (Relaxation-SPA-k)
subject to zits\displaystyle\hskip 71.13188ptz_{it}^{s}\quad \displaystyle\leq xit,\displaystyle\quad x_{it}, i,t𝒯\displaystyle\forall i\in\mathcal{B},t\in\mathcal{T}
tt,iAzits\displaystyle\hskip 34.14322pt\sum_{t^{\prime}\leq t,i\not\in A}z_{it^{\prime}}^{s}\quad \displaystyle\geq (k|A|)yts,\displaystyle\quad(k-|A|)y_{t}^{s}, A,t𝒯\displaystyle\quad\quad\forall A\subseteq\mathcal{B},t\in\mathcal{T} (5)

Our main lemma in this case shows how to obtain a constant-competitive partially-adaptive strategy when given a scenario-aware solution, in the case where we are required to select k items.

Lemma C.2.

Given a scenario-aware fractional solution \overline{z}^{s},\overline{x} of cost \overline{g}^{s}(\overline{x}), there exists an efficient partially-adaptive strategy x with cost at most O(1)\overline{g}^{s}(\overline{x}).

Proof.

We follow exactly the steps of the proof of Theorem 6.2 from [CGT+20], and here we highlight two important properties of it.

  • The rounding part that decides the permutation (Algorithm 5) does not depend on the realized scenario, despite the algorithm being scenario-aware.

  • The proof does not use the fact that the initial solution is an optimal LP solution; therefore, the guarantee holds against any feasible fractional solution.

The rounding process is shown in Algorithm 5.

Data: Solution 𝒙\bm{x} to Relaxation-SPA-k. Set α=8\alpha=8
/* Part 1: Scenario-independent rounding */
1 σ:=\sigma:= For each phase =1,2,\ell=1,2,\ldots, open each box ii independently with probability qi=min(αt2xit,1)q_{i\ell}=\min\left(\alpha\sum_{t\leq 2^{\ell}}x_{it},1\right).
2
/* Part 2: Scenario-dependent stopping time */
3 τs:=\tau_{s}:=
4  Given scenario ss, calculate 𝒚s,𝒛s\bm{y}^{s},\bm{z}^{s}
5  Define ts=max{t:yts1/2}t_{s}^{*}=\max\{t:y_{t}^{s}\leq 1/2\}.
6if 2ts2^{\ell}\geq t^{*}_{s} then
7       For each opened box ii, select it with probability min(αt2zitsqi,1)\min\left(\frac{\alpha\sum_{t\leq 2^{\ell}}z_{it}^{s}}{q_{i\ell}},1\right).
8       Stop when we have selected kk boxes in total.  
9 end if
Algorithm 5 Scenario-aware, \alpha-approximate rounding for k-coverage, from [CGT+20]
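The following Python sketch mirrors the phase structure of Algorithm 5; it is illustrative only, again assuming unit opening costs and 0-indexed arrays, with a cap on the number of phases added only so the sketch always terminates.

```python
import random

def round_k_boxes(x_bar, z_s, y_s, costs_s, k, alpha=8, max_phases=40):
    """Sketch of Algorithm 5: in phase ell, open each box i w.p.
    min(alpha * sum_{t <= 2^ell} x_bar[i][t], 1); once 2^ell >= t_s^*,
    start selecting opened boxes until k distinct boxes are chosen."""
    n = len(x_bar)
    # t_s^*: last time at which the scenario is covered to at most 1/2
    t_star = max((t for t in range(1, n + 1) if y_s[t - 1] <= 0.5), default=1)
    selected, opening_cost = set(), 0
    for ell in range(1, max_phases + 1):
        horizon = min(2 ** ell, n)
        opened = []
        for i in range(n):
            q = min(alpha * sum(x_bar[i][:horizon]), 1.0)
            if random.random() < q:
                opened.append((i, q))
                opening_cost += 1
        if 2 ** ell < t_star:
            continue                       # too early to start selecting
        for i, q in opened:
            p = min(alpha * sum(z_s[i][:horizon]) / q, 1.0)
            if i not in selected and random.random() < p:
                selected.add(i)
            if len(selected) == k:         # stop: k distinct boxes chosen
                return opening_cost + sum(costs_s[j] for j in selected)
    return float("inf")                    # phase cap reached (sketch only)
```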


Assume that we are given a fractional solution (\overline{\bm{x}},\overline{\bm{y}}^{s},\overline{\bm{z}}^{s}), where \overline{x}_{it} is the fraction with which box i is opened at time t, \overline{z}_{it}^{s} is the fraction with which box i is chosen for scenario s at time t, and \overline{y}_{t}^{s} is the fraction with which scenario s is "covered" at time t, where covered means that there are k boxes selected for this scenario. (We use the variables y_{t}^{s} for convenience; they are not required, since y_{t}^{s}=\sum_{t^{\prime}<t,i\in\mathcal{B}}z_{it^{\prime}}^{s}.) Denote by \overline{f}_{o}^{s} (f_{o}^{s}) and \overline{f}_{c}^{s} (f_{c}^{s}) the fractional (rounded) costs for scenario s due to opening and selecting boxes respectively. Denote also by t^{*}_{s} the last time step at which y_{t}^{s}\leq 1/2 and observe that

f¯osts2.\overline{f}_{o}^{s}\geq\frac{t^{*}_{s}}{2}. (6)

Fix a realized scenario s and let \ell_{0}=\lceil\log t^{*}_{s}\rceil. Using that each box i is selected in any phase \ell\geq\ell_{0} with probability \min(1,8\sum_{t^{\prime}\leq 2^{\ell}}z_{it^{\prime}}^{s}), we can apply the following lemma from [BGK10], which still holds in our case; its proof only uses constraint (5) and a Chernoff bound.

Lemma C.3 (Lemma 5.1 in [BGK10]).

If each box ii is selected w.p. at least min(1,8ttzits)\min(1,8\sum_{t^{\prime}\leq t}z_{it^{\prime}}^{s}) for ttst\geq t_{s}^{*}, then with probability at least 1e9/81-e^{-9/8}, at least kk different boxes are selected.

Similarly to [CGT+20], let \gamma=e^{-9/8} and let \mathcal{B}_{j} be the set of boxes selected at phase j. Since the number of boxes opened in a phase is independent of the event that the algorithm reaches that phase prior to covering scenario s, the expected inspection cost is

𝔼[fos after phase 0]\displaystyle\mathbb{E}\left[f_{o}^{s}\text{ after phase }\ell_{0}\right] ==0𝔼[fos in phase ]Pr[reach phase ]\displaystyle=\sum_{\ell=\ell_{0}}^{\infty}\mathbb{E}\left[f_{o}^{s}\text{ in phase }\ell\right]\cdot\textbf{Pr}\left[\text{reach phase }\ell\right]
=0iαt2xitj=01Pr[|j|k]\displaystyle\leq\sum_{\ell=\ell_{0}}^{\infty}\sum_{i\in\mathcal{B}}\alpha\sum_{t^{\prime}\leq 2^{\ell}}x_{it^{\prime}}\cdot\prod_{j=\ell_{0}}^{\ell-1}\textbf{Pr}\left[|\mathcal{B}_{j}|\leq k\right]
=02αγ0\displaystyle\leq\sum_{\ell=\ell_{0}}^{\infty}2^{\ell}\alpha\cdot\gamma^{\ell-\ell_{0}} Lemma C.3xit doubly stochastic\displaystyle\text{Lemma \ref{lem:successprobk}, }x_{it}\text{ doubly stochastic}
=20α12γ<2tsα12γ4αf¯os12γ.\displaystyle=\frac{2^{\ell_{0}}\alpha}{1-2\gamma}<\frac{2t_{s}^{*}\alpha}{1-2\gamma}\leq\frac{4\alpha\overline{f}^{s}_{o}}{1-2\gamma}. 0=logts and ineq. (6)\displaystyle\ell_{0}=\lceil\log t_{s}^{*}\rceil\text{ and ineq. \eqref{eq:opt_t*}}

Observe that the expected opening cost at each phase \ell is at most α2\alpha 2^{\ell}, therefore the expected opening cost before phase 0\ell_{0} is at most <0α2<20α<2tsα4αf¯os\sum_{\ell<\ell_{0}}\alpha 2^{\ell}<2^{\ell_{0}}\alpha<2t_{s}^{*}\alpha\leq 4\alpha\overline{f}_{o}^{s}. Putting it all together, the total expected opening cost of the algorithm for scenario ss is

fos4αf¯os+4αf¯os12γ<123.25f¯os.f_{o}^{s}\leq 4\alpha\overline{f}_{o}^{s}+\frac{4\alpha\overline{f}_{o}^{s}}{1-2\gamma}<123.25\overline{f}_{o}^{s}.

To bound the value cost of our algorithm, we compute the expected total value of any phase \ell, conditioned on at least k distinct boxes being selected in this phase.

𝔼[cost in phase \displaystyle\mathbb{E}[\text{cost in phase }\ell |at least k boxes are selected in phase ]\displaystyle|\text{at least $k$ boxes are selected in phase }\ell]
𝔼[cost in phase ]Pr[at least k boxes are selected in phase ]\displaystyle\leq\frac{\mathbb{E}\left[\text{cost in phase }\ell\right]}{\textbf{Pr}\left[\text{at least $k$ boxes are selected in phase }\ell\right]}
11γ𝔼[cost in phase ]\displaystyle\leq\frac{1}{1-\gamma}\mathbb{E}\left[\text{cost in phase }\ell\right]
11γiαt2zitscis=11γαf¯cs<11.85f¯cs.\displaystyle\leq\frac{1}{1-\gamma}\sum_{i\in\mathcal{B}}\alpha\sum_{t\leq 2^{\ell}}z_{it}^{s}c_{i}^{s}=\frac{1}{1-\gamma}\alpha\overline{f}_{c}^{s}<11.85\overline{f}_{c}^{s}.

The third line follows by Lemma C.3 and the last line by the definition of f¯cs\overline{f}_{c}^{s}. Notice that the upper bound does not depend on the phase \ell, so the same upper bound holds for fcsf_{c}^{s}. Thus the total cost contributed from scenario ss in our algorithm is

fs=fos+fcs<123.25f¯os+11.85f¯cs123.25f¯s,f^{s}=f_{o}^{s}+f_{c}^{s}<123.25\overline{f}_{o}^{s}+11.85\overline{f}_{c}^{s}\leq 123.25\overline{f}^{s},

which gives us the lemma. ∎

C.3 Rounding against Partially-adaptive for Choosing a Matroid Basis

Similarly to the k-boxes case, we relax the domain to the set \overline{X}=\{x\in[0,1]^{n\times n}:\sum_{i}x_{it}=1\text{ and }\sum_{t}x_{it}=1\} of doubly stochastic matrices. For any set A\subseteq\mathcal{B}, let r(A) denote the rank of this set. We define \overline{g}^{s}(\bm{x}) as

miny0,z0\displaystyle\text{min}_{y\geq 0,z\geq 0} t𝒯(1yts)\displaystyle\sum_{t\in\mathcal{T}}(1-y_{t}^{s}) +\displaystyle\quad+\quad i,t𝒯ciszits\displaystyle\sum_{i\in\mathcal{B},t\in\mathcal{T}}c_{i}^{s}z_{it}^{s} (Relaxation-SPA-matroid)
subject to t𝒯,iAzits\displaystyle\hskip 19.91684pt\sum_{t\in\mathcal{T},i\in A}z_{it}^{s} \displaystyle\quad\leq\quad r(A),\displaystyle r(A), A\displaystyle\forall A\subseteq\mathcal{B} (7)
zits\displaystyle\hskip 56.9055ptz_{it}^{s} \displaystyle\quad\leq\quad xit,\displaystyle x_{it}, i,t𝒯\displaystyle\forall i\in\mathcal{B},t\in\mathcal{T}
iAttzits\displaystyle\hskip 18.49411pt\sum_{i\not\in A}\sum_{t^{\prime}\leq t}z_{it^{\prime}}^{s} \displaystyle\quad\geq\quad (r([n])r(A))yts,\displaystyle(r([n])-r(A))y_{t}^{s},\quad A,t𝒯\displaystyle\forall A\subseteq\mathcal{B},t\in\mathcal{T} (8)

Our main lemma in this case shows how to obtain an O(\log k)-competitive partially-adaptive strategy when given a scenario-aware solution, in the case where we are required to select a matroid basis.

Lemma C.4.

Given a scenario-aware fractional solution \overline{z}^{s},\overline{x} of cost \overline{g}^{s}(\overline{x}), there exists an efficient partially-adaptive strategy z_{s} with cost at most O(\log k)\overline{g}^{s}(\overline{x}).

Proof of Lemma C.4.

We follow exactly the steps of the proof of Lemma 6.4 from [CGT+20], but, similarly to the k-items case, we highlight the same important properties: (1) the rounding that decides the permutation does not depend on the scenario; (2) the proof does not use in any way the fact that the initial solution given is optimal. The rounding process is shown in Algorithm 6.

Data: Fractional solution 𝒙,𝒚,𝒛\bm{x},\bm{y},\bm{z} for scenario ss, α=64\alpha=64.
/* Part 1: Scenario-independent rounding */
1 σ:=\sigma:= for every t=1,,nt=1,\ldots,n, open each box ii independently with probability qit=min{αlnkttxitt,1}q_{it}=\min\left\{\alpha\ln k\frac{\sum_{t^{\prime}\leq t}x_{it^{\prime}}}{t},1\right\}.
2
/* Part 2: Scenario-dependent stopping time. */
3 τs:=\tau_{s}:=
4  Given scenario ss, calculate 𝒚,𝒛\bm{y},\bm{z}
5  Let t_{s}^{*}=\max\{t:y_{t}^{s}\leq 1/2\}.
6if t>tst>t^{*}_{s} then
7       For each opened box ii, select it with probability min{αlnkttzitstqit, 1}\min\left\{\frac{\alpha\ln k\sum_{t^{\prime}\leq t}z_{it^{\prime}}^{s}}{tq_{it}},\ 1\right\}.
8       Stop when we find a base of the matroid.
9 end if
Algorithm 6 Scenario-aware, O(\log k)-approximate rounding for matroids, from [CGT+20]
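As with the previous cases, the following is only an illustrative Python sketch of Algorithm 6; the rank oracle rank(S), the unit opening costs, and the rule that each box is opened at most once are assumptions of the sketch, not part of the original algorithm.

```python
import math
import random

def round_matroid_basis(x_bar, z_s, y_s, costs_s, rank, alpha=64):
    """Sketch of Algorithm 6.  `rank(S)` is an assumed matroid rank oracle;
    after t_s^*, each opened box is kept w.p. proportional to its z-mass,
    and added only if it extends the current independent set."""
    n = len(x_bar)
    k = rank(set(range(n)))                    # rank of the ground set
    scale = alpha * max(math.log(k), 1.0)      # guard against ln(1) = 0
    t_star = max((t for t in range(1, n + 1) if y_s[t - 1] <= 0.5), default=1)
    basis, opened, opening_cost = set(), set(), 0
    for t in range(1, n + 1):
        for i in range(n):
            q_it = min(scale * sum(x_bar[i][:t]) / t, 1.0)
            if i in opened or random.random() >= q_it:
                continue
            opened.add(i)
            opening_cost += 1                  # box i opened at step t
            if t > t_star:
                p = min(scale * sum(z_s[i][:t]) / (t * q_it), 1.0)
                if random.random() < p and rank(basis | {i}) > rank(basis):
                    basis.add(i)               # i extends the independent set
            if rank(basis) == k:               # full-rank basis completed
                return opening_cost + sum(costs_s[j] for j in basis)
    return float("inf")                        # sketch: ran out of steps
```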


Denote by (\overline{\bm{x}},\overline{\bm{y}}^{s},\overline{\bm{z}}^{s}) the fractional scenario-aware solution given to us, where \overline{x}_{it} is the fraction with which box i is opened at time t, \overline{z}_{it}^{s} is the fraction with which box i is chosen for scenario s at time t, and \overline{y}_{t}^{s} is the fraction with which scenario s is "covered" at time t, where covered means that there is a matroid basis selected for this scenario. Denote also by \overline{f}_{o}^{s} (f_{o}^{s}) and \overline{f}_{c}^{s} (f_{c}^{s}) the costs for scenario s due to opening and selecting boxes for the fractional (integral) solution, respectively.

In scenario s, let phase \ell be when t\in(2^{\ell-1}t_{s}^{*},2^{\ell}t_{s}^{*}]. We divide the time after t^{*}_{s} into exponentially increasing phases, and in each phase we prove that our success probability is a constant. The following lemma gives an upper bound on the number of opening steps needed in each phase to get a full-rank basis of the matroid; it still holds in our case, since it only uses the constraints of a feasible solution.

Lemma C.5 (Lemma 6.6 from [CGT+20]).

In phase \ell, the expected number of steps needed to select a set of full rank is at most (4+2+2/α)ts(4+2^{\ell+2}/\alpha)t^{*}_{s}.

Define \mathcal{X} to be the random variable indicating the number of steps needed to build a full-rank subset. The probability that we build a full-rank basis within some phase \ell\geq 6 is

Pr[𝒳21ts]1𝔼[𝒳]21ts1121ts(4+2+2/α)ts=1238α34,\displaystyle\textbf{Pr}\left[\mathcal{X}\leq 2^{\ell-1}t_{s}^{*}\right]\geq 1-\frac{\mathbb{E}\left[\mathcal{X}\right]}{2^{\ell-1}t_{s}^{*}}\geq 1-\frac{1}{2^{\ell-1}t_{s}^{*}}(4+2^{\ell+2}/\alpha)t^{*}_{s}=1-2^{3-\ell}-\frac{8}{\alpha}\geq\frac{3}{4}, (9)

where we used Markov's inequality for the first inequality and Lemma C.5 for the second. To calculate the total inspection cost, we sum up the contributions of all phases.

𝔼[fos after phase 6]\displaystyle\mathbb{E}\left[f_{o}^{s}\text{ after phase 6}\right] ==6𝔼[fos at phase ]Pr[ALG reaches phase ]\displaystyle=\sum_{\ell=6}^{\infty}\mathbb{E}\left[f_{o}^{s}\text{ at phase }\ell\right]\cdot\textbf{Pr}\left[\text{ALG}\text{ reaches phase }\ell\right]
=6t=21ts+12tsiαlnkttxitt(14)6\displaystyle\leq\sum_{\ell=6}^{\infty}\sum_{t=2^{\ell-1}t_{s}^{*}+1}^{2^{\ell}t_{s}^{*}}\sum_{i\in\mathcal{B}}\alpha\ln k\cdot\frac{\sum_{t^{\prime}\leq t}x_{it^{\prime}}}{t}\left(\frac{1}{4}\right)^{\ell-6} Algorithm 6
=621tsαlnk(14)6\displaystyle\leq\sum_{\ell=6}^{\infty}2^{\ell-1}t_{s}^{*}\alpha\ln k\cdot\left(\frac{1}{4}\right)^{\ell-6} xit doubly stochastic\displaystyle x_{it}\text{ doubly stochastic}
=64\alpha\ln k\,t_{s}^{*}\leq 128\alpha\ln k\,\overline{f}_{o}^{s}. \qquad\text{Since }t_{s}^{*}\leq 2\overline{f}_{o}^{s}

Since the expected opening cost at each step is αlnk\alpha\ln k and there are 25ts64f¯os2^{5}t_{s}^{*}\leq 64\overline{f}^{s}_{o} steps before phase 66, we have

f_{o}^{s}\leq 64\alpha\ln k\cdot\overline{f}_{o}^{s}+128\alpha\ln k\cdot\overline{f}_{o}^{s}=O(\log k)\overline{f}_{o}^{s}.

Similarly to the k-coverage case, to bound the value cost of our algorithm we compute the expected total cost of any phase \ell\geq 6, conditioned on a set of boxes forming a full-rank basis being selected in this phase.

E[fcs in phase |\displaystyle\textbf{E}[f_{c}^{s}\text{ in phase }\ell| full rank base selected in phase ]\displaystyle\text{full rank base selected in phase }\ell]
𝔼[fcs in phase ]Pr[full rank base selected in phase ]\displaystyle\leq\frac{\mathbb{E}\left[f_{c}^{s}\text{ in phase }\ell\right]}{\textbf{Pr}\left[\text{full rank base selected in phase }\ell\right]}
13/4𝔼[fcs in phase ]\displaystyle\leq\frac{1}{3/4}\mathbb{E}\left[f_{c}^{s}\text{ in phase }\ell\right]
13/4it=21ts+12tsαlnkttzitscist\displaystyle\leq\frac{1}{3/4}\sum_{i\in\mathcal{B}}\sum_{t=2^{\ell-1}t_{s}^{*}+1}^{2^{\ell}t_{s}^{*}}\alpha\ln k\frac{\sum_{t^{\prime}\leq t}z_{it^{\prime}}^{s}c_{i}^{s}}{t}
13/4t=21ts+12tsαlnkit𝒯zitscis21ts\displaystyle\leq\frac{1}{3/4}\sum_{t=2^{\ell-1}t_{s}^{*}+1}^{2^{\ell}t_{s}^{*}}\alpha\ln k\sum_{i\in\mathcal{B}}\frac{\sum_{t^{\prime}\in\mathcal{T}}z_{it^{\prime}}^{s}c_{i}^{s}}{2^{\ell-1}t_{s}^{*}}
=13/4αlnkf¯cs=O(logk)f¯cs.\displaystyle=\frac{1}{3/4}\alpha\ln k\overline{f}_{c}^{s}=O(\log k)\overline{f}_{c}^{s}.

This upper bound on the conditional expectation does not depend on \ell, and thus the same upper bound holds for f_{c}^{s}. Therefore f^{s}=f_{o}^{s}+f_{c}^{s}\leq O(\log k)(\overline{f}_{o}^{s}+\overline{f}_{c}^{s})=O(\log k)\overline{f}^{s}. ∎

Appendix D Linear Programs & Roundings against NA

D.1 Competing with the non-adaptive for choosing 11 box

The linear program for this case (LP-NA) was already given in the preliminaries section. The result in this case, an e/(e-1)-approximate partially-adaptive strategy given in [CGT+20], is formally restated below, and the rounding algorithm is presented in Algorithm 7.

Theorem D.1 (Theorem 4.2 from [CGT+20]).

There exists an efficient partially adaptive algorithm with cost at most e/(e1)e/(e-1) times the total cost of the optimal non-adaptive strategy.

Input: Solution 𝒙,𝒛\bm{x},\bm{z} to program (LP-NA); scenario ss
1 σ:=\sigma:= For t1t\geq 1, select and open box ii with probability xiixi\frac{x_{i}}{\sum_{i\in\mathcal{B}}x_{i}}.
τs:=\tau_{s}:= If box ii is opened at step tt, select the box and stop with probability zisxi\frac{z_{i}^{s}}{x_{i}}.
Algorithm 7 SPA vs NA, from [CGT+20]
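A minimal Python sketch of Algorithm 7 follows, assuming unit opening costs. At every step a box is drawn afresh from the distribution x_i/\sum_i x_i (boxes may repeat in this idealized description), and the stopping probability z_i^s/x_i is well defined by the LP constraint z_i^s \leq x_i; the step cap is added only so the sketch always terminates.

```python
import random

def spa_vs_na(x, z_s, costs_s, max_steps=10**6):
    """Sketch of Algorithm 7: repeatedly open box i w.p. x_i / sum(x);
    when box i is opened, stop and select it w.p. z_i^s / x_i."""
    boxes = list(range(len(x)))
    opening_cost = 0
    for _ in range(max_steps):          # cap so the sketch always terminates
        i = random.choices(boxes, weights=x)[0]
        opening_cost += 1
        if random.random() < z_s[i] / x[i]:
            return opening_cost + costs_s[i]
    return float("inf")
```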

D.2 Competing with the non-adaptive benchmark for choosing kk boxes

We move on to consider the case where we are required to pick k distinct boxes at every round. Similarly to the one-box case, the optimal non-adaptive strategy can be expressed by a linear program. We start by showing how to perform the rounding step of line 3 of Algorithm 3 in the case where we have to select k boxes. The guarantees are given in Theorem D.3 and the rounding is presented in Algorithm 8. This extends the results of [CGT+20] to the case of selecting k items against the non-adaptive benchmark.

Lemma D.2.

There exists a scenario-aware partially-adaptive algorithm for picking k boxes that is 4-competitive against the optimal non-adaptive algorithm.

Combining this lemma with Theorem 3.4 from [CGT+20] we get Theorem D.3.

Theorem D.3.

We can efficiently find a partially-adaptive strategy for optimal search with kk options that is 4e/(e1)4e/(e-1)-competitive against the optimal non-adaptive strategy.

Before presenting the proof of Lemma D.2, we formulate our problem as a linear program as follows. The formulation is the same as LP-NA, except that we introduce constraint (10), since we need to pick k boxes instead of 1.

minimize ixi\displaystyle\sum_{i\in\mathcal{B}}x_{i} +\displaystyle\quad+\quad 1|𝒮|i,s𝒮ciszis\displaystyle\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{B},s\in\mathcal{S}}c_{i}^{s}z_{i}^{s} (LP-NA-k)
subject to izis\displaystyle\sum_{i\in\mathcal{B}}z_{i}^{s} =\displaystyle\quad=\quad k,\displaystyle k, s𝒮\displaystyle\forall s\in\mathcal{S} (10)
zis\displaystyle\hskip 19.91684ptz_{i}^{s} \displaystyle\quad\leq\quad xi,\displaystyle x_{i}, i,s𝒮\displaystyle\forall i\in\mathcal{B},s\in\mathcal{S}
xi,zis\displaystyle\hskip 5.69046ptx_{i},z_{i}^{s} \displaystyle\quad\in\quad [0,1]\displaystyle[0,1] i,s𝒮\displaystyle\forall i\in\mathcal{B},s\in\mathcal{S}

Denote by \text{OPT}_{p}=\sum_{i\in\mathcal{B}}x_{i} and \text{OPT}_{c}=\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{B},s\in\mathcal{S}}c_{i}^{s}z_{i}^{s} the optimal opening cost and selected boxes' cost respectively, and by \text{ALG}_{p} and \text{ALG}_{c} the corresponding costs of the algorithm.

Input: Solution 𝒙,𝒛\bm{x},\bm{z} to above LP-NA-k, scenario ss. We set β=1/100,α=1/4950\beta=1/100,\alpha=1/4950
1 Denote by 𝒳low={i:xi<1/β}\mathcal{X}_{\text{low}}=\{i:x_{i}<1/\beta\} and X=i𝒳lowxiX=\sum_{i\in\mathcal{X}_{\text{low}}}x_{i}
2 σ:=\sigma:= open all boxes that xi1/βx_{i}\geq 1/\beta, from 𝒳low\mathcal{X}_{\text{low}} select each box ii w.p. xiX\frac{x_{i}}{X}
3
4 Denote by k^{\prime} and \text{OPT}_{c}^{\prime} the values of k and \text{OPT}_{c} restricted to the set \mathcal{X}_{\text{low}}
5 τs:=\tau_{s}:= select all boxes that zis1/βz_{i}^{s}\geq 1/\beta
6   Discard all boxes i with c_{i}>\alpha\text{OPT}_{c}^{\prime}/k^{\prime}
7   From the rest select box ii with probability xiX\frac{x_{i}}{X}
8   Stop when we have selected kk boxes in total.
Algorithm 8 SPA vs NA, k-coverage
Proof of Lemma D.2.

Let (\bm{x},\bm{z}) be the solution to (LP-NA-k) for some scenario s\in\mathcal{S}. We round this solution through the following steps, bounding the extra cost incurred at every step. Let \beta>1 be a constant to be set later.

  • Step 1: open all boxes ii with xi1/βx_{i}\geq 1/\beta, select all that zis1/βz_{i}^{s}\geq 1/\beta. This step only incurs at most β(OPTp+OPTc)\beta(\text{OPT}_{p}+\text{OPT}_{c}) cost. The algorithm’s value cost is ALGc=i:zis1/βci\text{ALG}_{c}=\sum_{i:z_{i}^{s}\geq 1/\beta}c_{i} while OPTc=iziscii:zis1/βcizis1/βi:zis1/βci=1/βALGc\text{OPT}_{c}=\sum_{i}z_{i}^{s}c_{i}\geq\sum_{i:z_{i}^{s}\geq 1/\beta}c_{i}z_{i}^{s}\geq 1/\beta\sum_{i:z_{i}^{s}\geq 1/\beta}c_{i}=1/\beta\text{ALG}_{c}. A similar argument holds for the opening cost.

  • Step 2: let 𝒳low={i:xi<1/β}\mathcal{X}_{\text{low}}=\{i:x_{i}<1/\beta\}, and denote by OPTc\text{OPT}_{c}^{\prime} and kk^{\prime} the new values for OPTc\text{OPT}_{c} and kk restricted on the set 𝒳low\mathcal{X}_{\text{low}} and by X=i𝒳lowxiX=\sum_{i\in\mathcal{X}_{\text{low}}}x_{i}.

    • Step 2a: convert values to either 0 or \infty by setting ci=c_{i}=\infty for every box ii such that ci>αOPTc/kc_{i}>\alpha\text{OPT}_{c}^{\prime}/k^{\prime} and denote by s={i:ciαOPTc/k}\mathcal{L}_{s}=\{i:c_{i}\leq\alpha\text{OPT}_{c}^{\prime}/k^{\prime}\}.

    • Step 2b: select every box with probability \frac{x_{i}}{X}, choosing a box only if it is in \mathcal{X}_{\text{low}}. Observe that the probability of choosing the j'th box from \mathcal{L}_{s}, given that we have already chosen j-1, is

      Pr[choose j’th|have chosen j1]\displaystyle\textbf{Pr}\left[\text{choose }j\text{'th}|\text{have chosen }j-1\right] iLsxij/βX\displaystyle\geq\frac{\sum_{i\in L_{s}}x_{i}-j/\beta}{X} Since xi1/β for all xi𝒳low\displaystyle\text{Since }x_{i}\leq 1/\beta\text{ for all }x_{i}\in\mathcal{X_{\text{low}}}
      iLszisj/βX\displaystyle\geq\frac{\sum_{i\in L_{s}}z_{i}^{s}-j/\beta}{X} From LP constraint
      (11/α)kj/βOPTp\displaystyle\geq\frac{(1-1/\alpha)k^{\prime}-j/\beta}{\text{OPT}_{p}^{\prime}} From Markov’s Inequality
      (11/α)kk/βOPTp\displaystyle\geq\frac{(1-1/\alpha)k^{\prime}-k^{\prime}/\beta}{\text{OPT}_{p}^{\prime}} Since jk\displaystyle\text{Since }j\leq k^{\prime}
      (αββα)kαβOPTp\displaystyle\geq\frac{(\alpha\beta-\beta-\alpha)k^{\prime}}{\alpha\beta\text{OPT}_{p}^{\prime}}

      Therefore the expected time until we choose kk^{\prime} boxes is

      𝔼[ALGp]\displaystyle\mathbb{E}\left[\text{ALG}_{p}\right] =j=1k1Pr[choose j’th|have chosen j1]\displaystyle=\sum_{j=1}^{k^{\prime}}\frac{1}{\textbf{Pr}\left[\text{choose }j\text{'th}|\text{have chosen }j-1\right]}
      j=1kαβOPTp(αβαβ)k\displaystyle\leq\sum_{j=1}^{k^{\prime}}\alpha\beta\frac{\text{OPT}_{p}^{\prime}}{(\alpha\beta-\alpha-\beta)k^{\prime}}
      =αβOPTpαβαβ\displaystyle=\alpha\beta\frac{\text{OPT}_{p}^{\prime}}{\alpha\beta-\alpha-\beta}

    Observe also that, since all selected boxes have c_{i}\leq\alpha\text{OPT}_{c}^{\prime}/k^{\prime}, we incur value cost \text{ALG}_{c}\leq\alpha\text{OPT}_{c}^{\prime}.

Putting all the steps together, we get \text{ALG}\leq\left(\beta+\frac{\alpha\beta}{\alpha\beta-\alpha-\beta}\right)\text{OPT}_{p}+(\beta+\alpha)\text{OPT}_{c}\leq 4\text{OPT} when setting \alpha=2\beta/(\beta-1) and \beta=1/100. ∎

D.3 Competing with the non-adaptive benchmark for choosing a matroid basis

In this section \mathcal{F} requires us to select a basis of a given matroid. More specifically, assuming that the boxes have an underlying matroid structure, we seek to find a basis of size k with minimum cost and minimum query time. Let r(A) denote the rank of a set A\subseteq\mathcal{B}. Using the linear program of the k-items case, we replace the constraints to ensure that we select at most r(A) elements from every set A, and that whatever set A of boxes is already chosen, there are still enough elements left to cover the rank constraint. The guarantees for this case are given in Theorem D.5 and the rounding is presented in Algorithm 9. This case also extends the results of [CGT+20].

Lemma D.4.

There exists a scenario-aware partially-adaptive algorithm for picking a matroid basis of rank k that is O(\log k)-competitive against the optimal non-adaptive algorithm.

Combining this lemma with Theorem 3.4 from [CGT+20] we get Theorem D.5.

Theorem D.5.

We can efficiently find a partially-adaptive strategy for optimal search over a matroid of rank k that is O(\log k)-competitive against the optimal non-adaptive strategy.

In order to present the proof of Lemma D.4, we use the LP formulation of the problem with a matroid constraint, as shown below. Let r(A) denote the rank of a set A\subseteq\mathcal{B}. The difference from LP-NA-k is that we replace constraint (10) with constraint (11), which ensures that we select at most r(A) elements from every set A, while constraint (12) ensures that whatever set A of boxes is already chosen, there are still enough elements left to cover the rank constraint.

minimize ixi\displaystyle\sum_{i\in\mathcal{B}}x_{i} +\displaystyle\quad+\quad 1|𝒮|i,s𝒮ciszis\displaystyle\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{B},s\in\mathcal{S}}c_{i}^{s}z_{i}^{s} (LP-NA-matroid)
subject to \sum_{i\in A}z_{i}^{s}\quad\leq\quad r(A), \qquad\forall s\in\mathcal{S},\forall A\subseteq\mathcal{B} \qquad (11)
\sum_{i\not\in A}z_{i}^{s}\quad\geq\quad r([n])-r(A), \qquad\forall A\subseteq\mathcal{B},\forall s\in\mathcal{S} \qquad (12)
zis\displaystyle\hskip 19.91684ptz_{i}^{s} \displaystyle\quad\leq\quad xi,\displaystyle x_{i}, i,s𝒮\displaystyle\forall i\in\mathcal{B},s\in\mathcal{S} (13)
xi,zis\displaystyle\hskip 5.69046ptx_{i},z_{i}^{s} \displaystyle\quad\in\quad [0,1]\displaystyle[0,1] i,s𝒮\displaystyle\forall i\in\mathcal{B},s\in\mathcal{S}

Similarly to the case for kk items, denote by OPTp=ixi\text{OPT}_{p}=\sum_{i\in\mathcal{B}}x_{i} and OPTc=1/|𝒮|i,s𝒮ciszis\text{OPT}_{c}=1/|\mathcal{S}|\sum_{i\in\mathcal{B},s\in\mathcal{S}}c_{i}^{s}z_{i}^{s}, and ALGp,ALGc\text{ALG}_{p},\text{ALG}_{c} the respective algorithm’s costs.

Input: Solution 𝒙,𝒛\bm{x},\bm{z} to above LP-NA-matroid, scenario ss. We set β=1/100,α=1/4950\beta=1/100,\alpha=1/4950
1 Denote by 𝒳low={i:xi<1/β}\mathcal{X}_{\text{low}}=\{i:x_{i}<1/\beta\} and X=i𝒳lowxiX=\sum_{i\in\mathcal{X}_{\text{low}}}x_{i}
2 σ:=\sigma:= open all boxes that xi1/βx_{i}\geq 1/\beta, from 𝒳low\mathcal{X}_{\text{low}} select each box ii w.p. xiX\frac{x_{i}}{X}
3
4 Denote by k^{j} and \text{OPT}_{c}^{j} the values of k and \text{OPT}_{c} restricted to the set \mathcal{X}_{\text{low}} when j boxes have been selected.
5 τs:=\tau_{s}:= select all boxes that zis1/βz_{i}^{s}\geq 1/\beta
6   Discard all boxes i with c_{i}>\alpha\text{OPT}_{c}^{j}/k^{j}
7   From the rest select box ii with probability xiX\frac{x_{i}}{X}
8   Stop when we have selected kk boxes in total.
Algorithm 9 SPA vs NA, matroid
Proof of Lemma D.4.

Similarly to Lemma D.2, let (𝒙,𝒛)(\bm{x},\bm{z}) be the solution to LP-NA-matroid, for some scenario s𝒮s\in\mathcal{S}. We round this solution through the following process. Let β>1\beta>1 be a constant to be set later.

  • Step 1: open all boxes ii with xi1/βx_{i}\geq 1/\beta, select all that zis1/βz_{i}^{s}\geq 1/\beta. This step only incurs at most β(OPTp+OPTc)\beta(\text{OPT}_{p}+\text{OPT}_{c}) cost.

  • Step 2: let \mathcal{X}_{\text{low}}=\{i:x_{i}<1/\beta\}. Denote by \text{OPT}_{c}^{\prime} and k^{\prime} the new values of \text{OPT}_{c} and k restricted to \mathcal{X}_{\text{low}}. At every step, after having selected j boxes, we restrict our search to the set of low-cost boxes \mathcal{L}_{s}^{j}=\{i:v_{i}\leq\alpha\text{OPT}_{c}^{j}/k^{j}\}, where \text{OPT}_{c}^{j} and k^{j} are the new values for \text{OPT}_{c} and k after having selected k^{j}=j boxes.

    • Step 2a: Convert values to either 0 or \infty by setting vi=v_{i}=\infty for every box ii such that vi>αOPTcj/kjv_{i}>\alpha\text{OPT}_{c}^{j}/k^{j}.

    • Step 2b: Select every box with probability \frac{x_{i}}{X}, choosing a box only if it is in \mathcal{X}_{\text{low}}. Observe that the probability of choosing the j'th box from \mathcal{L}_{s}^{j-1}, given that we have already chosen j-1, is

      Pr[choose j’th|have chosen j1]\displaystyle\textbf{Pr}\left[\text{choose }j\text{'th}|\text{have chosen }j-1\right] iLsj1xiX\displaystyle\geq\frac{\sum_{i\in L_{s}^{j-1}}x_{i}}{X}
      iLsj1zisX\displaystyle\geq\frac{\sum_{i\in L_{s}^{j-1}}z_{i}^{s}}{X} From LP constraint (13)
      k(kj)X\displaystyle\geq\frac{k-(k-j)}{X} From LP constraint (12)
      =jOPTp\displaystyle=\frac{j}{\text{OPT}_{p}^{\prime}}

      Therefore the expected time until we choose kk^{\prime} boxes is

\mathbb{E}\left[\text{ALG}_{p}\right]=\sum_{j=1}^{k^{\prime}}\frac{1}{\textbf{Pr}\left[\text{choose }j\text{'th}|\text{have chosen }j-1\right]}
      OPTpj=1k1j\displaystyle\leq\text{OPT}_{p}^{\prime}\sum_{j=1}^{k^{\prime}}\frac{1}{j}
      logkOPTp\displaystyle\leq\log k\cdot\text{OPT}_{p}

    Observe also that every box we choose belongs to the set \mathcal{L}_{s}^{j}; therefore the total cost incurred by the selected values is

\text{ALG}_{c}\leq\sum_{j=1}^{k^{\prime}}\alpha\frac{\text{OPT}_{c}^{j}}{k^{j}}\leq\alpha\sum_{j=1}^{k^{\prime}}\frac{\text{OPT}_{c}}{j}\leq\alpha\log k\cdot\text{OPT}_{c}

Putting all the steps together, we get \text{ALG}\leq O(\log k)\text{OPT}. ∎