
Differentially Private Online Submodular Maximization

Sebastian Perez-Salazar, Georgia Institute of Technology, [email protected].    Rachel Cummings, Georgia Institute of Technology, [email protected]; supported in part by a Mozilla Research Grant, NSF grants CNS-1850187 and CNS-1942772 (CAREER), and a JPMorgan Chase Faculty Research Award.
Abstract

In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of $T$ submodular functions over a common finite ground set $U$ arrives online, and at each time-step the decision maker must choose at most $k$ elements of $U$ before observing the function. The decision maker obtains a payoff equal to the function evaluated on the chosen set, and aims to learn a sequence of sets that achieves low expected regret.

In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log k/\delta}}{\varepsilon}\right)$. This algorithm contains $k$ ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta+O(e^{-T^{1/3}}))$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\left(k(|U|\log|U|)^{1/3}\right)^{2}T^{2/3}\right)$.

Our algorithms contain $k$ ordered experts that learn the best marginal item to select given the items chosen by her predecessors, while maintaining privacy of the functions. One challenge for privacy in this setting is that the payoff and feedback of expert $i$ depend on the actions taken by her $i-1$ predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.

1 Introduction

Ensuring users’ privacy has become a critical task in online learning algorithms. As an illustrative example, sponsored search engines aim to maximize the probability that displayed ads or products are clicked by incoming customers, but prospective customers do not want their privacy infringed after clicking on a product. Users visiting online retailer web-pages such as Amazon, Walmart or Target leave behind an abundance of sensitive personal information that can be used to predict their behaviors or preferences, potentially leading to catastrophic results (Zhang et al., 2014; see also https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html). In this work, we introduce the first algorithms for privacy-preserving online monotone submodular maximization under a cardinality constraint.

A submodular set function $f:2^{U}\to\mathbb{R}$ exhibits diminishing returns, meaning that adding an element $x$ to a larger set $B$ creates less additional value than adding $x$ to any subset of $B$. (See Definition 1 in Section 2 for a formal definition.) Submodular functions have found widespread application in economics, computer science and operations research (see, e.g., Bach, 2013 and Krause and Golovin, 2014), and have recently gained attention as a modeling tool for data summarization and ad display (Ahmed et al., 2012; Streeter et al., 2009; Badanidiyuru et al., 2014). We additionally consider monotone submodular functions, where adding elements to a set can only increase the value of $f$. Since unconstrained monotone submodular maximization is trivial—$f(S)$ can be maximized by choosing the entire universe $S=U$—we consider cardinality-constrained maximization, where the decision-maker solves $\max_{S\subseteq U}f(S)$ s.t. $|S|\leq k$.

In the online learning setting, at each time-step $t$ a learner must choose a set $S_{t}\subseteq U$ of size at most $k$ and receives payoff $f_{t}(S_{t})$ for a monotone submodular function $f_{t}$. Importantly, the learner does not know $f_{t}$ before she chooses $S_{t}$, but this set can be chosen based on previous functions $f_{1},\ldots,f_{t-1}$. Two types of informational feedback are commonly studied in the online learning literature. In the full-information setting, the learner gets full oracle access to the function $f_{t}$ after choosing $S_{t}$, and thus is able to incorporate the entirety of previous functions into her future decisions. In the bandit setting, the learner only observes her own payoff $f_{t}(S_{t})$ as feedback.

Performance of an online learner is typically measured by the regret, which is the difference between the payoff of the best fixed decision in hindsight and the cumulative payoff obtained by the learner (Zinkevich, 2003; Hazan, 2016; Shalev-Shwartz, 2012). More precisely, the regret of a learner after $T$ rounds is $\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t})$. The aim often is to design algorithms with sublinear regret, i.e., $o(T)$, so that the average payoff over time of the algorithm is comparable with the best average fixed payoff in hindsight. Offline monotone submodular maximization under a cardinality constraint is NP-hard to approximate with a factor better than $(1-1/e)$ (Feige, 1998; Mirrokni et al., 2008), so we instead measure the quality of our algorithms using the more restrictive notion of $(1-1/e)$-regret (Streeter and Golovin, 2009; Streeter et al., 2009):

$$\operatorname{\mathcal{R}}_{T}=\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t}). \qquad (1)$$

The privacy notion we consider in this work is differential privacy (Dwork et al., 2006), which enables accurate estimation of population-level statistics while ensuring little can be learned about the individuals in the database. Informally, a randomized algorithm is said to be differentially private if changing a single entry in the input database results in only a small distributional change in the outputs. (See Definition 2 in Section 2 for a formal definition.) This means that an adversary cannot information-theoretically infer whether or not a single individual participated in the database. Differentially private algorithms have been deployed by major organizations including Apple, Google, Microsoft, Uber, and the U.S. Census Bureau, and are seen as the gold standard in privacy-preserving data analysis. In this work, the input database to our learning algorithm consists of a stream of functions $F=\{f_{1},\ldots,f_{T}\}$, and each individual’s data corresponds to a function $f_{t}$. Our privacy guarantees ensure that the stream of chosen sets $S_{1},\ldots,S_{T}$ is differentially private with respect to this database of functions.

In both the full-information and bandit settings, we present differentially private online learning algorithms that achieve sublinear expected $(1-1/e)$-regret.

Motivating Example.

While there are countless examples of practical online submodular maximization problems using sensitive data, we offer this motivating example for concreteness. Consider an online product display model where a website has $k$ display slots and wants to maximize the probability of any displayed product being clicked. Each customer $t$ has a (privately known) probability $p_{a}^{t}$ of clicking a display for product $a\in U$, independently of the other products displayed. Let $f_{t}(S)$ denote the probability that customer $t$ clicks on any product in a display set $S$. We can write this function in closed form as $f_{t}(S)=1-\prod_{a\in S}(1-p_{a}^{t})$. Note that this function is submodular because adding products to the set $S$ exhibits diminishing returns in total click probability. Each customer’s click-probabilities $\{p^{t}_{a}\}_{a\in U}$ contain sensitive information about his preferences or habits, and require formal privacy protections.
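The closed-form function above can be checked numerically. Below is a minimal sketch with made-up click probabilities (the names `f_t` and `p` are illustrative, not from the paper):

```python
def f_t(S, p):
    """Probability that customer t clicks at least one product in the
    display set S, given independent click probabilities p[a]."""
    no_click = 1.0
    for a in S:
        no_click *= 1.0 - p[a]
    return 1.0 - no_click

# Hypothetical click probabilities for one customer.
p = {"a": 0.5, "b": 0.3, "c": 0.2}

# Monotone: adding products never lowers the click probability.
assert f_t({"a"}, p) <= f_t({"a", "b"}, p) <= f_t({"a", "b", "c"}, p)

# Diminishing returns: adding "c" helps less on top of the larger set.
gain_on_small = f_t({"a", "c"}, p) - f_t({"a"}, p)            # 0.10
gain_on_large = f_t({"a", "b", "c"}, p) - f_t({"a", "b"}, p)  # 0.07
assert gain_on_large <= gain_on_small
```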

1.1 Our Results

Our main results are differentially private algorithms for online submodular maximization under a cardinality constraint. We provide algorithms that achieve sublinear expected $(1-1/e)$-regret in both the full-information and bandit settings.

Our algorithms are based on the approach of Streeter and Golovin (2009), who designed (non-private) online algorithms with low expected $(1-1/e)$-regret for submodular maximization. We adapt and extend their techniques to additionally satisfy differential privacy. Following the spirit of Streeter and Golovin (2009), our algorithms have $k$ ordered online learning algorithms, or experts, that together pick $k$ items at every time-step and learn from their decisions over time. Roughly speaking, expert $i$ learns how to choose an item that complements the decisions of the previous $i-1$ experts. The expected $(1-1/e)$-regret can be bounded by the regret of these $k$ experts, so to obtain a low $(1-1/e)$-regret algorithm that preserves privacy, we simply need to find no-regret experts that together preserve privacy. Ideally, we would like each expert to be differentially private so that simple composition and post-processing arguments would yield overall privacy guarantees. Unfortunately this is not possible for $k>1$ because the choices of all previous experts alter the distribution of payoffs for expert $i$.

Specifically, the $i$-th expert non-privately queries the function (i.e., accesses the database) at $|U|$ points that depend on the actions of the previous experts. A naive solution is to allow each expert to query the function at any of its $2^{|U|}$ values, so that privacy would follow by post-processing on the differentially private outputs of previous experts. However, this larger domain size requires large quantities of noise that would harm the experts' no-regret guarantees. Effectively, this decouples the advice of the $k$ experts, so that experts are not learning from each other; this helps privacy but harms learning. Instead, we restrict each expert to a domain of size $|U|$ that is defined by the actions of previous experts. This ensures no-regret learning, but post-processing no longer ensures privacy. We overcome this challenge by showing that the experts together are differentially private and that only low quantities of noise are needed.

Theorem 1 below is an informal version of our main results in the full-information setting (Theorems 5 and 6 in Section 3).

Theorem 1 (Informal).

In the full-information setting, Algorithm 2 for online monotone $k$-cardinality-constrained submodular maximization is $(\varepsilon,\delta)$-differentially private and guarantees

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]=\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).$$

In the bandit setting, each expert only receives its own payoff as feedback, and does not have oracle access to the entire function. For this setting, we modify the full-information algorithm by using a biased estimator of the marginal increments for other actions.

The algorithm also requires additional privacy considerations. The non-private approach of Streeter and Golovin (2009) randomly decides in each round whether to explore or exploit. In exploit rounds, the experts sample a new set but play the current-optimal action, providing both learning and exploitation. Directly privatizing this algorithm incurs additional privacy loss from the exploit rounds, which leads to a weak bound of $\mathcal{O}(T^{3/4})$ for the expected $(1-1/e)$-regret, far from the best known $\mathcal{O}(T^{2/3})$. Instead, we have the experts sample new sets only after an exploration round has occurred. The choice to explore is data-independent, so privacy is maintained by post-processing. If the exact number and timing of explore rounds are known in advance, this results in an $(\varepsilon,\delta)$-DP algorithm. However, this approach requires $\Omega(T^{2/3}+k|U|)$ space, which is not appealing in practical settings where $T$ is substantially larger than $|U|$. Instead, we allow explore-exploit decisions to be made online and obtain a high-probability bound on the number of explore rounds based on the sampling parameter. At the expense of an exponentially small loss in the $\delta$ privacy parameter, resulting from the failure of the high-probability bound, we obtain the asymptotically optimal $\mathcal{O}(T^{2/3})$ expected $(1-1/e)$-regret.

Theorem 2 is an informal version of our main results in the more challenging bandit feedback setting (Theorems 7 and 8 in Section 4).

Theorem 2 (Informal).

In the bandit feedback setting, Algorithm 3 for online monotone $k$-cardinality-constrained submodular maximization is $(\varepsilon,\delta+e^{-8T^{1/3}})$-differentially private and guarantees

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]=\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\left(k(|U|\log|U|)^{1/3}\right)^{2}T^{2/3}\right).$$

The best known non-private expected $(1-1/e)$-regret is $\mathcal{O}\left(\sqrt{kT\log|U|}\right)$ in the full-information setting and $\mathcal{O}\left(k(|U|\log|U|)^{1/3}T^{2/3}\right)$ in the bandit setting (Streeter and Golovin, 2009). Comparing our expected $(1-1/e)$-regret bounds to these, we see that our bounds are asymptotically optimal in $T$, with slight gaps in terms of $k$ and $|U|$. Typically, the dominating term is the time horizon $T$ with $k\leq|U|\ll T$, so our results match the best expected $(1-1/e)$-regret asymptotically in $T$.

Additionally, we show that our algorithms can be extended to a continuous generalization of submodular functions, known as DR-submodular functions. We provide a differentially private online learning algorithm for DR-submodular maximization that achieves low expected regret. A brief overview of this extension is given in Section 5, with further details in the appendix.

1.2 Related Work

Online learning (Zinkevich, 2003; Cesa-Bianchi and Lugosi, 2006; Hazan, 2016; Shalev-Shwartz, 2012) has gained increasing attention for making decisions in dynamic environments when only partial information is available. Its applicability in ad placement (Chatterjee et al., 2003; Chapelle and Li, 2011; Tang et al., 2014) has made this model attractive from a practical viewpoint.

Submodular optimization has been widely studied, due to the large number of important submodular functions, such as the cut function of a graph, the entropy of a set of random variables, and the rank function of a matroid, to name only a few. For more applications see Schrijver (2003), Williamson and Shmoys (2011), and Bach (2013). While (unconstrained) submodular minimization can be solved with a polynomial number of oracle calls (Schrijver, 2003; Bach, 2013), submodular maximization is NP-hard for general submodular functions. For monotone submodular functions under a cardinality constraint, it is impossible to find a polynomial-time algorithm that achieves a fraction better than $(1-1/e)$ of the optimal solution unless P=NP (Feige, 1998), and this approximation factor is achieved by the greedy algorithm (Fisher et al., 1978). For further results with more general constraints, we refer the reader to the survey of Krause and Golovin (2014). In the online setting, Streeter and Golovin (2009) and Streeter et al. (2009) were the first to study online monotone submodular maximization, with cardinality/knapsack constraints and partition matroid constraints respectively. Recently, continuous submodularity has gained attention in the continuous optimization community (Hassani et al., 2017; Niazadeh et al., 2018; Zhang et al., 2020). See Chen et al. (2018a) and Chen et al. (2018b) for online continuous submodular optimization algorithms.
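The offline greedy algorithm mentioned above is simple enough to sketch in a few lines. The following is an illustrative implementation (the coverage function and all names are our own stand-in example, not from the paper):

```python
def greedy(f, U, k):
    """Greedy algorithm of Fisher et al. (1978): a (1 - 1/e)-approximation
    for monotone submodular maximization under |S| <= k."""
    S = set()
    for _ in range(k):
        # Add the element with the largest marginal gain f(S + a) - f(S).
        best = max((a for a in U if a not in S),
                   key=lambda a: f(S | {a}) - f(S))
        S.add(best)
    return S

# Coverage (a classic monotone submodular function): f(S) is the number
# of elements covered by the union of the chosen sets.
sets = {"x": {1, 2, 3}, "y": {3, 4}, "z": {4, 5, 6, 7}}
cover = lambda S: len(set().union(*(sets[a] for a in S))) if S else 0
chosen = greedy(cover, sets, 2)  # picks "z" first (gain 4), then "x"
```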

Differential privacy (Dwork et al., 2006) has become the gold standard for individual privacy, and there has been a large literature of differentially private algorithms developed for a broad set of analysis tasks. See Dwork and Roth (2014) for a textbook treatment. Due to privacy concerns in practical applications of online learning, there has been growing interest in implementing well-known methods, such as experts algorithms and gradient optimization methods, in a differentially private way. See for instance Jain et al. (2012) and Thakurta and Smith (2013).

Differential privacy and submodularity were first jointly considered by Gupta et al. (2010). They studied the combinatorial public projects problem, where the objective function is a sum of monotone submodular functions, each representing an agent’s private valuation function, and a decision-maker must maximize this objective subject to a cardinality constraint. The authors designed an $(\varepsilon,0)$-DP algorithm using the Exponential Mechanism of McSherry and Talwar (2007) as a private subroutine, and achieved a $(1-1/e)$-approximation to the optimal non-private solution, plus an additional term proportional to $\varepsilon^{-1}$. Later, Mitrovic et al. (2017) extended these results to monotone submodular functions under cardinality, matroid, and $p$-system constraints. Their methods also used the Exponential Mechanism to ensure differential privacy. See also recent work by Rafiey and Yoshida (2020).

In the online learning framework, Cardoso and Cummings (2019) study online (unconstrained) differentially private submodular minimization. They use the Lovász extension of a set function as a convex proxy, in order to apply known privacy tools from online convex optimization (Jain et al., 2012; Thakurta and Smith, 2013). Since submodular minimization and maximization are fundamentally different technical problems, the techniques of Cardoso and Cummings (2019) do not extend to our setting.

Fundamental to our analysis are the differentially private Exponential Mechanism of McSherry and Talwar (2007) and its inherent connection to multiplicative weights algorithms (Hazan, 2016; Shalev-Shwartz, 2012), which we use to estimate probability distributions in the simplex while preserving privacy.

2 Preliminaries

In this section we review definitions and properties of submodular functions and differential privacy.

Definition 1 (Submodularity).

A function $f:2^{U}\to\mathbb{R}$ is submodular if it satisfies the following diminishing returns property: for all $A\subseteq B\subseteq U$ and $x\notin B$,

$$f(A\cup\{x\})-f(A)\geq f(B\cup\{x\})-f(B).$$

(Equivalently, $f$ is submodular if $f(A\cap B)+f(A\cup B)\leq f(A)+f(B)$ for all $A,B\subseteq U$.)

As is standard in the submodular maximization literature, we assume $f(\emptyset)=0$. In our motivating example, this means that if no items are shown to the incoming customer, then the probability of a click is 0. We let $\mathcal{F}$ denote the family of submodular functions with finite ground set $U$. For the sake of simplicity, we additionally assume that all functions take values in the interval $[0,1]$. In this work, we also consider set functions $f$ that are monotone, or non-decreasing, i.e., $f(A)\leq f(B)$ for all $A\subseteq B$.

In the problem of online monotone submodular maximization under a cardinality constraint, a sequence of $T$ monotone submodular functions $f_{1},\ldots,f_{T}:2^{U}\to[0,1]$ arrives in an online fashion. At every time-step $t$, the decision maker $\mathcal{A}$ has to choose a subset $S_{t}\subseteq U$ of size at most $k$ before observing $f_{t}$. This decision must be based solely on previous observations. The decision maker $\mathcal{A}$ receives a payoff $f_{t}(S_{t})$, and her goal is to minimize the expected $(1-1/e)$-regret $\operatorname{\mathbb{E}}[\operatorname{\mathcal{R}}_{T}]$, where $\operatorname{\mathcal{R}}_{T}=\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t})$ as defined in Equation (1), and the randomness is over the algorithm’s choices.

A fundamental tool in our analysis is the Hedge algorithm (Algorithm 1) of Freund and Schapire (1997), which chooses an action from a set $[N]=\{1,\ldots,N\}$ based on past payoffs from each action. The algorithm takes as input a learning rate $\eta$ and a stream of linear functions $g_{1},\ldots,g_{T}:[N]\to[0,1]$, where the payoff of playing action $i$ at time $t$ is $g_{t}(i)$.

In our setting, the learner must select a set of at most $k$ items from the ground set $U$. The learner does this by implementing $k$ ordered copies of the Hedge algorithm, each of which chooses one item, so the action space for each instantiation is the ground set $U$ (i.e., $N=|U|$). The $i$-th copy of Hedge learns the item with the best marginal gain given the decisions made by the previous $i-1$ Hedge algorithms.

Initialize $w_{1}=(1,\ldots,1)\in\mathbb{R}^{N}$
for $t=1,\ldots,T$ do
       Sample action $i_{t}\in[N]$ w.p. $x_{t}(i)=\frac{w_{t}(i)}{\sum_{j}w_{t}(j)}$
       Obtain payoff $g_{t}(i_{t})$ and full access to $g_{t}$
       Update $w_{t+1}(i)=w_{t}(i)e^{\eta g_{t}(i)}$
Algorithm 1 Hedge($\eta,g_{1},\ldots,g_{T}$)
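Algorithm 1 can be sketched directly in code. The following is a minimal illustration (function and parameter names are ours; the caller supplies the payoff vectors $g_t$):

```python
import math
import random

def hedge(eta, payoffs, N, seed=0):
    """Sketch of Algorithm 1 (Hedge) over actions {0, ..., N-1}.

    payoffs: a list of T payoff vectors g_t, each of length N with
    entries in [0, 1].  Returns the sequence of sampled actions i_t.
    """
    rng = random.Random(seed)
    w = [1.0] * N                        # w_1 = (1, ..., 1)
    actions = []
    for g in payoffs:
        total = sum(w)
        x = [wi / total for wi in w]     # sampling distribution x_t
        actions.append(rng.choices(range(N), weights=x)[0])
        # Multiplicative update: w_{t+1}(i) = w_t(i) * exp(eta * g_t(i)).
        w = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]
    return actions
```

With payoffs that consistently favor one action, the sampling distribution concentrates on it at a rate governed by $\eta$.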

The Hedge algorithm exhibits the following guarantee, which is useful for analyzing its regret, as well as the regret of our algorithms which instantiate Hedge.

Theorem 3 (Freund and Schapire, 1997).

For any $i\in[N]$, the distributions $\mathbf{x}_{1},\ldots,\mathbf{x}_{T}$ over $[N]$ constructed by Algorithm 1 satisfy

$$\sum_{t=1}^{T}g_{t}(i)-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}g_{t}\leq\eta\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}g_{t}^{2}+\frac{\log N}{\eta},$$

where $g^{2}_{t}$ is the vector $g_{t}$ with each coordinate squared.

For the privacy considerations of this work, we view the input database as the ordered input sequence of submodular functions $F=\{f_{1},\ldots,f_{T}\}$ and the algorithm’s output as the sequence of chosen sets $S_{1},\ldots,S_{T}$. We say that two sequences $F,F^{\prime}$ of functions are neighboring if $f_{t}\neq f_{t}^{\prime}$ for at most one $t\in[T]$.

Definition 2 (Differential Privacy (Dwork et al., 2006)).

An online learning algorithm $\mathcal{A}:\mathcal{F}^{T}\to(2^{U})^{T}$ is $(\varepsilon,\delta)$-differentially private if for any neighboring function databases $F,F^{\prime}$ and any event $S\subseteq(2^{U})^{T}$,

$$\operatorname{\Pr}(\mathcal{A}(F)\in S)\leq e^{\varepsilon}\operatorname{\Pr}(\mathcal{A}(F^{\prime})\in S)+\delta.$$

Differential privacy is robust to post-processing, meaning that any function of a differentially private output maintains the same privacy guarantee.

Proposition 1 (Post-Processing (Dwork et al., 2006)).

Let $\mathcal{M}:\mathcal{F}^{T}\to\mathcal{R}$ be an $(\varepsilon,\delta)$-DP algorithm and let $h:\mathcal{R}\to\mathcal{R}^{\prime}$ be an arbitrary function. Then $\mathcal{M}^{\prime}\doteq h\circ\mathcal{M}:\mathcal{F}^{T}\to\mathcal{R}^{\prime}$ is also $(\varepsilon,\delta)$-DP.

Differentially private algorithms also compose, and the privacy guarantees degrade gracefully as additional DP computations are performed. This enables modular algorithm design using simple differentially private building blocks. Basic Composition (Dwork et al., 2006) says that one can simply add up the privacy parameters used in an algorithm’s subroutines to get the overall privacy guarantee. The following Advanced Composition theorem provides even tighter bounds.

Theorem 4 (Advanced Composition (Dwork et al., 2010b)).

Let $\mathcal{M}_{1},\ldots,\mathcal{M}_{k}$ each be $(\varepsilon,\delta)$-DP algorithms. Then $\mathcal{M}=(\mathcal{M}_{1},\ldots,\mathcal{M}_{k})$ is $(\varepsilon^{\prime},k\delta+\delta^{\prime})$-DP for $\varepsilon^{\prime}=\sqrt{2k\log(1/\delta^{\prime})}\,\varepsilon+k\varepsilon(e^{\varepsilon}-1)$ and any $\delta^{\prime}>0$.
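For intuition, the composed parameters can be computed directly from the theorem. A small illustrative helper (names and the example numbers are ours):

```python
import math

def advanced_composition(eps, delta, k, delta_prime):
    """Privacy parameters of the composition of k (eps, delta)-DP
    mechanisms under Advanced Composition (Theorem 4)."""
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_prime

# Composing 100 (0.01, 1e-6)-DP mechanisms: eps grows roughly like
# sqrt(k) (about 0.54 here) instead of k * eps = 1.0 under Basic
# Composition.
eps_total, delta_total = advanced_composition(0.01, 1e-6, 100, 1e-6)
```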

Our algorithms rely on the Exponential Mechanism (EM) introduced by McSherry and Talwar (2007). The EM takes in a database $F$, a finite action set $U$, and a quality score $q:\mathcal{F}^{T}\times U\to\mathbb{R}$, where $q(F,i)$ assigns a numeric score to the quality of outputting $i$ on input database $F$. The sensitivity of the quality score, denoted $\Delta q$, is the maximum change in the value of $q$ across neighboring databases: $\Delta q=\max_{i\in U}\max_{F,F^{\prime}\text{ neighbors}}|q(F,i)-q(F^{\prime},i)|$. Given these inputs, the EM outputs $i\in U$ with probability proportional to $\exp\left(\frac{\varepsilon\,q(F,i)}{2\Delta q}\right)$. The Exponential Mechanism is $(\varepsilon,0)$-DP (McSherry and Talwar, 2007).
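A sketch of the EM sampling step, assuming the caller has already computed the quality scores (names and scores below are illustrative):

```python
import math
import random

def exponential_mechanism(scores, eps, sensitivity, seed=0):
    """Sample item i with probability proportional to
    exp(eps * q(F, i) / (2 * sensitivity)); this is (eps, 0)-DP."""
    rng = random.Random(seed)
    items = list(scores)
    # Shift by the max score before exponentiating, for numerical
    # stability; this does not change the sampling distribution.
    m = max(scores[i] for i in items)
    weights = [math.exp(eps * (scores[i] - m) / (2.0 * sensitivity))
               for i in items]
    return rng.choices(items, weights=weights)[0]

# Hypothetical quality scores over three items; higher-quality items
# are exponentially more likely to be selected.
pick = exponential_mechanism({"a": 10.0, "b": 9.0, "c": 1.0},
                             eps=1.0, sensitivity=1.0)
```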

As noted by Jain et al. (2012) and Dwork et al. (2010a), the Hedge algorithm can be converted into a DP algorithm using Advanced Composition and the EM.

Proposition 2.

If $\eta=\frac{\varepsilon}{\sqrt{32T\log 1/\delta}}$, then Hedge (Algorithm 1) is $(\varepsilon,\delta)$-DP.

3 Full Information Setting

In this section, we introduce our first algorithm for online submodular maximization under a cardinality constraint. It is both differentially private and achieves the best known expected $(1-1/e)$-regret in $T$. For cardinality $k$, the learner implements $k$ ordered copies of the Hedge algorithm. Each copy is in charge of learning the marginal gain that complements the choices of the previous Hedge algorithms. At time-step $t$, each Hedge algorithm selects an element $a\in U$ and the learner gathers these choices to play the corresponding set. When she obtains oracle access to the submodular function, for each $i\in[k]$ she constructs a vector $g_{t}^{i}$ whose $a$-th coordinate is the marginal gain of adding $a\in U$ to the choices made by the previous $i-1$ Hedge algorithms. Finally, she feeds the vector $g_{t}^{i}$ back to Hedge algorithm $i$. A formal description of this procedure is presented in Algorithm 2.

Initialize: Set $\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}$
Instantiate $k$ parallel copies $\mathcal{E}_{1},\ldots,\mathcal{E}_{k}$ of the Hedge algorithm with rate $\eta$.
for $t=1,\ldots,T$ do
       For each $i=1,\ldots,k$, sample $a_{t}^{i}$ given by $\mathcal{E}_{i}$.
       Play $S_{t}=\cup_{i=1}^{k}\{a_{t}^{i}\}$.
       Obtain $f_{t}(S_{t})$ and oracle access to $f_{t}$.
       For each $i=1,\ldots,k$, define the linear function $g_{t}^{i}:U\to[0,1]$:
$$g_{t}^{i}(a)=f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1}),\quad\forall a\in U,$$
where $S_{t}^{i}=\cup_{j=1}^{i}\{a_{t}^{j}\}$.
       Feed back each Hedge algorithm $\mathcal{E}_{i}$ with $g_{t}^{i}$
Algorithm 2 FI-DP($F=\{f_{t}\}_{t=1}^{T},k,\varepsilon,\delta$)
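The structure of Algorithm 2 can be sketched compactly by inlining a Hedge update per expert. This is an illustrative sketch under our own naming (the oracle is a Python function over sets), not the paper's reference implementation:

```python
import math
import random

def fi_dp(functions, U, k, eps, delta, seed=0):
    """Sketch of Algorithm 2 (FI-DP): k ordered Hedge copies, where
    copy i learns marginal gains on top of copies 1, ..., i-1.

    functions: a list of T set functions f_t mapping sets to [0, 1].
    Returns the list of played sets S_1, ..., S_T.
    """
    rng = random.Random(seed)
    U = list(U)
    T = len(functions)
    eta = eps / (k * math.sqrt(32 * T * math.log(k / delta)))
    w = [[1.0] * len(U) for _ in range(k)]   # one weight vector per copy
    played = []
    for f in functions:
        # Each Hedge copy samples an item; the learner plays the union.
        picks = [rng.choices(range(len(U)), weights=w[i])[0]
                 for i in range(k)]
        played.append({U[j] for j in picks})
        # Full information: feed copy i the marginal gains g_t^i on top
        # of the prefix chosen by copies 1, ..., i-1.
        prefix = set()
        for i in range(k):
            base = f(prefix)
            for j, a in enumerate(U):
                w[i][j] *= math.exp(eta * (f(prefix | {a}) - base))
            prefix.add(U[picks[i]])
    return played
```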

To ensure differential privacy, it would be enough to show that each Hedge copy $\mathcal{E}_{i}$ is $(\varepsilon/k,\delta/k)$-DP. Indeed, if the sequence $(a_{1}^{i},\ldots,a_{T}^{i})$ constructed by each Hedge algorithm $i$ is $(\varepsilon/k,\delta/k)$-DP, then by Basic Composition and post-processing, the sequence $(S_{1},\ldots,S_{T})$ is $(\varepsilon,\delta)$-DP, where $S_{t}=\{a_{t}^{i}\}_{i=1}^{k}$. However, for $i\geq 2$, the output of expert $\mathcal{E}_{i}$ depends on the choices made by algorithms $\mathcal{E}_{1},\ldots,\mathcal{E}_{i-1}$. Moreover, algorithm $\mathcal{E}_{i}$ by itself is again accessing the database $F$, ruling out a post-processing argument. Despite this, we show that all experts together are $(\varepsilon,\delta)$-DP, even though individually we cannot ensure each preserves $(\varepsilon/k,\delta/k)$-DP.

It is worth noting that the Hedge algorithms $\mathcal{E}_{1},\ldots,\mathcal{E}_{k}$ in Algorithm 2 can be replaced by any other no-regret DP method that selects items over $U$, and the same proof structure would follow, although the regret bound would depend on the choice of no-regret algorithm. For instance, if we use the private experts method of Thakurta and Smith (2013) instead of the Hedge algorithm, Algorithm 2 is $(\varepsilon,0)$-DP with a regret bound of $\mathcal{O}\left(k^{2}\frac{\sqrt{|U|T\log^{2.5}T}}{\varepsilon}\right)$.

Theorem 5.

Algorithm 2 is $(\varepsilon,\delta)$-differentially private.

Theorem 6.

Algorithm 2 has $(1-1/e)$-expected-regret

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]\leq\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).$$

Proof of Theorem 5

The output of Algorithm 2 is the stream of sets $(S_{1},\ldots,S_{T})$. Before showing that this output preserves privacy, we deal with a simpler case from which we can deduce an inductive argument.

Note that $\mathcal{E}_{1}(F)$ receives as feedback the functions $g_{t}^{1}=(f_{t}(a))_{a\in U}$ at each time-step. By Proposition 2, $\mathcal{E}_{1}$ is $(\varepsilon/k,\delta/k)$-DP, given that $\eta=\frac{\varepsilon}{k\sqrt{32T\log k/\delta}}$. On the other hand, $\mathcal{E}_{2}(F)$ receives as feedback the functions $g_{t}^{2}=(f_{t}(a_{t}^{1}+a)-f_{t}(a_{t}^{1}))_{a\in U}$ at each time-step, where $a_{t}^{1}$ is computed by $\mathcal{E}_{1}(F)$. The feedback of $\mathcal{E}_{2}$ thus depends on the choices of $\mathcal{E}_{1}$; hence, conditioning on these choices, $\mathcal{E}_{2}$ should also be $(\varepsilon/k,\delta/k)$-DP. We generalize and formalize this in the next few paragraphs.

Consider the following family of algorithms. For $a^{1},\ldots,a^{i-1}\in U^{T}$, let $S^{i-1}=\{a^{i-1},\ldots,a^{1}\}$. For $t=1,\ldots,T$, let $\mathcal{M}^{S^{i-1}}_{t}:\mathcal{F}^{T}\to\Delta(U)$ be the EM that outputs $a\in U$ with probability proportional to $e^{\eta\sum_{\tau<t}\left(f_{\tau}(S_{\tau}^{i-1}\cup\{a\})-f_{\tau}(S_{\tau}^{i-1})\right)}$. Each of these mechanisms is $2\eta$-DP, since it is an instance of the EM with a sensitivity-1 quality score. Therefore, by Advanced Composition and our choice of $\eta$, $\mathcal{M}^{S^{i-1}}:=(\mathcal{M}_{1}^{S^{i-1}},\ldots,\mathcal{M}_{T}^{S^{i-1}})$ is $(\varepsilon/k,\delta/k)$-DP. Note that for $S\subseteq U^{T}$ we have

$$\operatorname{\Pr}(\mathcal{E}_{i}(F)\in S\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{1})(F)=S^{i-1})=\operatorname{\Pr}(\mathcal{M}^{S^{i-1}}(F)\in S),$$

and the latter expression describes the output of an $(\varepsilon/k,\delta/k)$-DP algorithm. This formalizes the idea that $\mathcal{E}_{2}$ is $(\varepsilon/k,\delta/k)$-DP if the choices of $\mathcal{E}_{1}$ are fixed. We utilize this idea to show that together $(\mathcal{E}_{k},\ldots,\mathcal{E}_{1})$ are $(\varepsilon,\delta)$-DP. This is formally presented in Lemma 1. The proof of this result (given in Appendix A.1) is an inductive argument that takes advantage of the DP guarantee of the mechanisms $\mathcal{M}^{S^{i-1}}$.

Lemma 1.

For any $i\in[k]$, the function $(\mathcal{E}_{i},\mathcal{E}_{i-1},\ldots,\mathcal{E}_{1}):\mathcal{F}^{T}\to U^{T}\times\cdots\times U^{T}$, the composition of the first $i$ Hedge algorithms, is $(i\varepsilon/k,i\delta/k)$-DP.

Lemma 1 with $i=k$ and post-processing ensures that Algorithm 2 is $(\varepsilon,\delta)$-DP.

Proof of Theorem 6

The key idea is to bound the (11/e)(1-1/e)-regret of Algorithm 2 by the regret incurred by the kk Hedge algorithms 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k}. We formalize this in Proposition 3 below. With this bound, we can utilize the regret bound of the Hedge algorithm and conclude the proof. The regret incurred by i\mathcal{E}_{i} is

ri=maxaU\displaystyle r_{i}=\max_{a\in U} t=1Tgti(a)t=1Tgti(ati),\displaystyle\sum_{t=1}^{T}g_{t}^{i}(a)-\sum_{t=1}^{T}g_{t}^{i}(a_{t}^{i}),

where gti(a)=ft(Sti1{a})ft(Sti1)g_{t}^{i}(a)=f_{t}(S_{t}^{i-1}\cup\{a\})-f_{t}(S_{t}^{i-1}).

Proposition 3.

The (11/e)(1-1/e)-regret of Algorithm 2 is bounded by the expected regret of 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k}.

While a full proof of Proposition 3 is deferred to Appendix A.2, we describe the key idea here. To bound the (11/e)(1-1/e)-regret, we rewrite the regret rir_{i} via the function F:2[T]×U[0,1]F:2^{[T]\times U}\to[0,1], F(A)=1Tt=1Tft(At)F(A)=\frac{1}{T}\sum_{t=1}^{T}f_{t}(A_{t}), where At={uU:(t,u)A}A_{t}=\{u\in U:(t,u)\in A\} as:

riT=maxaUF(S~i1{a})F(S~i)\frac{r_{i}}{T}=\max_{a\in U}F(\widetilde{S}^{i-1}\cup\{a\})-F(\widetilde{S}^{i})

where S~=t=1T{t}×S\widetilde{S}^{\ell}=\bigcup_{t=1}^{T}\{t\}\times S^{\ell}. We show that F(S~i)F(S~i1)F(OPT~)F(S~i1)kriTF(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}-\frac{r_{i}}{T}, where OPT~\widetilde{OPT} is the extension of OPT=argmax|S|kt=1Tft(S)OPT=\mathrm{argmax}_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S) to [T]×U[T]\times U. Upon unrolling this recursion, we obtain the result.

To finish the proof of Theorem 6 we need to bound the overall regret of all i\mathcal{E}_{i}. Observe that once we have fixed S1i1,,STi1S_{1}^{i-1},\ldots,S_{T}^{i-1}, the feedback of expert ii is completely determined since the elements at1,,ati1a_{t}^{1},\ldots,a_{t}^{i-1} depend only on experts 1,,i11,\ldots,i-1. Therefore, we have

𝔼[riS1i1,,STi1]ηT+log|U|η\displaystyle\operatorname{\mathbb{E}}[r_{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]\leq\eta T+\frac{\log|U|}{\eta}

by the Hedge regret guarantee. Taking expectations and summing over i=1,,ki=1,\ldots,k, we get 𝔼[T]i=1k𝔼[ri]k(ηT+log|U|η)\operatorname{\mathbb{E}}\left[\mathcal{R}_{T}\right]\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]\leq k\left(\eta T+\frac{\log|U|}{\eta}\right), and the result follows with our choice of η=εk32Tlog(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}.
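Concretely, substituting this choice of η\eta into the bound (for ε1\varepsilon\leq 1 the second term dominates):

```latex
\mathbb{E}[\mathcal{R}_T]
\;\le\; k\left(\eta T + \frac{\log|U|}{\eta}\right)
\;=\; \frac{\varepsilon\sqrt{T}}{\sqrt{32\log(k/\delta)}}
      + \frac{k^{2}\log|U|\sqrt{32\,T\log(k/\delta)}}{\varepsilon}
\;=\; \mathcal{O}\!\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).
```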

4 Bandit Setting

In the bandit case, the algorithm only receives as feedback the value ft(St)f_{t}(S_{t}). Given this restricted information, the algorithm must trade-off exploration of the function with exploiting current knowledge. As in (Streeter and Golovin,, 2009), our algorithm controls this tradeoff using a parameter γ[0,1]\gamma\in[0,1], and by randomly exploring in each time-step independently with probability γ\gamma.

The non-private approach of Streeter and Golovin, (2009) obtains 𝒪(T2/3)\mathcal{O}(T^{2/3}) expected (11/e)(1-1/e)-regret, and works as follows: In exploit rounds (prob. 1γ1-\gamma), play the experts’ sampled choice StS_{t} and feed back 0 to each i\mathcal{E}_{i}. In explore rounds (prob. γ\gamma), select i[k]i\in[k] and aUa\in U uniformly at random. Play the set St=Sti1+aS_{t}=S_{t}^{i-1}+a, observe feedback ft(Sti1+a)f_{t}(S_{t}^{i-1}+a), give this value to i\mathcal{E}_{i}, and feed back 0 to the remaining experts.

As we show in Appendix B.1, directly privatizing this algorithm using the Hedge method from the full-information setting results in an expected (11/e)(1-1/e)-regret of 𝒪(T3/4)\mathcal{O}(T^{3/4}), which is far from the optimal 𝒪(T2/3)\mathcal{O}(T^{2/3}). The problem with this naive approach is that a new sample is obtained via the Hedge algorithms at every time-step, including exploit steps, so to ensure (ε,δ)(\varepsilon,\delta)-DP, a learning rate of η=εk32Tlog(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}} is required.

We improve upon this by calling the Hedge algorithm only after an exploration time-step has occurred, and new information is available. The learner continues playing this same set until the next exploration round, and privacy of these exploitation rounds follows from post-processing. This dramatically reduces the number of rounds that access the dataset, and reduces the overall amount of noise required for privacy.

If the exact number of exploration rounds were known, it could be plugged into the learning rate η\eta to achieve (ε,δ)(\varepsilon,\delta)-DP. In the non-private setting, a doubling trick (see, e.g., Shalev-Shwartz, (2012)) can be employed to find the right learning rate by calling the algorithm multiple times, doubling TT and thus rescaling η\eta on each iteration. Unfortunately, this doubling trick does not work in the private setting due to the non-linear relationship between the privacy parameter ε\varepsilon, the time horizon TT, and the learning rate η\eta, as specified in Proposition 2. Instead we use concentration inequalities (Alon and Spencer,, 2004) to ensure that there are no more than 2γT2\gamma T exploration rounds, except with probability e8T1/3e^{-8{T^{1/3}}}. With this, we can select a fixed learning rate η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}} and guarantee the optimal 𝒪(T2/3)\mathcal{O}(T^{2/3}) expected (11/e)(1-1/e)-regret, at the cost of (ε,δ+e8T1/3)(\varepsilon,\delta+e^{-8{T^{1/3}}})-DP.
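As a numerical sanity check (using the generic Hoeffding exponent e2γ2Te^{-2\gamma^{2}T}, which is weaker than the constant quoted above), the exact binomial tail Pr(M2γT)\operatorname{\Pr}(M\geq 2\gamma T) is indeed dominated by an exponentially small bound; parameter values below are illustrative:

```python
import math

def binom_tail(n, p, m):
    """Exact Pr(X >= m) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * (p ** j) * ((1 - p) ** (n - j))
               for j in range(m, n + 1))

T, gamma = 200, 0.1
tail = binom_tail(T, gamma, round(2 * gamma * T))   # Pr(M >= 2*gamma*T)
bound = math.exp(-2 * gamma ** 2 * T)               # generic Hoeffding bound
```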

We remark in Appendix B.2 that this additional loss in the δ\delta term can be avoided by pre-sampling the exploration round, but this requires Θ(T2/3+k|U|)\Theta(T^{2/3}+k|U|) space, which may be unacceptable for large TT.

Algorithm 3 presents the space-efficient approach. Here f^ti\widehat{f}_{t}^{i} is the vector with aa-th coordinate given by: f^ti,a=ft(Sti1+a)𝟏{Explore at time t, pick i, pick a}\widehat{f}_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)\mathbf{1}_{\{\text{Explore at time $t$, pick }i,\text{ pick }a\}}.

Initialize: Set γ=k((16|U|log|U|)2T)1/3\gamma=k\left(\frac{(16|U|\log|U|)^{2}}{T}\right)^{1/3} and η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}.
Instantiate kk parallel copies 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k} of Hedge algorithm with rate η\eta. Utilize each i\mathcal{E}_{i} to sample a1ia_{1}^{i} and set S1={a11,,a1k}S_{1}=\{a_{1}^{1},\ldots,a_{1}^{k}\}.
for t=1,,Tt=1,\ldots,T do
       Sample btBernoulli(γ)b_{t}\sim\mathrm{Bernoulli}(\gamma).
       if bt=1b_{t}=1 then
             Sample i[k]i\in[k] u.a.r. and aUa\in U u.a.r.
             Play St=Sti1{a}S_{t}=S_{t}^{i-1}\cup\{a\}.
             Obtain value ft(St)f_{t}(S_{t}).
             Feed back the function f^ti\widehat{f}_{t}^{i} to expert i\mathcal{E}_{i}, i\forall i.
             Utilize i\mathcal{E}_{i} to pick at+1ia_{t+1}^{i} i\forall i.
             Update set St+1=i=1k{at+1i}S_{t+1}=\cup_{i=1}^{k}\{a_{t+1}^{i}\}.
      else
             Play StS_{t}.
             Obtain ft(St)f_{t}(S_{t}).
             Update St+1=StS_{t+1}=S_{t}.
      
Algorithm 3 BanditDP(F,ε,δ)(F,\varepsilon,\delta)
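The explore/exploit loop can be sketched in a few lines; this is a minimal non-private illustration with our own helper names, omitting the DP calibration of η\eta and the 2γT2\gamma T cap on explorations:

```python
import math
import random

def hedge_sample(w, rng):
    """Sample an item with probability proportional to its Hedge weight."""
    items = list(w)
    r = rng.random() * sum(w.values())
    for b in items:
        r -= w[b]
        if r <= 0:
            return b
    return items[-1]

def bandit_round(weights, S, f, gamma, eta, U, k, rng):
    """One round of the explore/exploit scheme. weights[i] maps items to
    expert i's Hedge weights; S = [a^1, ..., a^k] is the current set.
    Returns the set for the next round."""
    if rng.random() < gamma:                        # explore (b_t = 1)
        i = rng.randrange(k)                        # pick expert i u.a.r.
        a = rng.choice(U)                           # pick item a u.a.r.
        payoff = f(set(S[:i]) | {a})                # observe f_t(S_t^{i-1} + a)
        weights[i][a] *= math.exp(eta * payoff)     # feedback to E_i; others see 0
        return [hedge_sample(w, rng) for w in weights]  # resample S_{t+1}
    return S                                        # exploit: replay S_t
```

In exploit rounds the data is never touched, which is exactly why only the (roughly γT\gamma T) exploration rounds consume privacy budget.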
Theorem 7.

Algorithm 3 is (ε,δ+e8T1/3)(\varepsilon,\delta+e^{-8{T^{1/3}}})-DP.

Theorem 8.

Algorithm 3 has (11/e)(1-1/e)-regret

𝔼[T]𝒪(logk/δε(k(|U|log|U|)1/3)2T2/3).\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]\leq\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}(k(|U|\log|U|)^{1/3})^{2}T^{2/3}\right).

Proof of Theorem 7

Observe that the algorithm only releases new information right after an exploration time-step. If t1,,tMt_{1},\ldots,t_{M} are the exploration time-steps, with MM distributed as the sum of TT independent Bernoulli random variables with parameter γ\gamma, then conditioned on the event M<2γTM<2\gamma T, we know that the outputs S1,St1+1,,StM+1S_{1},S_{t_{1}+1},\ldots,S_{t_{M}+1} are (ε,δ)(\varepsilon,\delta)-DP by Theorem 5. Now, conditioning again on the event M<2γTM<2\gamma T, the entire output (S1,,ST)(S_{1},\ldots,S_{T}) is (ε,δ)(\varepsilon,\delta)-DP, since it corresponds to post-processing of the previous output by extending the sets to exploitation time-steps. We know that M2γTM\geq 2\gamma T occurs w.p. e8γ2T\leq e^{-8\gamma^{2}T}. Thus, for any SS we have

Pr((k,,1)(F)S)\displaystyle\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F)\in S)
Pr((k,,1)(F)SM<2γT)Pr(M<2γT)+e8γ2T\displaystyle\leq\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F)\in S\mid M<2\gamma T)\operatorname{\Pr}(M<2\gamma T)+e^{-8\gamma^{2}T}
eεPr((k,,1)(F)SM<2γT)Pr(M<2γT)+δ+e8γ2T\displaystyle\leq e^{\varepsilon}\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F^{\prime})\in S\mid M<2\gamma T)\operatorname{\Pr}(M<2\gamma T)+\delta+e^{-8\gamma^{2}T}
eεPr((k,,1)(F)S)+δ+e8γ2T.\displaystyle\leq e^{\varepsilon}\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F^{\prime})\in S)+\delta+e^{-8\gamma^{2}T}.

The result now follows by plugging in the value of γ\gamma used in Algorithm 3.

Proof of Theorem 8

Theorem 8 requires the following two lemmas, proved respectively in Appendices A.3 and A.4. The first lemma says that the (11/e)(1-1/e)-regret experienced by the learner is bounded by the regret experienced by the experts plus an additional error introduced during the exploration times. The second lemma bounds the regret experienced by the experts under the biased estimator.

Lemma 2.

If rir_{i} denotes the regret experienced by expert i\mathcal{E}_{i} in Algorithm 3, then

(11e)max|S|kt=1Tft(S)𝔼[t=1Tft(St)]i=1k𝔼[ri]+γT.\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\gamma T.
Lemma 3.

If each i\mathcal{E}_{i} is a Hedge algorithm with learning rate η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}, then

𝔼[ri]16k2|U|log|U|Tlog(k/δ)εγ+k|U|γTe8γ2T.\operatorname{\mathbb{E}}[r_{i}]\leq 16\frac{k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}.

Using these two results with γ=k((16|U|log|U|)2T)1/3\gamma=k\left(\frac{(16|U|\log|U|)^{2}}{T}\right)^{1/3}:

𝔼[T]\displaystyle\operatorname{\mathbb{E}}\left[\mathcal{R}_{T}\right] k(16k2|U|log|U|Tlog(k/δ)εγ)+k|U|γTe8γ2T+γT\displaystyle\leq k\left(16\frac{k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}\right)+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}+\gamma T
=(16k3|U|log|U|logk/δεTγ+γT)+k|U|γTe8γ2T\displaystyle=\left(16\frac{k^{3}|U|\log|U|\sqrt{\log k/\delta}}{\varepsilon}\sqrt{\frac{T}{\gamma}}+\gamma T\right)+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}
32logk/δε(k(|U|log|U|)1/3)2T2/3+|U|1/3T4/3(16log|U|)2/3e8k2(16|U|log|U|)4/3T1/3.\displaystyle\leq 32\frac{\sqrt{\log k/\delta}}{\varepsilon}(k(|U|\log|U|)^{1/3})^{2}T^{2/3}+\frac{|U|^{1/3}T^{4/3}}{(16\log|U|)^{2/3}}e^{-8k^{2}(16|U|\log|U|)^{4/3}T^{1/3}}.

5 Extension to Continuous Functions

We sketch an extension of our methodology for (continuous) DR-submodular functions (Hassani et al.,, 2017; Niazadeh et al.,, 2018). Further details can be found in Appendix C.

Let 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}, where each 𝒳i\mathcal{X}_{i} is a closed convex set in \mathbb{R}. A function f:𝒳+f:\mathcal{X}\to\mathbb{R}_{+} is called DR-submodular if ff is differentiable and f(𝐱)f(𝐲)\nabla f(\mathbf{x})\geq\nabla f(\mathbf{y}) for all 𝐱𝐲\mathbf{x}\leq\mathbf{y}. DR-submodular functions are in general neither convex nor concave; for instance, the multilinear extension of a submodular function (Calinescu et al.,, 2011) is DR-submodular. The function ff is said to be β\beta-smooth if f(𝐱)f(𝐲)2β𝐱𝐲2\|\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\|_{2}\leq\beta\|\mathbf{x}-\mathbf{y}\|_{2}. In the online DR-submodular maximization problem, at each time-step t=1,,Tt=1,\ldots,T, a β\beta-smooth DR-submodular function ft:𝒳[0,1]f_{t}:\mathcal{X}\to[0,1] arrives and, without observing the function, the learner selects a point 𝐱t𝒳\mathbf{x}_{t}\in\mathcal{X} learned using f1,,ft1f_{1},\ldots,f_{t-1}. She receives the value ft(𝐱t)f_{t}(\mathbf{x}_{t}) and oracle access to ft\nabla f_{t}. The learner’s goal is to minimize the (11/e)(1-1/e)-regret T=(11e)max𝐱𝒳t=1Tft(𝐱)t=1Tft(𝐱t)\mathcal{R}_{T}=\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t}).
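As a concrete example (our own, not from the paper), the coverage-style function f(𝐱)=1i(1xi)f(\mathbf{x})=1-\prod_{i}(1-x_{i}) on [0,1]n[0,1]^{n} is monotone DR-submodular: its partial derivative f/xi=ji(1xj)\partial f/\partial x_{i}=\prod_{j\neq i}(1-x_{j}) is coordinate-wise nonincreasing in 𝐱\mathbf{x}. A quick numeric check of the gradient-dominance condition:

```python
def grad_f(x):
    """Gradient of f(x) = 1 - prod_i (1 - x_i): entry i is prod_{j != i} (1 - x_j)."""
    g = []
    for i in range(len(x)):
        p = 1.0
        for j, xj in enumerate(x):
            if j != i:
                p *= 1.0 - xj
        g.append(p)
    return g
```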

Online DR-submodular problems have been extensively studied in the full information setting—see for instance (Chen et al., 2018b, ; Chen et al., 2018a, ; Niazadeh et al.,, 2018). Similarly to the discrete submodular case, most of these methods implement KK ordered algorithms 0,,K1\mathcal{E}_{0},\ldots,\mathcal{E}_{K-1} for optimizing linear functions over 𝒳\mathcal{X}. Algorithm k\mathcal{E}_{k} computes a direction of maximum increment from a point given by the algorithms k1,,0\mathcal{E}_{k-1},\ldots,\mathcal{E}_{0}. The learner averages these directions to obtain a new point to play in the region 𝒳\mathcal{X}. This is the continuous version of the Hedge approach.

We show in Algorithm 4 and Theorem 9 that a simple modification transforms the continuous method of Chen et al., 2018b into a differentially private one. For this, we utilize the Private Follow the Approximate Leader (PFTAL) framework of Thakurta and Smith, (2013) as a black-box. PFTAL is an online convex optimization algorithm for minimizing LL-Lipschitz convex functions over a compact convex region 𝒳\mathcal{X}. In short, their algorithm guarantees (ε,0)(\varepsilon,0)-DP and achieves expected regret 𝒪(L2nTlog2.5Tε)\mathcal{O}\left(\frac{L^{2}\sqrt{nT\log^{2.5}T}}{\varepsilon}\right).

Let K=Tlog2.5TK=\sqrt{\frac{\sqrt{T}}{\log^{2.5}T}}. Initialize 0,,K1\mathcal{E}_{0},\ldots,\mathcal{E}_{K-1} parallel copies of PFTALs with privacy parameter ε=ε/K\varepsilon^{\prime}=\varepsilon/K.
for t=1,,Tt=1,\ldots,T do
       for k=0,,K1k=0,\ldots,K-1 do
             Let 𝐯tk\mathbf{v}_{t}^{k} be vector found using k\mathcal{E}_{k}.
      Let 𝐱t=1Kk=0K1𝐯tk\mathbf{x}_{t}=\frac{1}{K}\sum_{k=0}^{K-1}\mathbf{v}_{t}^{k}.
       Play 𝐱t\mathbf{x}_{t}, receive ft(𝐱t)f_{t}(\mathbf{x}_{t}) and access to ft\nabla f_{t}.
       Feed back each k\mathcal{E}_{k} with the linear objective k(𝐯)=ft(𝐱tk)𝐯\ell_{k}(\mathbf{v})=\nabla f_{t}(\mathbf{x}_{t}^{k})^{\top}\mathbf{v} where 𝐱tk=1Ki=0k1𝐯ti\mathbf{x}_{t}^{k}=\frac{1}{K}\sum_{i=0}^{k-1}\mathbf{v}_{t}^{i}.
Algorithm 4 (F={ft}t=1T,ε)(F=\{f_{t}\}_{t=1}^{T},\varepsilon)
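The direction-averaging scheme has a simple offline analogue: the Frank-Wolfe variant for monotone DR-submodular maximization, in which KK linear-optimization steps are averaged. Below is a non-private sketch (our own names and a simple budget region, assumed for illustration; the PFTAL machinery is omitted) over 𝒳={𝐱[0,1]n:ixib}\mathcal{X}=\{\mathbf{x}\in[0,1]^{n}:\sum_{i}x_{i}\leq b\}:

```python
def frank_wolfe_dr(grad, n, budget, K):
    """K-step Frank-Wolfe sketch for monotone DR-submodular f over
    X = {x in [0,1]^n : sum_i x_i <= budget}: average the K
    linear-maximization directions, x^{k+1} = x^k + v^k / K."""
    x = [0.0] * n
    for _ in range(K):
        g = grad(x)
        # argmax_{v in X} <g, v>: weight 1 on the `budget` largest positive coords
        top = sorted(range(n), key=lambda i: -g[i])[:budget]
        v = [1.0 if (i in top and g[i] > 0.0) else 0.0 for i in range(n)]
        x = [xi + vi / K for xi, vi in zip(x, v)]
    return x
```

In Algorithm 4 each linear step is produced by a PFTAL copy rather than by a direct linear maximization, which is what makes the online variant private.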
Theorem 9 (Informal).

Algorithm 4 is (ε,0)(\varepsilon,0)-DP with expected (11/e)(1-1/e)-regret

𝒪(T3/4log2.5Tε).\mathcal{O}\left(\frac{T^{3/4}\sqrt{\log^{2.5}T}}{\varepsilon}\right).

The big-𝒪\mathcal{O} term hides the dimension, bounds on the gradient, and the diameter of 𝒳\mathcal{X}, and only shows the dependence on TT and the privacy parameter ε\varepsilon. The proof appears in Appendix C.

References

  • Ahmed et al., (2012) Ahmed, A., Teo, C. H., Vishwanathan, S., and Smola, A. (2012). Fair and balanced: Learning to present news stories. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 333–342.
  • Alon and Spencer, (2004) Alon, N. and Spencer, J. H. (2004). The Probabilistic Method. John Wiley & Sons.
  • Bach, (2013) Bach, F. (2013). Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373.
  • Badanidiyuru et al., (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. (2014). Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 671–680.
  • Calinescu et al., (2011) Calinescu, G., Chekuri, C., Pál, M., and Vondrák, J. (2011). Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766.
  • Cardoso and Cummings, (2019) Cardoso, A. R. and Cummings, R. (2019). Differentially private online submodular minimization. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS ’19, pages 1650–1658.
  • Cesa-Bianchi and Lugosi, (2006) Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
  • Chapelle and Li, (2011) Chapelle, O. and Li, L. (2011). An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, NIPS ’11, pages 2249–2257.
  • Chatterjee et al., (2003) Chatterjee, P., Hoffman, D. L., and Novak, T. P. (2003). Modeling the clickstream: Implications for web-based advertising efforts. Marketing Science, 22(4):520–541.
  • (10) Chen, L., Harshaw, C., Hassani, H., and Karbasi, A. (2018a). Projection-free online optimization with stochastic gradient: From convexity to submodularity. arXiv preprint 1802.08183.
  • (11) Chen, L., Hassani, H., and Karbasi, A. (2018b). Online continuous submodular maximization. arXiv preprint 1802.06052.
  • Dwork et al., (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC ’06, pages 265–284.
  • (13) Dwork, C., Naor, M., Pitassi, T., and Rothblum, G. N. (2010a). Differential privacy under continual observation. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC ’10, pages 715–724.
  • Dwork and Roth, (2014) Dwork, C. and Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407.
  • (15) Dwork, C., Rothblum, G. N., and Vadhan, S. (2010b). Boosting and differential privacy. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science, FOCS ’10, pages 51–60.
  • Feige, (1998) Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652.
  • Fisher et al., (1978) Fisher, M. L., Nemhauser, G. L., and Wolsey, L. A. (1978). An analysis of approximations for maximizing submodular set functions—II. In Polyhedral Combinatorics, pages 73–87. Springer.
  • Freund and Schapire, (1997) Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
  • Gupta et al., (2010) Gupta, A., Ligett, K., McSherry, F., Roth, A., and Talwar, K. (2010). Differentially private combinatorial optimization. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 1106–1125.
  • Hassani et al., (2017) Hassani, H., Soltanolkotabi, M., and Karbasi, A. (2017). Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’17, pages 5841–5851.
  • Hazan, (2016) Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3–4):157–325.
  • Jain et al., (2012) Jain, P., Kothari, P., and Thakurta, A. (2012). Differentially private online learning. In Proceedings of the 25th Conference on Learning Theory, COLT ’12, pages 24.1–24.34.
  • Krause and Golovin, (2014) Krause, A. and Golovin, D. (2014). Submodular function maximization, pages 71–104. Cambridge University Press.
  • McSherry and Talwar, (2007) McSherry, F. and Talwar, K. (2007). Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103.
  • Mirrokni et al., (2008) Mirrokni, V., Schapira, M., and Vondrák, J. (2008). Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM Conference on Electronic Commerce, EC ’08, pages 70–77.
  • Mitrovic et al., (2017) Mitrovic, M., Bun, M., Krause, A., and Karbasi, A. (2017). Differentially private submodular maximization: Data summarization in disguise. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 2478–2487.
  • Niazadeh et al., (2018) Niazadeh, R., Roughgarden, T., and Wang, J. (2018). Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’18, pages 9594–9604.
  • Rafiey and Yoshida, (2020) Rafiey, A. and Yoshida, Y. (2020). Fast and private submodular and kk-submodular functions maximization with matroid constraints. arXiv preprint 2006.15744.
  • Schrijver, (2003) Schrijver, A. (2003). Combinatorial optimization: Polyhedra and efficiency, volume 24. Springer Science & Business Media.
  • Shalev-Shwartz, (2012) Shalev-Shwartz, S. (2012). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194.
  • Streeter and Golovin, (2009) Streeter, M. and Golovin, D. (2009). An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1577–1584.
  • Streeter et al., (2009) Streeter, M., Golovin, D., and Krause, A. (2009). Online learning of assignments. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1794–1802.
  • Tang et al., (2014) Tang, L., Jiang, Y., Li, L., and Li, T. (2014). Ensemble contextual bandits for personalized recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 73–80.
  • Thakurta and Smith, (2013) Thakurta, A. G. and Smith, A. (2013). (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Advances in Neural Information Processing Systems, NIPS ’13, pages 2733–2741.
  • Williamson and Shmoys, (2011) Williamson, D. P. and Shmoys, D. B. (2011). The design of approximation algorithms. Cambridge University Press.
  • Zhang et al., (2014) Zhang, B., Wang, N., and Jin, H. (2014). Privacy concerns in online recommender systems: Influences of control and user data input. In Proceedings of the 10th Symposium on Usable Privacy and Security, SOUPS ’14, pages 159–173.
  • Zhang et al., (2020) Zhang, M., Shen, Z., Mokhtari, A., Hassani, H., and Karbasi, A. (2020). One sample stochastic Frank-Wolfe. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS ’20, pages 4012–4023.
  • Zinkevich, (2003) Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, ICML ’03, pages 928–936.

Appendix A Appendix

A.1 Proof of Lemma 1

See Lemma 1.

Proof.

We prove the lemma by induction on ii. The base case of i=1i=1 follows from Proposition 2. For the inductive step, assume the result holds for some i1i\geq 1; we now prove that it also holds for i+1i+1. That is, we aim to show that (i+1,,1):TUT××UT(\mathcal{E}_{i+1},\ldots,\mathcal{E}_{1}):\mathcal{F}^{T}\to U^{T}\times\cdots\times U^{T} is ((i+1)ε,(i+1)δ)((i+1)\varepsilon^{\prime},(i+1)\delta^{\prime})-DP, where ε=ε/k\varepsilon^{\prime}=\varepsilon/k and δ=δ/k\delta^{\prime}=\delta/k. Let \wedge denote the minimum operation, used below to cap probabilities at one, and recall that Si\mathcal{M}^{S^{i}} describes the behavior of the (i+1)(i+1)-th expert across all TT rounds.

Consider neighboring databases FF and FF^{\prime}. Pick any set SUTS\subseteq U^{T} and a fixed Si=(ai,,a1)(UT)iS^{i}=(a^{i},\ldots,a^{1})\in(U^{T})^{i}, then

Pr(i+1(F)S(i,,1)(F)=Si)\displaystyle\operatorname{\Pr}(\mathcal{E}_{i+1}(F)\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
=Pr(Si(F)S)\displaystyle=\operatorname{\Pr}(\mathcal{M}^{S^{i}}(F)\in S)
(eεPr(Si(F)S))1+δ\displaystyle\leq(e^{\varepsilon^{\prime}}\operatorname{\Pr}(\mathcal{M}^{S^{i}}(F^{\prime})\in S))\wedge 1+\delta^{\prime} ((ε,δ)(\varepsilon^{\prime},\delta^{\prime})-DP of Si\mathcal{M}^{S^{i}})
=(eεPr(i+1(F)S(i,,1)(F)=Si))1+δ.\displaystyle=(e^{\varepsilon^{\prime}}\operatorname{\Pr}(\mathcal{E}_{i+1}(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1+\delta^{\prime}.

This is true as long as (i,,1)(F)=Si(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i} and (i,,1)(F)=Si(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i} are non-zero probability events, which is guaranteed since the Hedge algorithm places positive probability on every outcome.

We can write

Pr((i,,1)(F)=Si)=eiεPr((i,,1)(F)=Si)+μ(Si),\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})=e^{i\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})+\mu(S^{i}),

where μ(Si)=Pr((i,,1)(F)=Si)eiεPr((i,,1)(F)=Si)\mu(S^{i})=\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})-e^{i\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}). We have μ(𝒮)iδ\mu(\mathcal{S})\leq i\delta^{\prime} for any 𝒮(UT)i\mathcal{S}\subseteq(U^{T})^{i} since (i,,1)(\mathcal{E}_{i},\ldots,\mathcal{E}_{1}) is (iε,iδ)(i\varepsilon^{\prime},i\delta^{\prime})-DP by the inductive hypothesis.

Now, consider any set 𝒮(UT)i+1\mathcal{S}\subseteq(U^{T})^{i+1}. Then,

Pr((i+1,i,,1)(F)S)\displaystyle\operatorname{\Pr}((\mathcal{E}_{i+1},\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)\in S)
=Si𝒮Pr((i+1,Si)(F)S(i,,1)(F)=Si)Pr((i,,1)(F)=Si)\displaystyle=\sum_{S^{i}\in\mathcal{S}^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F)\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
Si𝒮((eεPr((i+1,Si)(F)S(i,,1)(F)=Si))1+δ)Pr((i,,1)(F)=Si)\displaystyle\leq\sum_{S^{i}\in\mathcal{S}^{\prime}}\left((e^{\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1+\delta^{\prime}\right)\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
Si𝒮((eεPr((i+1,Si)(F)S(i,,1)(F)=Si))1)(eiεPr((i,,1)(F)=Si)+μ(Si))\displaystyle\leq\sum_{S^{i}\in\mathcal{S}^{\prime}}\left((e^{\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1\right)\left(e^{i\varepsilon^{\prime}}\operatorname{\Pr}({(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}})+\mu(S^{i})\right)
+Si𝒮δPr((i,,1)(F)=Si)\displaystyle\qquad+\sum_{S^{i}\in\mathcal{S}^{\prime}}\delta^{\prime}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
e(i+1)εSi𝒮Pr((i+1,Si)(F)S(i,,1)(F)=Si)Pr((i,,1)(F)=Si)+μ(𝒮+)+δ\displaystyle\leq e^{(i+1)\varepsilon^{\prime}}\sum_{S^{i}\in\mathcal{S}^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})+\mu(\mathcal{S}_{+}^{\prime})+\delta^{\prime}
e(i+1)εPr((i+1,i,,1)(F)S)+(i+1)δ\displaystyle\leq e^{(i+1)\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})\in S)+(i+1)\delta^{\prime}

where 𝒮={Si(UT)i:(ai+1,Si)𝒮 for some ai+1UT}\mathcal{S}^{\prime}=\{S^{i}\in(U^{T})^{i}:(a^{i+1},S^{i})\in\mathcal{S}\text{ for some }a^{i+1}\in U^{T}\} and 𝒮+\mathcal{S}_{+}^{\prime} is the set of elements Si𝒮S^{i}\in\mathcal{S}^{\prime} such that μ(Si)0\mu(S^{i})\geq 0. This concludes the proof.

\Box

A.2 Proof of Proposition 3

See Proposition 3.

Proof.

Fix the choices S1,,STS_{1},\ldots,S_{T} of the experts arbitrarily, and let rir_{i} be the overall regret experienced by i\mathcal{E}_{i}. That is,

ri=maxaU\displaystyle r_{i}=\max_{a\in U} t=1Tft(Sti1+a)ft(Sti1)t=1Tft(Sti1+ati)ft(Sti1).\displaystyle\sum_{t=1}^{T}f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1})-\sum_{t=1}^{T}f_{t}(S_{t}^{i-1}+a_{t}^{i})-f_{t}(S_{t}^{i-1}).

Define the new function F:2[T]×UF:2^{[T]\times U}\to\mathbb{R} as

F(A)=1Tt=1Tft(At),F(A)=\frac{1}{T}\sum_{t=1}^{T}f_{t}(A_{t}),

where At={xU:(t,x)A}A_{t}=\{x\in U:(t,x)\in A\}. Clearly, FF is submodular, nondecreasing and F()=0F(\emptyset)=0. Then,

riT=maxaUF(S~i1+a~)F(S~i1)(F(S~i)F(S~i1)),\frac{r_{i}}{T}=\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})-(F(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})),

where S~i=t=1T{t}×Si\widetilde{S}^{i}=\bigcup_{t=1}^{T}\{t\}\times S^{i}.

Let OPTUOPT\subseteq U be the optimal solution of max|S|kt=1Tft(S)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S) and consider its extension to [T]×U[T]\times U, i.e., OPT~=t=1T{t}×OPT\widetilde{OPT}=\bigcup_{t=1}^{T}\{t\}\times OPT.

Claim A.1.

For any i=1,,ki=1,\ldots,k, maxaUF(S~i1+a~)F(S~i1)F(OPT~)F(S~i1)k\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}.

Proof of Claim A.1.
F(OPT~)F(S~i1)\displaystyle F(\widetilde{OPT})-F(\widetilde{S}^{i-1})
F(S~i1+OPT~)F(S~i1)\displaystyle\leq F(\widetilde{S}^{i-1}+\widetilde{OPT})-F(\widetilde{S}^{i-1})
a~OPT~S~i1F(S~i1+a~)F(S~i1)\displaystyle\leq\sum_{\widetilde{a}\in\widetilde{OPT}\setminus\widetilde{S}^{i-1}}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})
k(maxaUF(S~i1+a~)F(S~i1)).\displaystyle\leq k\cdot\left(\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})\right).

\Box

Using this claim, we see that

F(S~i)F(S~i1)F(OPT~)F(S~i1)kriT.F(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}-\frac{r_{i}}{T}.

Unrolling the recursion, we obtain

t=1Tft(St)(11e)t=1Tft(OPT)i=1kri.\sum_{t=1}^{T}f_{t}(S_{t})\geq\left(1-\frac{1}{e}\right)\sum_{t=1}^{T}f_{t}(OPT)-\sum_{i=1}^{k}r_{i}.

\Box
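The unrolling step can be checked numerically: ignoring the regret terms (ri=0r_{i}=0), iterating vi=vi1+(F(OPT~)vi1)/kv_{i}=v_{i-1}+(F(\widetilde{OPT})-v_{i-1})/k from v0=0v_{0}=0 leaves a gap of exactly (11/k)kF(OPT~)F(OPT~)/e(1-1/k)^{k}\cdot F(\widetilde{OPT})\leq F(\widetilde{OPT})/e after kk steps:

```python
def unroll(opt, k):
    """Iterate v_i = v_{i-1} + (opt - v_{i-1}) / k for i = 1..k, from v_0 = 0."""
    v = 0.0
    for _ in range(k):
        v += (opt - v) / k
    return v
```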

A.3 Proof of Lemma 2

See Lemma 2.

Proof.

Observe that at exploration time-steps τ\tau, i.e., when bτ=1b_{\tau}=1, Algorithm 3 plays a set of the form Sτ=Sτi1+aS_{\tau}=S_{\tau}^{i-1}+a. Right after this, the algorithm samples a new set Sτ+1S_{\tau+1} given by the Hedge algorithms and keeps playing this set until the next exploration time-step.

For the sake of analysis, we introduce the following sets. Let t0=0,t1,,tMt_{0}=0,t_{1},\ldots,t_{M} be the times when a new sample set for exploitation is obtained. Note that besides time t0t_{0}, all times t1,,tMt_{1},\ldots,t_{M} are exploration times. Now, let St=StiS_{t}^{\prime}=S_{t_{i}} for t=ti+1,,ti+1t=t_{i}+1,\ldots,t_{i+1}. Note that at times with bt=0b_{t}=0 we have St=StS_{t}^{\prime}=S_{t}; however, at times with bt=1b_{t}=1, StS_{t}^{\prime} is not necessarily the same as St=Sti1+aS_{t}=S_{t}^{i-1}+a. In other words, StS_{t}^{\prime} corresponds to the full exploitation scheme. Now, as in the full-information setting, we have

(11e)max|S|kt=1Tft(S)t=1Tft(St)i=1kri,\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t}^{\prime})\leq\sum_{i=1}^{k}r_{i},

where ri=maxaUt=1Tfti,at=1Tfti,atir_{i}=\max_{a\in U}\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}f_{t}^{i,a_{t}^{i}}. Thus

(11e)max|S|kt=1Tft(S)𝔼[t=1Tft(St)]\displaystyle\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]
i=1k𝔼[ri]+𝔼[t=1Tft(St)ft(St)]\displaystyle\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t}^{\prime})-f_{t}(S_{t})\right]
i=1k𝔼[ri]+γT,\displaystyle\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\gamma T,

since only the exploration time-steps can contribute to the difference ft(St)ft(St)f_{t}(S_{t}^{\prime})-f_{t}(S_{t}), and there are γT\gamma T of them in expectation. \Box

A.4 Proof of Lemma 3

See Lemma 3.

Proof.

From the perspective of expert i\mathcal{E}_{i}, at every time-step tt, she sees the vector f^ti\widehat{f}_{t}^{i} such that

f^ti,a=ft(Sti1+a)𝟏{Explore at time t, pick i, pick a}\widehat{f}_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)\mathbf{1}_{\{\text{Explore at time $t$, pick }i,\text{ pick }a\}}

in its aa-th coordinate. Notice that this vector is 0 if no exploration occurs at time tt. The expert i\mathcal{E}_{i} samples a new element in UU only after exploration times. Observe that the feedback of i\mathcal{E}_{i} is independent of the choices made by i\mathcal{E}_{i}. Indeed, this feedback depends only on the set Sti1S_{t}^{i-1} constructed by 1,,i1\mathcal{E}_{1},\ldots,\mathcal{E}_{i-1} and the decision of the learner to explore, which is independent of the learning task. Therefore, the sequence f^i=(f^1i,,f^Ti)\widehat{f}^{i}=(\widehat{f}_{1}^{i},\ldots,\widehat{f}_{T}^{i}) can be considered oblivious for i\mathcal{E}_{i}, and we can apply the guarantee of Hedge to f^i\widehat{f}^{i}. That is, for any aUa\in U,

\[
\sum_{t=1}^{T}\widehat{f}_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\leq\eta\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}+\frac{\log|U|}{\eta},
\]

where $\mathbf{x}_{t}\in\Delta(U)$ is the (entrywise positive) distribution used by expert $\mathcal{E}_{i}$ in the Hedge algorithm and $\Delta(U)=\{\mathbf{x}\in\mathbb{R}^{U}:\|\mathbf{x}\|_{1}=1,\ \mathbf{x}\geq 0\}$ is the probability simplex over the elements of $U$. Notice that exploitation times appear in the summation with contribution $0$. This expression is not the same as the regret of $\mathcal{E}_{i}$, but we can relate the two quantities as follows. Conditioning on $S_{1}^{i-1},\ldots,S_{T}^{i-1}$, we obtain

\[
\mathbb{E}[\widehat{f}_{t}^{i,a}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=\frac{\gamma}{k|U|}f_{t}^{i,a}+\delta_{t}^{i},
\]

where $f_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1})$ and $\delta_{t}^{i}=\frac{\gamma}{k|U|}f_{t}(S_{t}^{i-1})$. Notice that $S_{1}^{i-1},\ldots,S_{T}^{i-1}$ are independent of the actions taken by $\mathcal{E}_{i}$, so

\[
\mathbb{E}[\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=\frac{\gamma}{k|U|}\,\mathbb{E}[\mathbf{x}_{t}^{\top}f_{t}^{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]+\delta_{t}^{i}
\]

and

\begin{align*}
\mathbb{E}[\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]
&=\mathbb{E}\left[\sum_{a\in U}x_{t}(a)(\widehat{f}_{t}^{i,a})^{2}\ \middle|\ S_{1}^{i-1},\ldots,S_{T}^{i-1}\right]\\
&=\sum_{a\in U}\mathbb{E}[x_{t}(a)\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]\,\frac{\gamma}{k|U|}\,f_{t}(S_{t}^{i-1}+a)^{2}\\
&\leq\frac{\gamma}{k|U|},
\end{align*}
where the final inequality uses $f_{t}(S)\in[0,1]$ and $\sum_{a\in U}\mathbb{E}[x_{t}(a)\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=1$.
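The conditional-expectation computation above can be checked by simulation. In the sketch below, $\gamma$, $k$, $|U|$, and the payoff value are made-up toy numbers, and the indicator is drawn exactly as in the display: the estimator is nonzero only when the round explores, the uniform expert choice picks expert $i$, and the uniform item choice picks item $a$.

```python
import random

rng = random.Random(0)
gamma, k, n = 0.2, 2, 4     # exploration rate, number of experts, |U| (toy values)
f_value = 0.7               # stands in for f_t(S_t^{i-1} + a), a payoff in [0, 1]

def fhat_draw():
    """One draw of the importance-weighted estimator for a fixed pair (i, a)."""
    explores = rng.random() < gamma
    picks_i = rng.randrange(k) == 0    # uniform expert choice hits expert i
    picks_a = rng.randrange(n) == 0    # uniform item choice hits item a
    return f_value if (explores and picks_i and picks_a) else 0.0

N = 200_000
empirical = sum(fhat_draw() for _ in range(N)) / N
predicted = gamma / (k * n) * f_value   # the gamma/(k|U|) scaling from the display
```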

Let $M$ be the number of times Algorithm 3 decides to explore; that is, $M$ is distributed as a sum of $T$ independent Bernoulli random variables with parameter $\gamma$. By standard concentration bounds,

\[
\Pr(M>2\gamma T)\leq e^{-8\gamma^{2}T}.
\]

Now, let $t_{1},\ldots,t_{M}$ be the times at which the algorithm decides to explore, and let $t_{0}=0$. For $j=1,\ldots,M$, we may assume that expert $\mathcal{E}_{i}$ releases the same vector $\mathbf{x}_{t}\in\Delta(U)$ during the time interval $[t_{j-1},t_{j})$, since she receives no feedback during those times. If we take $\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}$, then for any $a\in U$ we have

\begin{align*}
\frac{\gamma}{k|U|}\,\mathbb{E}\left[\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}f_{t}^{i}\right]
&=\mathbb{E}\left[\sum_{t=1}^{T}\widehat{f}_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\right]\\
&\leq\left(\eta\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}\right]+\frac{\log|U|}{\eta}\right)+T\cdot e^{-8\gamma^{2}T}\\
&\leq\left(\eta\frac{\gamma}{k|U|}T+\frac{\log|U|}{\eta}\right)+T\cdot e^{-8\gamma^{2}T},
\end{align*}
where the first equality holds because the $\delta_{t}^{i}$ terms cancel, and the term $T\cdot e^{-8\gamma^{2}T}$ accounts for the low-probability event $\{M>2\gamma T\}$, on which we bound the regret by $T$.

Therefore,

\[
\mathbb{E}[r_{i}]=\mathbb{E}\left[\max_{a\in U}\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}f_{t}^{i}\right]\leq\frac{16k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}+\frac{k|U|}{\gamma}\,T\cdot e^{-8\gamma^{2}T}.
\]

\Box
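The Hedge (exponential weights) guarantee invoked in this proof can be exercised on a synthetic gain sequence. The following minimal, non-private sketch uses made-up random payoffs in $[0,1]$ (action 0 made slightly better), and checks the gain-version bound $\eta\sum_{t}\mathbf{x}_{t}^{\top}f_{t}^{2}+\log|U|/\eta$ used above; the tuning of $\eta$ here is illustrative, not the paper's private choice.

```python
import math
import random

rng = random.Random(1)
n, T = 4, 300                            # |U| and horizon (toy values)
eta = math.sqrt(math.log(n) / T)         # illustrative, non-private tuning

w = [1.0] * n                            # exponential weights
alg_gain = 0.0
action_gain = [0.0] * n
quad = 0.0                               # accumulates x_t . f_t^2
for _ in range(T):
    total = sum(w)
    x = [wi / total for wi in w]
    f = [rng.random() for _ in range(n)]     # toy payoffs in [0, 1]
    f[0] = min(1.0, f[0] + 0.2)              # action 0 is slightly better
    alg_gain += sum(xi * fi for xi, fi in zip(x, f))
    quad += sum(xi * fi * fi for xi, fi in zip(x, f))
    for a in range(n):
        action_gain[a] += f[a]
        w[a] *= math.exp(eta * f[a])         # Hedge update on gains

regret = max(action_gain) - alg_gain
bound = eta * quad + math.log(n) / eta       # matches the displayed guarantee
```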

Appendix B Additional Results in Bandit Setting

B.1 $\mathcal{O}(T^{3/4})$ Regret Bound of the Direct Approach in the Bandit Setting

In the bandit setting, the direct approach to differential privacy corresponds to sampling a new set from the Hedge algorithms at each time-step. As in the full-information setting, a learning rate of $\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}$ suffices to ensure $(\varepsilon,\delta)$-DP.

Similar to Lemma 3, in this setting we have

\[
\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\mathbb{E}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\sum_{i=1}^{k}\mathbb{E}[r_{i}]+\gamma T.
\]

Since
\begin{align*}
\mathbb{E}[r_{i}]&\leq\frac{k|U|}{\gamma}\left(\eta\frac{\gamma}{k|U|}T+\frac{\log|U|}{\eta}\right)\\
&=\frac{\varepsilon\sqrt{T}}{k\sqrt{32\log(k/\delta)}}+\frac{k^{2}|U|\log|U|\sqrt{32T\log(k/\delta)}}{\varepsilon\gamma},
\end{align*}

we have
\[
\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\mathbb{E}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\frac{k^{3}|U|\log|U|\sqrt{32T\log(k/\delta)}}{\varepsilon\gamma}+\frac{\varepsilon\sqrt{T}}{\sqrt{32\log(k/\delta)}}+\gamma T.
\]

This last bound is minimized when $\gamma=\Theta(T^{-1/4})$, which gives a $(1-1/e)$-regret bound of $\mathcal{O}(T^{3/4})$.
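The optimization over $\gamma$ can be verified numerically. Writing the bound as $g(\gamma)=A\sqrt{T}/\gamma+\gamma T$, with all $k$, $|U|$, $\varepsilon$ factors absorbed into a constant $A$ (set to $1$ below purely for illustration), calculus gives $\gamma^{*}=\sqrt{A}\,T^{-1/4}$ and $g(\gamma^{*})=2\sqrt{A}\,T^{3/4}$; a grid search confirms the $T^{3/4}$ scaling.

```python
def g(gamma, T, A=1.0):
    # shape of the bound above: A * sqrt(T) / gamma + gamma * T
    return A * T ** 0.5 / gamma + gamma * T

results = {}
for T in (10_000, 1_000_000):
    grid = [i / 100_000 for i in range(1, 100_000)]   # gamma in (0, 1)
    gamma_star = min(grid, key=lambda gm: g(gm, T))
    # minimizer sits at ~T^{-1/4}; minimum value scales as 2 * T^{3/4}
    results[T] = (gamma_star, g(gamma_star, T))
```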

B.2 Trading Off the Privacy $\delta$-Term and Space

In this subsection, we show how to trade off the $\delta$-term $e^{-8T^{1/3}}$ against additional space. For each $t\in[T]$, select $t$ as an explore round independently with probability $\gamma$, and let $M$ be the number of selected time-steps; note that $\mathbb{E}[M]=\gamma T$. Now, run Algorithm 3 with $\eta=\frac{\varepsilon}{k\sqrt{32(M+1)\log(k/\delta)}}$, forcing the algorithm to explore at the $M$ sampled time-steps and to exploit at all remaining time-steps.
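A sketch of this modification, with illustrative parameter values (the variable names below are not from the paper's pseudocode): the explore schedule is drawn up front, so the realized $M$ is available when setting $\eta$, and the memory used is the stored schedule itself, of expected size $\gamma T$.

```python
import math
import random

rng = random.Random(0)
T, gamma = 10_000, 0.1          # horizon and exploration rate (toy values)
k, eps, delta = 5, 1.0, 1e-4    # cardinality and privacy parameters (toy values)

# Draw the entire explore schedule in advance; storing it takes Theta(M) space,
# which is Theta(gamma * T) in expectation.
explore_rounds = [t for t in range(T) if rng.random() < gamma]
M = len(explore_rounds)

# The learning rate uses the realized M instead of a high-probability bound on it.
eta = eps / (k * math.sqrt(32 * (M + 1) * math.log(k / delta)))
```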

In this case, following the proof of Lemma 3, we obtain:

\begin{align*}
\mathbb{E}[r_{i}]&\leq\frac{k|U|}{\gamma}\,\mathbb{E}\left[\eta M+\frac{\log|U|}{\eta}\right]\\
&\leq\frac{k|U|}{\gamma}\,\mathbb{E}\left[\frac{6k\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{M+1}\right]\\
&\leq\frac{k|U|}{\gamma}\cdot\frac{6k\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\mathbb{E}[M]+1} &&\text{(Jensen's inequality)}\\
&\leq\frac{8k^{2}|U|\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\frac{T}{\gamma}}.
\end{align*}

Using Lemma 2, we obtain the $(1-1/e)$-regret bound of

\[
\frac{8k^{3}|U|\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\frac{T}{\gamma}}+\gamma T.
\]

This is minimized at $\gamma=\Theta(T^{-1/3})$, yielding a regret bound of $\mathcal{O}(T^{2/3})$ with expected space $\Theta(T^{2/3})$.

Appendix C Extension to Continuous Functions

In this section we prove Theorem 9. Before doing so, we present some preliminaries on online convex optimization.

In online convex optimization (OCO), there is a compact convex set $\mathcal{X}\subseteq\mathbb{R}^{n}$ over which the learner makes decisions. At time-step $t$, a convex function $f_{t}:\mathcal{X}\to\mathbb{R}$ arrives. Without observing this function, the learner must select a point $\mathbf{x}_{t}\in\mathcal{X}$ based on the previous functions $f_{1},\ldots,f_{t-1}$. After the decision has been made, the learner incurs the cost $f_{t}(\mathbf{x}_{t})$ and gains oracle access to $\nabla f_{t}$. The learner's objective is to minimize the regret:

\[
\mathcal{R}_{T}=\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})-\min_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x}).
\]
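PFTAL itself is involved; as a minimal, non-private illustration of this protocol, here is projected online gradient descent on a toy one-dimensional instance with losses $f_t(x)=(x-z_t)^2$, where the targets $z_t$, the step sizes, and the decision set $[0,1]$ are arbitrary choices made for this sketch.

```python
import math

T = 500
zs = [0.5 + 0.1 * (-1) ** t for t in range(T)]   # targets alternate around 0.5

x = 0.0                                          # decision in X = [0, 1]
cum_loss = 0.0
for t, z in enumerate(zs, start=1):
    cum_loss += (x - z) ** 2                     # incur f_t(x_t)
    grad = 2 * (x - z)                           # oracle access to grad f_t
    x = min(1.0, max(0.0, x - 0.5 / math.sqrt(t) * grad))  # step + projection

# Regret against the best fixed point, found here by a fine grid search.
best_fixed = min(sum((u / 1000 - z) ** 2 for z in zs) for u in range(1001))
regret = cum_loss - best_fixed
```

The iterate hovers near the best fixed point $0.5$, and the regret stays far below the trivial bound of $T$ times the per-round loss range.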

Thakurta and Smith (2013) introduced PFTAL (Private Follow the Approximate Leader) to solve the OCO problem privately.

Theorem 10 (Thakurta and Smith (2013)).

PFTAL is $(\varepsilon,0)$-DP, and for any input stream of convex and $L$-Lipschitz functions $f_{1},\ldots,f_{T}$ it has expected regret

\[
\mathbb{E}\left[\mathcal{R}_{T}\right]\leq\mathcal{O}\left(\frac{\sqrt{n\log^{2.5}T}\left(L+\sqrt{\frac{n\log^{2.5}T}{\varepsilon T}}\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon}\sqrt{T}\right).
\]

As with the Hedge algorithm, we utilize PFTAL as a black box in Algorithm 4.

We now split the proof of Theorem 9 into two lemmas and prove each separately.

Lemma 4 (Privacy guarantee).

Algorithm 4 is $(\varepsilon,0)$-DP.

Lemma 5 (Regret guarantee).

Let $R=\sup_{\mathbf{x}\in\mathcal{X}}\|\mathbf{x}\|_{2}$, let $G$ be a bound on the gradient norms $\|\nabla f_{t}(\mathbf{x}_{t})\|_{2}$, and let $\beta$ be the smoothness parameter of $f_{1},\ldots,f_{T}$. Then Algorithm 4 has $(1-1/e)$-regret

\[
\mathbb{E}\left[\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\right]=\mathcal{O}\left(T^{3/4}\sqrt{\log^{2.5}T}\left(\frac{\sqrt{n}\left(G+\sqrt{\frac{n}{\varepsilon T^{3/4}}}\log^{2.5}T\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon}+\beta R^{2}\right)\right).
\]

Proof of Lemma 4

As in the analysis of Algorithm 2, we show that $(\mathcal{E}_{K-1},\ldots,\mathcal{E}_{0})$ is $(\varepsilon,0)$-DP. If each $\mathcal{E}_{k}$ were $(\varepsilon/K,0)$-DP, the result would follow immediately by simple composition. However, we cannot guarantee that each $\mathcal{E}_{k}$ is $(\varepsilon/K,0)$-DP: $\mathcal{E}_{k}$ receives as input the privatized output of $\mathcal{E}_{0},\ldots,\mathcal{E}_{k-1}$ through the linear function $\ell_{k}(\mathbf{v})=\nabla f_{t}(\mathbf{x}_{t}^{k})^{\top}\mathbf{v}$, where $\mathbf{x}_{t}^{k}$ is computed by $\mathcal{E}_{0},\ldots,\mathcal{E}_{k-1}$, while at the same time it accesses the function $f_{t}$ (and hence the database) again via the gradient $\nabla f_{t}$ inside this linear function. This breaks the privacy that a simple post-processing argument would otherwise provide, and therefore an alternative method is needed.

Instead of showing that each $\mathcal{E}_{k}$ is $(\varepsilon/K,0)$-DP, we show directly that the group $(\mathcal{E}_{K-1},\ldots,\mathcal{E}_{0})$ is $(\varepsilon,0)$-DP. The proof of the following lemma follows the same steps as the proof of Lemma 1. It is slightly simpler since there is no $\delta$-privacy term, but it requires some care since the distributions here are continuous.

Lemma 6.

For any $i\geq 1$, the group $(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0}):\mathcal{F}^{T}\to(\mathcal{X}^{T})^{i}$ is $(i\varepsilon/K,0)$-DP.

Proof.

We proceed by induction on $i$. The base case $i=1$ follows immediately from the privacy of PFTAL (Thakurta and Smith (2013)), since $\mathcal{E}_{0}$ is the only algorithm whose input is not perturbed by any other algorithm. For the inductive step, assume the result holds for some $i\geq 1$; we prove it for $i+1$.

Let $\mathbf{x}_{0}^{T},\ldots,\mathbf{x}_{i-1}^{T}\in\mathcal{X}^{T}$ and $\mathbf{X}_{i-1}=(\mathbf{x}_{i-1}^{T},\ldots,\mathbf{x}_{0}^{T})$. Then, for any $\mathbf{x}_{i}^{T}\in\mathcal{X}^{T}$ we have

\[
\Pr(\mathcal{E}_{i}(F)=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\leq e^{\varepsilon/K}\Pr(\mathcal{E}_{i}(F^{\prime})=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1})
\]

by the guarantee of PFTAL. Note that here $\Pr$ denotes a probability density rather than a CDF, since PFTAL utilizes Gaussian noise and hence its outputs have continuous distributions. With this, for $\mathbf{X}_{i}=(\mathbf{x}_{i}^{T},\ldots,\mathbf{x}_{0}^{T})$ we have

\begin{align*}
\Pr((\mathcal{E}_{i},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i})
&=\Pr(\mathcal{E}_{i}(F)=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\,\Pr((\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\\
&\leq e^{\varepsilon/K}\Pr(\mathcal{E}_{i}(F^{\prime})=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1})\cdot e^{i\varepsilon/K}\Pr((\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1}),
\end{align*}

where we utilized the inductive hypothesis and the previous inequality. This completes the proof. \Box

Proof of Lemma 5

Let $G=\sup_{t=1,\ldots,T,\ \mathbf{x}\in\mathcal{X}}\|\nabla f_{t}(\mathbf{x})\|_{2}$, and let $r_{i}$ be the regret experienced by algorithm $\mathcal{E}_{i}$ in Algorithm 4.

The following result appears in the proof of Theorem 1 in Chen et al. (2018b).

Lemma 7 (Chen et al. (2018b)).

Assume $f_{t}$ is monotone DR-submodular and $\beta$-smooth for every $t$. Then Algorithm 4 ensures

\[
\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\leq\frac{1}{K}\sum_{i=0}^{K-1}r_{i}+\frac{\beta R^{2}T}{2K},
\]

where $R=\sup_{\mathbf{x}\in\mathcal{X}}\|\mathbf{x}\|_{2}$ and $r_{i}$ is the regret of algorithm $\mathcal{E}_{i}$.

Using this result, we obtain

\begin{align*}
\mathbb{E}\left[\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\right]
&\leq\frac{1}{K}\sum_{i=0}^{K-1}\mathbb{E}[r_{i}]+\frac{\beta R^{2}T}{2K}\\
&\leq\mathcal{O}\left(\frac{\sqrt{n\log^{2.5}T}\left(G+\sqrt{\frac{n\log^{2.5}T}{\varepsilon T/K}}\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon/K}\sqrt{T}+\frac{\beta R^{2}T}{2K}\right).
\end{align*}

The stated regret bound follows by setting $K=\left(\frac{T}{\log^{2.5}T}\right)^{1/4}$. \Box