
Differentially Private Online Submodular Maximization

Sebastian Perez-Salazar, Georgia Institute of Technology, [email protected].    Rachel Cummings, Georgia Institute of Technology, [email protected]; supported in part by a Mozilla Research Grant, NSF grants CNS-1850187 and CNS-1942772 (CAREER), and a JPMorgan Chase Faculty Research Award.
Abstract

In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of $T$ submodular functions over a common finite ground set $U$ arrives online, and at each time-step the decision maker must choose at most $k$ elements of $U$ before observing the function. The decision maker obtains a payoff equal to the function evaluated on the chosen set, and aims to learn a sequence of sets that achieves low expected regret.

In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log k/\delta}}{\varepsilon}\right)$. This algorithm contains $k$ ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta+O(e^{-T^{1/3}}))$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\left(k(|U|\log|U|)^{1/3}\right)^{2}T^{2/3}\right)$.

Our algorithms contain $k$ ordered experts that learn the best marginal item to select given the items chosen by her predecessors, while maintaining privacy of the functions. One challenge for privacy in this setting is that the payoff and feedback of expert $i$ depend on the actions taken by her $i-1$ predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.

1 Introduction

Ensuring users’ privacy has become a critical task in online learning algorithms. As an illustrative example, sponsored search engines aim to maximize the probability that displayed ads or products are clicked by incoming customers, but prospective customers do not want their privacy infringed after clicking on a product. Users visiting online retailer web-pages such as Amazon, Walmart or Target leave behind an abundance of sensitive personal information that can be used to predict their behaviors or preferences, potentially leading to catastrophic results (Zhang et al., 2014; see also https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html). In this work, we introduce the first algorithms for privacy-preserving online monotone submodular maximization under a cardinality constraint.

A submodular set function $f:2^{U}\to\mathbb{R}$ exhibits diminishing returns, meaning that adding an element $x$ to a larger set $B$ creates less additional value than adding $x$ to any subset of $B$. (See Definition 1 in Section 2 for a formal definition.) Submodular functions have found widespread application in economics, computer science and operations research (see, e.g., Bach, 2013 and Krause and Golovin, 2014), and have recently gained attention as a modeling tool for data summarization and ad display (Ahmed et al., 2012; Streeter et al., 2009; Badanidiyuru et al., 2014). We additionally consider monotone submodular functions, where adding elements to a set can only increase the value of $f$. Since unconstrained monotone submodular maximization is trivial—$f(S)$ can be maximized by choosing the entire universe $S=U$—we consider cardinality-constrained maximization, where the decision-maker solves $\max_{S\subseteq U}f(S)$ s.t. $|S|\leq k$.

In the online learning setting, at each time-step $t$ a learner must choose a set $S_{t}\subseteq U$ of size at most $k$ and receives payoff $f_{t}(S_{t})$ for a monotone submodular function $f_{t}$. Importantly, the learner does not know $f_{t}$ before she chooses $S_{t}$, but this set can be chosen based on previous functions $f_{1},\ldots,f_{t-1}$. Two types of informational feedback are commonly studied in the online learning literature. In the full-information setting, the learner gets full oracle access to the function $f_{t}$ after choosing $S_{t}$, and thus is able to incorporate the entirety of previous functions into her future decisions. In the bandit setting, the learner only observes her own payoff $f_{t}(S_{t})$ as feedback.

Performance of an online learner is typically measured by the regret, which is the difference between the payoff of the best fixed decision in hindsight and the cumulative payoff obtained by the learner (Zinkevich, 2003; Hazan, 2016; Shalev-Shwartz, 2012). More precisely, the regret of a learner after $T$ rounds is $\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t})$. The aim often is to design algorithms with sublinear regret, i.e., $o(T)$, so that the average payoff over time of the algorithm is comparable with the best average fixed payoff in hindsight. Offline monotone submodular maximization under a cardinality constraint is NP-hard to approximate with a factor better than $(1-1/e)$ (Feige, 1998; Mirrokni et al., 2008), so we instead measure the quality of our algorithms using the more restrictive notion of $(1-1/e)$-regret (Streeter and Golovin, 2009; Streeter et al., 2009):

$$\operatorname{\mathcal{R}}_{T}=\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t}). \qquad (1)$$

The privacy notion we consider in this work is differential privacy (Dwork et al., 2006), which enables accurate estimation of population-level statistics while ensuring little can be learned about the individuals in the database. Informally, a randomized algorithm is said to be differentially private if changing a single entry in the input database results in only a small distributional change in the outputs. (See Definition 2 in Section 2 for a formal definition.) This means that an adversary cannot information-theoretically infer whether or not a single individual participated in the database. Differentially private algorithms have been deployed by major organizations including Apple, Google, Microsoft, Uber, and the U.S. Census Bureau, and are seen as the gold standard in privacy-preserving data analysis. In this work, the input database to our learning algorithm consists of a stream of functions $F=\{f_{1},\ldots,f_{T}\}$, and each individual’s data corresponds to a function $f_{t}$. Our privacy guarantees ensure that the stream of chosen sets $S_{1},\ldots,S_{T}$ is differentially private with respect to this database of functions.

In both the full-information and bandit settings, we present differentially private online learning algorithms that achieve sublinear expected $(1-1/e)$-regret.

Motivating Example.

While there are countless examples of practical online submodular maximization problems using sensitive data, we offer this motivating example for concreteness. Consider an online product display model where a website has $k$ display slots and wants to maximize the probability of any displayed product being clicked. Each customer $t$ has a (privately known) probability $p_{a}^{t}$ of clicking a display for product $a\in U$, independently of the other products displayed. Let $f_{t}(S)$ denote the probability that customer $t$ clicks on any product in a display set $S$. We can write this function in closed form as $f_{t}(S)=1-\prod_{a\in S}(1-p_{a}^{t})$. Note that this function is submodular because adding products to the set $S$ exhibits diminishing returns in total click probability. Each customer’s click-probabilities $\{p^{t}_{a}\}_{a\in U}$ contain sensitive information about his preferences or habits, and require formal privacy protections.
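The closed-form function above can be checked numerically. Below is a minimal sketch with made-up click probabilities (the names `f_t` and `p` are illustrative, not from the paper):

```python
def f_t(S, p):
    """Probability that customer t clicks at least one product in the
    display set S, given independent click probabilities p[a]."""
    no_click = 1.0
    for a in S:
        no_click *= 1.0 - p[a]
    return 1.0 - no_click

# Hypothetical click probabilities for one customer.
p = {"a": 0.5, "b": 0.3, "c": 0.2}

# Monotone: adding products never lowers the click probability.
assert f_t({"a"}, p) <= f_t({"a", "b"}, p) <= f_t({"a", "b", "c"}, p)

# Diminishing returns: adding "c" helps less on top of the larger set.
gain_on_small = f_t({"a", "c"}, p) - f_t({"a"}, p)            # 0.10
gain_on_large = f_t({"a", "b", "c"}, p) - f_t({"a", "b"}, p)  # 0.07
assert gain_on_large <= gain_on_small
```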

1.1 Our Results

Our main results are differentially private algorithms for online submodular maximization under a cardinality constraint. We provide algorithms that achieve sublinear expected $(1-1/e)$-regret in both the full-information and bandit settings.

Our algorithms are based on the approach of Streeter and Golovin (2009), who designed (non-private) online algorithms with low expected $(1-1/e)$-regret for submodular maximization. We adapt and extend their techniques to additionally satisfy differential privacy. Following the spirit of Streeter and Golovin (2009), our algorithms have $k$ ordered online learning algorithms, or experts, that together pick $k$ items at every time-step and learn from their decisions over time. Roughly speaking, expert $i$ learns how to choose an item that complements the decisions of the previous $i-1$ experts. The expected $(1-1/e)$-regret can be bounded by the regret of these $k$ experts, so to obtain a low $(1-1/e)$-regret algorithm that preserves privacy, we simply need to find no-regret experts that together preserve privacy. Ideally, we would like each expert to be differentially private so that simple composition and post-processing arguments would yield overall privacy guarantees. Unfortunately this is not possible for $k>1$ because the choices of all previous experts alter the distribution of payoffs for expert $i$.

Specifically, the $i$-th expert non-privately queries the function (i.e., accesses the database) at $|U|$ points that depend on the actions of the previous experts. A naive solution is to allow each expert to query the function at any of its $2^{|U|}$ values, so that privacy would follow by post-processing on the differentially private outputs of previous experts. However, this larger domain size requires large quantities of noise that would harm the experts' no-regret guarantees. Effectively, this decouples the advice of the $k$ experts, so that experts are not learning from each other; this helps privacy but harms learning. Instead, we restrict each expert to a domain of size $|U|$ that is defined by the actions of previous experts. This ensures no-regret learning, but post-processing no longer ensures privacy. We overcome this challenge by showing that the experts together are differentially private and that only low quantities of noise are needed.

Theorem 1 below is an informal version of our main results in the full-information setting (Theorems 5 and 6 in Section 3).

Theorem 1 (Informal).

In the full-information setting, Algorithm 2 for online monotone $k$-cardinality-constrained submodular maximization is $(\varepsilon,\delta)$-differentially private and guarantees

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]=\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).$$

In the bandit setting, each expert only receives its own payoff as feedback, and does not have oracle access to the entire function. For this setting, we modify the full-information algorithm by using a biased estimator of the marginal increments for other actions.

The algorithm also requires additional privacy considerations. The non-private approach of Streeter and Golovin (2009) randomly decides in each round whether to explore or exploit. In exploit rounds, the experts sample a new set but play the current-optimal action, providing both learning and exploitation. Directly privatizing this algorithm incurs additional privacy loss from the exploit rounds, which leads to a weak bound of $\mathcal{O}(T^{3/4})$ for the expected $(1-1/e)$-regret, far from the best known $\mathcal{O}(T^{2/3})$. Instead, we have the experts sample new sets only after an exploration round has occurred. The choice to explore is data-independent, so privacy is maintained by post-processing. If the exact number and timing of explore rounds are known in advance, this results in an $(\varepsilon,\delta)$-DP algorithm. However, this approach requires $\Omega(T^{2/3}+k|U|)$ space, which is not appealing in practical settings where $T$ is substantially larger than $|U|$. Instead, we allow explore-exploit decisions to be made online and obtain a high-probability bound on the number of explore rounds based on the sampling parameter. At the expense of an exponentially small loss in the $\delta$ privacy parameter, resulting from the failure of the high-probability bound, we obtain the asymptotically optimal $\mathcal{O}(T^{2/3})$ expected $(1-1/e)$-regret.

Theorem 2 is an informal version of our main results in the more challenging bandit feedback setting (Theorems 7 and 8 in Section 4).

Theorem 2 (Informal).

In the bandit feedback setting, Algorithm 3 for online monotone $k$-cardinality-constrained submodular maximization is $(\varepsilon,\delta+e^{-8T^{1/3}})$-differentially private and guarantees

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]=\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\left(k(|U|\log|U|)^{1/3}\right)^{2}T^{2/3}\right).$$

The best known non-private expected $(1-1/e)$-regret is $\mathcal{O}\left(\sqrt{kT\log|U|}\right)$ in the full-information setting and $\mathcal{O}\left(k(|U|\log|U|)^{1/3}T^{2/3}\right)$ in the bandit setting (Streeter and Golovin, 2009). Comparing our expected $(1-1/e)$-regret bounds to these, we see that our bounds are asymptotically optimal in $T$, with slight gaps in terms of $k$ and $|U|$. Typically, the dominating term is the time horizon $T$ with $k\leq|U|\ll T$, so our results match the best expected $(1-1/e)$-regret asymptotically in $T$.

Additionally, we show that our algorithms can be extended to a continuous generalization of submodular functions, known as DR-submodular functions. We provide a differentially private online learning algorithm for DR-submodular maximization that achieves low expected regret. A brief overview of this extension is given in Section 5, with further details in the appendix.

1.2 Related Work

Online learning (Zinkevich, 2003; Cesa-Bianchi and Lugosi, 2006; Hazan, 2016; Shalev-Shwartz, 2012) has gained increasing attention for making decisions in dynamic environments when only partial information is available. Its applicability in ad placement (Chatterjee et al., 2003; Chapelle and Li, 2011; Tang et al., 2014) has made this model attractive from a practical viewpoint.

Submodular optimization has been widely studied, due to the large number of important submodular functions, such as the cut function of a graph, the entropy of a set of random variables, and the rank function of a matroid, to name only a few. For more applications see Schrijver (2003), Williamson and Shmoys (2011), and Bach (2013). While (unconstrained) submodular minimization can be solved with a polynomial number of oracle calls (Schrijver, 2003; Bach, 2013), submodular maximization is NP-hard for general submodular functions. For monotone submodular functions under a cardinality constraint, it is impossible to find a polynomial-time algorithm that achieves a fraction better than $(1-1/e)$ of the optimal solution unless P=NP (Feige, 1998), and this approximation factor is achieved by the greedy algorithm (Fisher et al., 1978). For further results with more general constraints, we refer the reader to the survey of Krause and Golovin (2014). In the online setting, Streeter and Golovin (2009) and Streeter et al. (2009) were the first to study online monotone submodular maximization, with cardinality/knapsack constraints and partition matroid constraints respectively. Recently, continuous submodularity has gained attention in the continuous optimization community (Hassani et al., 2017; Niazadeh et al., 2018; Zhang et al., 2020). See Chen et al. (2018a) and Chen et al. (2018b) for online continuous submodular optimization algorithms.
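The offline greedy algorithm mentioned above is simple enough to sketch in a few lines. The following is an illustrative implementation (the coverage function and all names are our own stand-in example, not from the paper):

```python
def greedy(f, U, k):
    """Greedy algorithm of Fisher et al. (1978): a (1 - 1/e)-approximation
    for monotone submodular maximization under |S| <= k."""
    S = set()
    for _ in range(k):
        # Add the element with the largest marginal gain f(S + a) - f(S).
        best = max((a for a in U if a not in S),
                   key=lambda a: f(S | {a}) - f(S))
        S.add(best)
    return S

# Coverage (a classic monotone submodular function): f(S) is the number
# of elements covered by the union of the chosen sets.
sets = {"x": {1, 2, 3}, "y": {3, 4}, "z": {4, 5, 6, 7}}
cover = lambda S: len(set().union(*(sets[a] for a in S))) if S else 0
chosen = greedy(cover, sets, 2)  # picks "z" first (gain 4), then "x"
```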

Differential privacy (Dwork et al., 2006) has become the gold standard for individual privacy, and there has been a large literature of differentially private algorithms developed for a broad set of analysis tasks. See Dwork and Roth (2014) for a textbook treatment. Due to privacy concerns in practical applications of online learning, there has been growing interest in implementing well-known methods, such as experts algorithms and gradient optimization methods, in a differentially private way. See for instance Jain et al. (2012) and Thakurta and Smith (2013).

Differential privacy and submodularity were first jointly considered by Gupta et al. (2010). They studied the combinatorial public projects problem, where the objective function is a sum of monotone submodular functions, each representing an agent’s private valuation function, and a decision-maker must maximize this objective subject to a cardinality constraint. The authors designed an $(\varepsilon,0)$-DP algorithm using the Exponential Mechanism of McSherry and Talwar (2007) as a private subroutine, and achieved a $(1-1/e)$-approximation to the optimal non-private solution, plus an additional term proportional to $\varepsilon^{-1}$. Later, Mitrovic et al. (2017) extended these results to monotone submodular functions under cardinality, matroid, and $p$-system constraints. Their methods also used the Exponential Mechanism to ensure differential privacy. See also recent work by Rafiey and Yoshida (2020).

In the online learning framework, Cardoso and Cummings (2019) study online (unconstrained) differentially private submodular minimization. They use the Lovász extension of a set function as a convex proxy, in order to apply known privacy tools from online convex optimization (Jain et al., 2012; Thakurta and Smith, 2013). Since submodular minimization and maximization are fundamentally different technical problems, the techniques of Cardoso and Cummings (2019) do not extend to our setting.

Fundamental to our analysis are the differentially private Exponential Mechanism of McSherry and Talwar (2007) and its inherent connection to multiplicative weights algorithms (Hazan, 2016; Shalev-Shwartz, 2012), which we use to estimate probability distributions in the simplex while preserving privacy.

2 Preliminaries

In this section we review definitions and properties of submodular functions and differential privacy.

Definition 1 (Submodularity).

A function $f:2^{U}\to\mathbb{R}$ is submodular if it satisfies the following diminishing returns property: for all $A\subseteq B\subseteq U$ and $x\notin B$,

$$f(A\cup\{x\})-f(A)\geq f(B\cup\{x\})-f(B).$$

(Equivalently, $f$ is submodular if $f(A\cap B)+f(A\cup B)\leq f(A)+f(B)$ for all $A,B\subseteq U$.)

As is standard in the submodular maximization literature, we assume $f(\emptyset)=0$. In our motivating example, this means that if no items are shown to the incoming customer, then the probability of a click is 0. We let $\mathcal{F}$ denote the family of submodular functions with finite ground set $U$. For the sake of simplicity, we additionally assume that all functions take values in the interval $[0,1]$. In this work, we also consider set functions $f$ that are monotone, or non-decreasing, i.e., $f(A)\leq f(B)$ for all $A\subseteq B$.

In the problem of online monotone submodular maximization under a cardinality constraint, a sequence of $T$ monotone submodular functions $f_{1},\ldots,f_{T}:2^{U}\to[0,1]$ arrives in an online fashion. At every time-step $t$, the decision maker $\mathcal{A}$ has to choose a subset $S_{t}\subseteq U$ of size at most $k$ before observing $f_{t}$. This decision must be based solely on previous observations. The decision maker $\mathcal{A}$ receives a payoff $f_{t}(S_{t})$, and her goal is to minimize the expected $(1-1/e)$-regret $\operatorname{\mathbb{E}}[\operatorname{\mathcal{R}}_{T}]$, where $\operatorname{\mathcal{R}}_{T}=\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t})$ as defined in Equation (1), and the randomness is over the algorithm’s choices.

A fundamental tool in our analysis is the Hedge algorithm (Algorithm 1) of Freund and Schapire (1997), which chooses an action from a set $[N]=\{1,\ldots,N\}$ based on past payoffs from each action. The algorithm takes as input a learning rate $\eta$ and a stream of linear functions $g_{1},\ldots,g_{T}:[N]\to[0,1]$, where the payoff of playing action $i$ at time $t$ is $g_{t}(i)$.

In our setting, the learner must select a set of at most $k$ items from the ground set $U$. The learner does this by implementing $k$ ordered copies of the Hedge algorithm, each of which chooses one item, so the action space for each instantiation is the ground set $U$ (i.e., $N=|U|$). The $i$-th copy of Hedge learns the item with the best marginal gain given the decisions made by the previous $i-1$ Hedge algorithms.

Initialize $w_{1}=(1,\ldots,1)\in\mathbb{R}^{N}$
for $t=1,\ldots,T$ do
       Sample action $i_{t}\in[N]$ w.p. $x_{t}(i)=\frac{w_{t}(i)}{\sum_{j}w_{t}(j)}$
       Obtain payoff $g_{t}(i_{t})$ and full access to $g_{t}$
       Update $w_{t+1}(i)=w_{t}(i)e^{\eta g_{t}(i)}$
Algorithm 1 Hedge($\eta,g_{1},\ldots,g_{T}$)
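Algorithm 1 can be sketched directly in code. The following is a minimal illustration (function and parameter names are ours; the caller supplies the payoff vectors $g_t$):

```python
import math
import random

def hedge(eta, payoffs, N, seed=0):
    """Sketch of Algorithm 1 (Hedge) over actions {0, ..., N-1}.

    payoffs: a list of T payoff vectors g_t, each of length N with
    entries in [0, 1].  Returns the sequence of sampled actions i_t.
    """
    rng = random.Random(seed)
    w = [1.0] * N                        # w_1 = (1, ..., 1)
    actions = []
    for g in payoffs:
        total = sum(w)
        x = [wi / total for wi in w]     # sampling distribution x_t
        actions.append(rng.choices(range(N), weights=x)[0])
        # Multiplicative update: w_{t+1}(i) = w_t(i) * exp(eta * g_t(i)).
        w = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]
    return actions
```

With payoffs that consistently favor one action, the sampling distribution concentrates on it at a rate governed by $\eta$.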

The Hedge algorithm exhibits the following guarantee, which is useful for analyzing its regret, as well as the regret of our algorithms which instantiate Hedge.

Theorem 3 (Freund and Schapire, 1997).

For any $i\in[N]$, the distributions $\mathbf{x}_{1},\ldots,\mathbf{x}_{T}$ over $[N]$ constructed by Algorithm 1 satisfy

$$\sum_{t=1}^{T}g_{t}(i)-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}g_{t}\leq\eta\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}g_{t}^{2}+\frac{\log N}{\eta},$$

where $g^{2}_{t}$ is the vector $g_{t}$ with each coordinate squared.

For the privacy considerations of this work, we view the input database as the ordered input sequence of submodular functions $F=\{f_{1},\ldots,f_{T}\}$ and the algorithm’s output as the sequence of chosen sets $S_{1},\ldots,S_{T}$. We say that two sequences $F,F^{\prime}$ of functions are neighboring if $f_{t}\neq f_{t}^{\prime}$ for at most one $t\in[T]$.

Definition 2 (Differential Privacy (Dwork et al., 2006)).

An online learning algorithm $\mathcal{A}:\mathcal{F}^{T}\to(2^{U})^{T}$ is $(\varepsilon,\delta)$-differentially private if for any neighboring function databases $F,F^{\prime}$ and any event $S\subseteq(2^{U})^{T}$,

$$\operatorname{\Pr}(\mathcal{A}(F)\in S)\leq e^{\varepsilon}\operatorname{\Pr}(\mathcal{A}(F^{\prime})\in S)+\delta.$$

Differential privacy is robust to post-processing, meaning that any function of a differentially private output maintains the same privacy guarantee.

Proposition 1 (Post-Processing (Dwork et al., 2006)).

Let $\mathcal{M}:\mathcal{F}^{T}\to\mathcal{R}$ be an $(\varepsilon,\delta)$-DP algorithm and let $h:\mathcal{R}\to\mathcal{R}^{\prime}$ be an arbitrary function. Then $\mathcal{M}^{\prime}\doteq h\circ\mathcal{M}:\mathcal{F}^{T}\to\mathcal{R}^{\prime}$ is also $(\varepsilon,\delta)$-DP.

Differentially private algorithms also compose, and the privacy guarantees degrade gracefully as additional DP computations are performed. This enables modular algorithm design using simple differentially private building blocks. Basic Composition (Dwork et al., 2006) says that one can simply add up the privacy parameters used in an algorithm’s subroutines to get the overall privacy guarantee. The following Advanced Composition theorem provides even tighter bounds.

Theorem 4 (Advanced Composition (Dwork et al., 2010b)).

Let $\mathcal{M}_{1},\ldots,\mathcal{M}_{k}$ each be $(\varepsilon,\delta)$-DP algorithms. Then $\mathcal{M}=(\mathcal{M}_{1},\ldots,\mathcal{M}_{k})$ is $(\varepsilon^{\prime},k\delta+\delta^{\prime})$-DP for $\varepsilon^{\prime}=\sqrt{2k\log(1/\delta^{\prime})}\,\varepsilon+k\varepsilon(e^{\varepsilon}-1)$ and any $\delta^{\prime}>0$.
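For intuition, the composed parameters can be computed directly from the theorem. A small illustrative helper (names and the example numbers are ours):

```python
import math

def advanced_composition(eps, delta, k, delta_prime):
    """Privacy parameters of the composition of k (eps, delta)-DP
    mechanisms under Advanced Composition (Theorem 4)."""
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_prime

# Composing 100 (0.01, 1e-6)-DP mechanisms: eps grows roughly like
# sqrt(k) (about 0.54 here) instead of k * eps = 1.0 under Basic
# Composition.
eps_total, delta_total = advanced_composition(0.01, 1e-6, 100, 1e-6)
```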

Our algorithms rely on the Exponential Mechanism (EM) introduced by McSherry and Talwar (2007). The EM takes in a database $F$, a finite action set $U$, and a quality score $q:\mathcal{F}^{T}\times U\to\mathbb{R}$, where $q(F,i)$ assigns a numeric score to the quality of outputting $i$ on input database $F$. The sensitivity of the quality score, denoted $\Delta q$, is the maximum change in the value of $q$ across neighboring databases: $\Delta q=\max_{i\in U}\max_{F,F^{\prime}\text{ neighbors}}|q(F,i)-q(F^{\prime},i)|$. Given these inputs, the EM outputs $i\in U$ with probability proportional to $\exp\left(\frac{\varepsilon\,q(F,i)}{2\Delta q}\right)$. The Exponential Mechanism is $(\varepsilon,0)$-DP (McSherry and Talwar, 2007).
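A sketch of the EM sampling step, assuming the caller has already computed the quality scores (names and scores below are illustrative):

```python
import math
import random

def exponential_mechanism(scores, eps, sensitivity, seed=0):
    """Sample item i with probability proportional to
    exp(eps * q(F, i) / (2 * sensitivity)); this is (eps, 0)-DP."""
    rng = random.Random(seed)
    items = list(scores)
    # Shift by the max score before exponentiating, for numerical
    # stability; this does not change the sampling distribution.
    m = max(scores[i] for i in items)
    weights = [math.exp(eps * (scores[i] - m) / (2.0 * sensitivity))
               for i in items]
    return rng.choices(items, weights=weights)[0]

# Hypothetical quality scores over three items; higher-quality items
# are exponentially more likely to be selected.
pick = exponential_mechanism({"a": 10.0, "b": 9.0, "c": 1.0},
                             eps=1.0, sensitivity=1.0)
```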

As noted by Jain et al. (2012) and Dwork et al. (2010a), the Hedge algorithm can be converted into a DP algorithm using Advanced Composition and the EM.

Proposition 2.

If $\eta=\frac{\varepsilon}{\sqrt{32T\log 1/\delta}}$, then Hedge (Algorithm 1) is $(\varepsilon,\delta)$-DP.

3 Full Information Setting

In this section, we introduce our first algorithm for online submodular maximization under a cardinality constraint. It is both differentially private and achieves the best known expected $(1-1/e)$-regret in $T$. For cardinality $k$, the learner implements $k$ ordered copies of the Hedge algorithm. Each copy is in charge of learning the marginal gain that complements the choices of the previous Hedge algorithms. At time-step $t$, each Hedge algorithm selects an element $a\in U$ and the learner gathers these choices to play the corresponding set. When she obtains oracle access to the submodular function, for each $i\in[k]$ she constructs a vector $g_{t}^{i}$ whose $a$-th coordinate is the marginal gain of adding $a\in U$ to the choices made by the previous $i-1$ Hedge algorithms. Finally, she feeds the vector $g_{t}^{i}$ back to Hedge algorithm $i$. A formal description of this procedure is presented in Algorithm 2.

Initialize: Set $\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}$
Instantiate $k$ parallel copies $\mathcal{E}_{1},\ldots,\mathcal{E}_{k}$ of the Hedge algorithm with rate $\eta$.
for $t=1,\ldots,T$ do
       For each $i=1,\ldots,k$, sample $a_{t}^{i}$ given by $\mathcal{E}_{i}$.
       Play $S_{t}=\cup_{i=1}^{k}\{a_{t}^{i}\}$.
       Obtain $f_{t}(S_{t})$ and oracle access to $f_{t}$.
       For each $i=1,\ldots,k$, define the linear function $g_{t}^{i}:U\to[0,1]$:
$$g_{t}^{i}(a)=f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1}),\quad\forall a\in U,$$
where $S_{t}^{i}=\cup_{j=1}^{i}\{a_{t}^{j}\}$.
       Feed back each Hedge algorithm $\mathcal{E}_{i}$ with $g_{t}^{i}$
Algorithm 2 FI-DP($F=\{f_{t}\}_{t=1}^{T},k,\varepsilon,\delta$)
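The structure of Algorithm 2 can be sketched compactly by inlining a Hedge update per expert. This is an illustrative sketch under our own naming (the oracle is a Python function over sets), not the paper's reference implementation:

```python
import math
import random

def fi_dp(functions, U, k, eps, delta, seed=0):
    """Sketch of Algorithm 2 (FI-DP): k ordered Hedge copies, where
    copy i learns marginal gains on top of copies 1, ..., i-1.

    functions: a list of T set functions f_t mapping sets to [0, 1].
    Returns the list of played sets S_1, ..., S_T.
    """
    rng = random.Random(seed)
    U = list(U)
    T = len(functions)
    eta = eps / (k * math.sqrt(32 * T * math.log(k / delta)))
    w = [[1.0] * len(U) for _ in range(k)]   # one weight vector per copy
    played = []
    for f in functions:
        # Each Hedge copy samples an item; the learner plays the union.
        picks = [rng.choices(range(len(U)), weights=w[i])[0]
                 for i in range(k)]
        played.append({U[j] for j in picks})
        # Full information: feed copy i the marginal gains g_t^i on top
        # of the prefix chosen by copies 1, ..., i-1.
        prefix = set()
        for i in range(k):
            base = f(prefix)
            for j, a in enumerate(U):
                w[i][j] *= math.exp(eta * (f(prefix | {a}) - base))
            prefix.add(U[picks[i]])
    return played
```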

To ensure differential privacy, it would be enough to show that each Hedge copy $\mathcal{E}_{i}$ is $(\varepsilon/k,\delta/k)$-DP. Indeed, if the sequence $(a_{1}^{i},\ldots,a_{T}^{i})$ constructed by each Hedge algorithm $i$ is $(\varepsilon/k,\delta/k)$-DP, then by Basic Composition and post-processing, the sequence $(S_{1},\ldots,S_{T})$ is $(\varepsilon,\delta)$-DP, where $S_{t}=\{a_{t}^{i}\}_{i=1}^{k}$. However, for $i\geq 2$, the output of expert $\mathcal{E}_{i}$ depends on the choices made by algorithms $\mathcal{E}_{1},\ldots,\mathcal{E}_{i-1}$. Moreover, algorithm $\mathcal{E}_{i}$ by itself is again accessing the database $F$, ruling out a post-processing argument. Despite this, we show that all experts together are $(\varepsilon,\delta)$-DP, even though individually we cannot ensure each preserves $(\varepsilon/k,\delta/k)$-DP.

It is worth noting that the Hedge algorithms $\mathcal{E}_{1},\ldots,\mathcal{E}_{k}$ in Algorithm 2 can be replaced by any other no-regret DP method that selects items over $U$, and the same proof structure would follow, although the regret bound would depend on the choice of no-regret algorithm. For instance, if we use the private experts method of Thakurta and Smith (2013) instead of the Hedge algorithm, Algorithm 2 is $(\varepsilon,0)$-DP with a regret bound of $\mathcal{O}\left(k^{2}\frac{\sqrt{|U|T\log^{2.5}T}}{\varepsilon}\right)$.

Theorem 5.

Algorithm 2 is $(\varepsilon,\delta)$-differentially private.

Theorem 6.

Algorithm 2 has $(1-1/e)$-expected-regret

$$\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]\leq\mathcal{O}\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).$$

Proof of Theorem 5

The output of Algorithm 2 is the stream of sets $(S_{1},\ldots,S_{T})$. Before showing that this output preserves privacy, we deal with a simpler case from which we can deduce an inductive argument.

Note that $\mathcal{E}_{1}(F)$ receives as feedback the functions $g_{t}^{1}=(f_{t}(a))_{a\in U}$ at each time-step. By Proposition 2, $\mathcal{E}_{1}$ is $(\varepsilon/k,\delta/k)$-DP, given that $\eta=\frac{\varepsilon}{k\sqrt{32T\log k/\delta}}$. On the other hand, $\mathcal{E}_{2}(F)$ receives as feedback the functions $g_{t}^{2}=(f_{t}(a_{t}^{1}+a)-f_{t}(a_{t}^{1}))_{a\in U}$ at each time-step, where $a_{t}^{1}$ is computed by $\mathcal{E}_{1}(F)$. The feedback of $\mathcal{E}_{2}$ thus depends on the choices of $\mathcal{E}_{1}$; hence, conditioning on these choices, $\mathcal{E}_{2}$ should also be $(\varepsilon/k,\delta/k)$-DP. We generalize and formalize this in the next few paragraphs.

Consider the following family of algorithms. For $a^{1},\ldots,a^{i-1}\in U^{T}$, let $S^{i-1}=\{a^{i-1},\ldots,a^{1}\}$. For $t=1,\ldots,T$, let $\mathcal{M}^{S^{i-1}}_{t}:\mathcal{F}^{T}\to\Delta(U)$ be the EM that outputs $a\in U$ with probability proportional to $e^{\eta\sum_{\tau<t}\left(f_{\tau}(S_{\tau}^{i-1}\cup\{a\})-f_{\tau}(S_{\tau}^{i-1})\right)}$. Each of these mechanisms is $2\eta$-DP, since it is an instance of the EM with a sensitivity-1 quality score. Therefore, by Advanced Composition and our choice of $\eta$, $\mathcal{M}^{S^{i-1}}:=(\mathcal{M}_{1}^{S^{i-1}},\ldots,\mathcal{M}_{T}^{S^{i-1}})$ is $(\varepsilon/k,\delta/k)$-DP. Note that for $S\subseteq U^{T}$ we have

$$\operatorname{\Pr}(\mathcal{E}_{i}(F)\in S\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{1})(F)=S^{i-1})=\operatorname{\Pr}(\mathcal{M}^{S^{i-1}}(F)\in S),$$

and the latter expression describes the output of an $(\varepsilon/k,\delta/k)$-DP algorithm. This formalizes the idea that $\mathcal{E}_{2}$ is $(\varepsilon/k,\delta/k)$-DP if the choices of $\mathcal{E}_{1}$ are fixed. We utilize this idea to show that together $(\mathcal{E}_{k},\ldots,\mathcal{E}_{1})$ are $(\varepsilon,\delta)$-DP. This is formally presented in Lemma 1. The proof of this result (given in Appendix A.1) is an inductive argument that takes advantage of the DP guarantee of the mechanisms $\mathcal{M}^{S^{i-1}}$.

Lemma 1.

For any $i\in[k]$, the function $(\mathcal{E}_{i},\mathcal{E}_{i-1},\ldots,\mathcal{E}_{1}):\mathcal{F}^{T}\to U^{T}\times\cdots\times U^{T}$, the composition of the first $i$ Hedge algorithms, is $(i\varepsilon/k,i\delta/k)$-DP.

Lemma 1 with $i=k$ and post-processing ensures that Algorithm 2 is $(\varepsilon,\delta)$-DP.

Proof of Theorem 6

The key idea is to bound the (11/e)(1-1/e)-regret of Algorithm 2 by the regret incurred by the kk Hedge algorithms 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k}. We formalize this in Proposition 3 below. With this bound, we can utilize the regret bound of the Hedge algorithm and conclude the proof. The regret incurred by i\mathcal{E}_{i} is

ri=maxaU\displaystyle r_{i}=\max_{a\in U} t=1Tgti(a)t=1Tgti(ati),\displaystyle\sum_{t=1}^{T}g_{t}^{i}(a)-\sum_{t=1}^{T}g_{t}^{i}(a_{t}^{i}),

where gti(a)=ft(Sti1{a})ft(Sti1)g_{t}^{i}(a)=f_{t}(S_{t}^{i-1}\cup\{a\})-f_{t}(S_{t}^{i-1}).

Proposition 3.

The (11/e)(1-1/e)-regret of Algorithm 2 is bounded by the expected regret of 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k}.

While a full proof of Proposition 3 is deferred to Appendix A.2, we describe the key idea here. To bound the (11/e)(1-1/e)-regret, we rewrite the regret rir_{i} via the function F:2[T]×U[0,1]F:2^{[T]\times U}\to[0,1], F(A)=1Tt=1Tft(At)F(A)=\frac{1}{T}\sum_{t=1}^{T}f_{t}(A_{t}), where At={uU:(t,u)A}A_{t}=\{u\in U:(t,u)\in A\} as:

riT=maxaUF(S~i1{a})F(S~i)\frac{r_{i}}{T}=\max_{a\in U}F(\widetilde{S}^{i-1}\cup\{a\})-F(\widetilde{S}^{i})

where S~=t=1T{t}×S\widetilde{S}^{\ell}=\bigcup_{t=1}^{T}\{t\}\times S^{\ell}. We show that F(S~i)F(S~i1)F(OPT~)F(S~i1)kriTF(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}-\frac{r_{i}}{T}, where OPT~\widetilde{OPT} is the extension of OPT=argmax|S|kt=1Tft(S)OPT=\mathrm{argmax}_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S) to [T]×U[T]\times U. Upon unrolling this recursion, we obtain the result.

To finish the proof of Theorem 6 we need to bound the overall regret of all i\mathcal{E}_{i}. Observe that once we have fixed S1i1,,STi1S_{1}^{i-1},\ldots,S_{T}^{i-1}, the feedback of expert ii is completely determined since the elements at1,,ati1a_{t}^{1},\ldots,a_{t}^{i-1} depend only on experts 1,,i11,\ldots,i-1. Therefore, we have

𝔼[riS1i1,,STi1]ηT+log|U|η\displaystyle\operatorname{\mathbb{E}}[r_{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]\leq\eta T+\frac{\log|U|}{\eta}

by the Hedge regret guarantee. Taking expectations and summing over i=1,,ki=1,\ldots,k, we get 𝔼[T]i=1k𝔼[ri]k(ηT+log|U|η)\operatorname{\mathbb{E}}\left[\mathcal{R}_{T}\right]\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]\leq k\left(\eta T+\frac{\log|U|}{\eta}\right), and the result follows with our choice of η=εk32Tlog(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}.
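Concretely, substituting this choice of η\eta into the bound (for ε1\varepsilon\leq 1 the second term dominates):

```latex
\mathbb{E}[\mathcal{R}_T]
\;\le\; k\left(\eta T + \frac{\log|U|}{\eta}\right)
\;=\; \frac{\varepsilon\sqrt{T}}{\sqrt{32\log(k/\delta)}}
      + \frac{k^{2}\log|U|\sqrt{32\,T\log(k/\delta)}}{\varepsilon}
\;=\; \mathcal{O}\!\left(\frac{k^{2}\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon}\right).
```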

4 Bandit Setting

In the bandit case, the algorithm only receives as feedback the value ft(St)f_{t}(S_{t}). Given this restricted information, the algorithm must trade-off exploration of the function with exploiting current knowledge. As in (Streeter and Golovin,, 2009), our algorithm controls this tradeoff using a parameter γ[0,1]\gamma\in[0,1], and by randomly exploring in each time-step independently with probability γ\gamma.

The non-private approach of Streeter and Golovin, (2009) obtains 𝒪(T2/3)\mathcal{O}(T^{2/3}) expected (11/e)(1-1/e)-regret, and works as follows: In exploit rounds (prob. 1γ1-\gamma), play the experts’ sampled choice StS_{t} and feed back 0 to each i\mathcal{E}_{i}. In explore rounds (prob. γ\gamma), select i[k]i\in[k] and aUa\in U uniformly at random. Play the set St=Sti1+aS_{t}=S_{t}^{i-1}+a, observe feedback ft(Sti1+a)f_{t}(S_{t}^{i-1}+a), give this value to i\mathcal{E}_{i}, and feed back 0 to the remaining experts.

As we show in Appendix B.1, directly privatizing this algorithm using the Hedge method from the full-information setting results in an expected (11/e)(1-1/e)-regret of 𝒪(T3/4)\mathcal{O}(T^{3/4}), which is far from the optimal 𝒪(T2/3)\mathcal{O}(T^{2/3}). The problem with this naive approach is that a new sample is obtained via the Hedge algorithms at every time-step, including exploit steps, so to ensure (ε,δ)(\varepsilon,\delta)-DP, a learning rate of η=εk32Tlog(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}} is required.

We improve upon this by calling the Hedge algorithm only after an exploration time-step has occurred, and new information is available. The learner continues playing this same set until the next exploration round, and privacy of these exploitation rounds follows from post-processing. This dramatically reduces the number of rounds that access the dataset, and reduces the overall amount of noise required for privacy.

If the exact number of exploration rounds were known, it could be plugged into the learning rate η\eta to achieve (ε,δ)(\varepsilon,\delta)-DP. In the non-private setting, a doubling trick (see, e.g., Shalev-Shwartz, (2012)) can be employed to find the right learning rate by calling the algorithm multiple times, doubling TT and thus rescaling η\eta on each iteration. Unfortunately, this doubling trick does not work in the private setting due to the non-linear relationship between the privacy parameter ε\varepsilon, the time horizon TT, and the learning rate η\eta, as specified in Proposition 2. Instead we use concentration inequalities (Alon and Spencer,, 2004) to ensure that there are no more than 2γT2\gamma T exploration rounds, except with probability e8T1/3e^{-8{T^{1/3}}}. With this, we can select a fixed learning rate η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}} and guarantee the optimal 𝒪(T2/3)\mathcal{O}(T^{2/3}) expected (11/e)(1-1/e)-regret, at the cost of (ε,δ+e8T1/3)(\varepsilon,\delta+e^{-8{T^{1/3}}})-DP.
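As a numerical sanity check (using the generic Hoeffding exponent e2γ2Te^{-2\gamma^{2}T}, which is weaker than the constant quoted above), the exact binomial tail Pr(M2γT)\operatorname{\Pr}(M\geq 2\gamma T) is indeed dominated by an exponentially small bound; parameter values below are illustrative:

```python
import math

def binom_tail(n, p, m):
    """Exact Pr(X >= m) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * (p ** j) * ((1 - p) ** (n - j))
               for j in range(m, n + 1))

T, gamma = 200, 0.1
tail = binom_tail(T, gamma, round(2 * gamma * T))   # Pr(M >= 2*gamma*T)
bound = math.exp(-2 * gamma ** 2 * T)               # generic Hoeffding bound
```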

We remark in Appendix B.2 that this additional loss in the δ\delta term can be avoided by pre-sampling the exploration round, but this requires Θ(T2/3+k|U|)\Theta(T^{2/3}+k|U|) space, which may be unacceptable for large TT.

Algorithm 3 presents the space-efficient approach. Here f^ti\widehat{f}_{t}^{i} is the vector with aa-th coordinate given by: f^ti,a=ft(Sti1+a)𝟏{Explore at time t, pick i, pick a}\widehat{f}_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)\mathbf{1}_{\{\text{Explore at time $t$, pick }i,\text{ pick }a\}}.

Initialize: Set γ=k((16|U|log|U|)2T)1/3\gamma=k\left(\frac{(16|U|\log|U|)^{2}}{T}\right)^{1/3} and η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}.
Instantiate kk parallel copies 1,,k\mathcal{E}_{1},\ldots,\mathcal{E}_{k} of Hedge algorithm with rate η\eta. Utilize each i\mathcal{E}_{i} to sample a1ia_{1}^{i} and set S1={a11,,a1k}S_{1}=\{a_{1}^{1},\ldots,a_{1}^{k}\}.
for t=1,,Tt=1,\ldots,T do
       Sample btBernoulli(γ)b_{t}\sim\mathrm{Bernoulli}(\gamma).
       if bt=1b_{t}=1 then
             Sample i[k]i\in[k] u.a.r. and aUa\in U u.a.r.
             Play St=Sti1{a}S_{t}=S_{t}^{i-1}\cup\{a\}.
             Obtain value ft(St)f_{t}(S_{t}).
             Feed back the function f^ti\widehat{f}_{t}^{i} to expert i\mathcal{E}_{i}, i\forall i.
             Utilize i\mathcal{E}_{i} to pick at+1ia_{t+1}^{i} i\forall i.
             Update set St+1=i=1k{at+1i}S_{t+1}=\cup_{i=1}^{k}\{a_{t+1}^{i}\}.
      else
             Play StS_{t}.
             Obtain ft(St)f_{t}(S_{t}).
             Update St+1=StS_{t+1}=S_{t}.
      
Algorithm 3 BanditDP(F,ε,δ)(F,\varepsilon,\delta)
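The explore/exploit loop can be sketched in a few lines; this is a minimal non-private illustration with our own helper names, omitting the DP calibration of η\eta and the 2γT2\gamma T cap on explorations:

```python
import math
import random

def hedge_sample(w, rng):
    """Sample an item with probability proportional to its Hedge weight."""
    items = list(w)
    r = rng.random() * sum(w.values())
    for b in items:
        r -= w[b]
        if r <= 0:
            return b
    return items[-1]

def bandit_round(weights, S, f, gamma, eta, U, k, rng):
    """One round of the explore/exploit scheme. weights[i] maps items to
    expert i's Hedge weights; S = [a^1, ..., a^k] is the current set.
    Returns the set for the next round."""
    if rng.random() < gamma:                        # explore (b_t = 1)
        i = rng.randrange(k)                        # pick expert i u.a.r.
        a = rng.choice(U)                           # pick item a u.a.r.
        payoff = f(set(S[:i]) | {a})                # observe f_t(S_t^{i-1} + a)
        weights[i][a] *= math.exp(eta * payoff)     # feedback to E_i; others see 0
        return [hedge_sample(w, rng) for w in weights]  # resample S_{t+1}
    return S                                        # exploit: replay S_t
```

In exploit rounds the data is never touched, which is exactly why only the (roughly γT\gamma T) exploration rounds consume privacy budget.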
Theorem 7.

Algorithm 3 is (ε,δ+e8T1/3)(\varepsilon,\delta+e^{-8{T^{1/3}}})-DP.

Theorem 8.

Algorithm 3 has (11/e)(1-1/e)-regret

𝔼[T]𝒪(logk/δε(k(|U|log|U|)1/3)2T2/3).\operatorname{\mathbb{E}}\left[\operatorname{\mathcal{R}}_{T}\right]\leq\mathcal{O}\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}(k(|U|\log|U|)^{1/3})^{2}T^{2/3}\right).

Proof of Theorem 7

Observe that the algorithm only releases new information right after an exploration time-step. If t1,,tMt_{1},\ldots,t_{M} are the exploration time-steps, with MM distributed as the sum of TT independent Bernoulli random variables with parameter γ\gamma, then conditioned on the event M<2γTM<2\gamma T, we know that the outputs S1,St1+1,,StM+1S_{1},S_{t_{1}+1},\ldots,S_{t_{M}+1} are (ε,δ)(\varepsilon,\delta)-DP by Theorem 5. Now, conditioning again on the event M<2γTM<2\gamma T, the entire output (S1,,ST)(S_{1},\ldots,S_{T}) is (ε,δ)(\varepsilon,\delta)-DP, since it corresponds to post-processing of the previous output by extending the sets to exploitation time-steps. We know that M2γTM\geq 2\gamma T occurs w.p. e8γ2T\leq e^{-8\gamma^{2}T}. Thus, for any SS we have

Pr((k,,1)(F)S)\displaystyle\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F)\in S)
Pr((k,,1)(F)SM<2γT)Pr(M<2γT)+e8γ2T\displaystyle\leq\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F)\in S\mid M<2\gamma T)\operatorname{\Pr}(M<2\gamma T)+e^{-8\gamma^{2}T}
eεPr((k,,1)(F)SM<2γT)Pr(M<2γT)+δ+e8γ2T\displaystyle\leq e^{\varepsilon}\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F^{\prime})\in S\mid M<2\gamma T)\operatorname{\Pr}(M<2\gamma T)+\delta+e^{-8\gamma^{2}T}
eεPr((k,,1)(F)S)+δ+e8γ2T.\displaystyle\leq e^{\varepsilon}\operatorname{\Pr}((\mathcal{E}_{k},\ldots,\mathcal{E}_{1})(F^{\prime})\in S)+\delta+e^{-8\gamma^{2}T}.

The result now follows by plugging in the value of γ\gamma used in Algorithm 3.

Proof of Theorem 8

Theorem 8 requires the following two lemmas, proved respectively in Appendices A.3 and A.4. The first lemma says that the (11/e)(1-1/e)-regret experienced by the learner is bounded by the regret experienced by the experts plus an additional error introduced during the exploration times. The second lemma bounds the regret experienced by the experts under the biased estimator.

Lemma 2.

If rir_{i} denotes the regret experienced by expert i\mathcal{E}_{i} in Algorithm 3, then

(11e)max|S|kt=1Tft(S)𝔼[t=1Tft(St)]i=1k𝔼[ri]+γT.\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\gamma T.
Lemma 3.

If each i\mathcal{E}_{i} is a Hedge algorithm with learning rate η=εk32(2γT)log(k/δ)\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}, then

𝔼[ri]16k2|U|log|U|Tlog(k/δ)εγ+k|U|γTe8γ2T.\operatorname{\mathbb{E}}[r_{i}]\leq 16\frac{k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}.

Using these two results with γ=k((16|U|log|U|)2T)1/3\gamma=k\left(\frac{(16|U|\log|U|)^{2}}{T}\right)^{1/3}:

𝔼[T]\displaystyle\operatorname{\mathbb{E}}\left[\mathcal{R}_{T}\right] k(16k2|U|log|U|Tlog(k/δ)εγ)+k|U|γTe8γ2T+γT\displaystyle\leq k\left(16\frac{k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}\right)+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}+\gamma T
=(16k3|U|log|U|logk/δεTγ+γT)+k|U|γTe8γ2T\displaystyle=\left(16\frac{k^{3}|U|\log|U|\sqrt{\log k/\delta}}{\varepsilon}\sqrt{\frac{T}{\gamma}}+\gamma T\right)+\frac{k|U|}{\gamma}T\cdot e^{-8\gamma^{2}T}
32logk/δε(k(|U|log|U|)1/3)2T2/3+|U|1/3T4/3(16log|U|)2/3e8k2(16|U|log|U|)4/3T1/3.\displaystyle\leq 32\frac{\sqrt{\log k/\delta}}{\varepsilon}(k(|U|\log|U|)^{1/3})^{2}T^{2/3}+\frac{|U|^{1/3}T^{4/3}}{(16\log|U|)^{2/3}}e^{-8k^{2}(16|U|\log|U|)^{4/3}T^{1/3}}.

5 Extension to Continuous Functions

We sketch an extension of our methodology for (continuous) DR-submodular functions (Hassani et al.,, 2017; Niazadeh et al.,, 2018). Further details can be found in Appendix C.

Let 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}, where each 𝒳i\mathcal{X}_{i} is a closed convex set in \mathbb{R}. A function f:𝒳+f:\mathcal{X}\to\mathbb{R}_{+} is called DR-submodular if ff is differentiable and f(𝐱)f(𝐲)\nabla f(\mathbf{x})\geq\nabla f(\mathbf{y}) for all 𝐱𝐲\mathbf{x}\leq\mathbf{y}. DR-submodular functions are in general neither convex nor concave; for instance, the multilinear extension of a submodular function (Calinescu et al.,, 2011) is DR-submodular. The function ff is said to be β\beta-smooth if f(𝐱)f(𝐲)2β𝐱𝐲2\|\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\|_{2}\leq\beta\|\mathbf{x}-\mathbf{y}\|_{2}. In the online DR-submodular maximization problem, at each time-step t=1,,Tt=1,\ldots,T, a β\beta-smooth DR-submodular function ft:𝒳[0,1]f_{t}:\mathcal{X}\to[0,1] arrives and, without observing the function, the learner selects a point 𝐱t𝒳\mathbf{x}_{t}\in\mathcal{X} learned using f1,,ft1f_{1},\ldots,f_{t-1}. She receives the value ft(𝐱t)f_{t}(\mathbf{x}_{t}) and oracle access to ft\nabla f_{t}. The learner’s goal is to minimize the (11/e)(1-1/e)-regret T=(11e)max𝐱𝒳t=1Tft(𝐱)t=1Tft(𝐱t)\mathcal{R}_{T}=\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t}).
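As a concrete example (our own, not from the paper), the coverage-style function f(𝐱)=1i(1xi)f(\mathbf{x})=1-\prod_{i}(1-x_{i}) on [0,1]n[0,1]^{n} is monotone DR-submodular: its partial derivative f/xi=ji(1xj)\partial f/\partial x_{i}=\prod_{j\neq i}(1-x_{j}) is coordinate-wise nonincreasing in 𝐱\mathbf{x}. A quick numeric check of the gradient-dominance condition:

```python
def grad_f(x):
    """Gradient of f(x) = 1 - prod_i (1 - x_i): entry i is prod_{j != i} (1 - x_j)."""
    g = []
    for i in range(len(x)):
        p = 1.0
        for j, xj in enumerate(x):
            if j != i:
                p *= 1.0 - xj
        g.append(p)
    return g
```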

Online DR-submodular problems have been extensively studied in the full information setting—see for instance (Chen et al., 2018b, ; Chen et al., 2018a, ; Niazadeh et al.,, 2018). Similarly to the discrete submodular case, most of these methods implement KK ordered algorithms 0,,K1\mathcal{E}_{0},\ldots,\mathcal{E}_{K-1} for optimizing linear functions over 𝒳\mathcal{X}. Algorithm k\mathcal{E}_{k} computes a direction of maximum increment from a point given by the algorithms k1,,0\mathcal{E}_{k-1},\ldots,\mathcal{E}_{0}. The learner averages these directions to obtain a new point to play in the region 𝒳\mathcal{X}. This is the continuous version of the Hedge approach.

We show in Algorithm 4 and Theorem 9 that a simple modification transforms the continuous method of Chen et al., 2018b into a differentially private one. For this, we utilize the Private Follow the Approximate Leader (PFTAL) framework of Thakurta and Smith, (2013) as a black-box. PFTAL is an online convex optimization algorithm for minimizing LL-Lipschitz convex functions over a compact convex region 𝒳\mathcal{X}. In short, their algorithm guarantees (ε,0)(\varepsilon,0)-DP and achieves expected regret 𝒪(L2nTlog2.5Tε)\mathcal{O}\left(\frac{L^{2}\sqrt{nT\log^{2.5}T}}{\varepsilon}\right).

Let K=Tlog2.5TK=\sqrt{\frac{\sqrt{T}}{\log^{2.5}T}}. Initialize 0,,K1\mathcal{E}_{0},\ldots,\mathcal{E}_{K-1} parallel copies of PFTALs with privacy parameter ε=ε/K\varepsilon^{\prime}=\varepsilon/K.
for t=1,,Tt=1,\ldots,T do
       for k=0,,K1k=0,\ldots,K-1 do
             Let 𝐯tk\mathbf{v}_{t}^{k} be vector found using k\mathcal{E}_{k}.
      Let 𝐱t=1Kk=0K1𝐯tk\mathbf{x}_{t}=\frac{1}{K}\sum_{k=0}^{K-1}\mathbf{v}_{t}^{k}.
       Play 𝐱t\mathbf{x}_{t}, receive ft(𝐱t)f_{t}(\mathbf{x}_{t}) and access to ft\nabla f_{t}.
       Feed back each k\mathcal{E}_{k} with the linear objective k(𝐯)=ft(𝐱tk)𝐯\ell_{k}(\mathbf{v})=\nabla f_{t}(\mathbf{x}_{t}^{k})^{\top}\mathbf{v} where 𝐱tk=1Ki=0k1𝐯ti\mathbf{x}_{t}^{k}=\frac{1}{K}\sum_{i=0}^{k-1}\mathbf{v}_{t}^{i}.
Algorithm 4 (F={ft}t=1T,ε)(F=\{f_{t}\}_{t=1}^{T},\varepsilon)
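The direction-averaging scheme has a simple offline analogue: the Frank-Wolfe variant for monotone DR-submodular maximization, in which KK linear-optimization steps are averaged. Below is a non-private sketch (our own names and a simple budget region, assumed for illustration; the PFTAL machinery is omitted) over 𝒳={𝐱[0,1]n:ixib}\mathcal{X}=\{\mathbf{x}\in[0,1]^{n}:\sum_{i}x_{i}\leq b\}:

```python
def frank_wolfe_dr(grad, n, budget, K):
    """K-step Frank-Wolfe sketch for monotone DR-submodular f over
    X = {x in [0,1]^n : sum_i x_i <= budget}: average the K
    linear-maximization directions, x^{k+1} = x^k + v^k / K."""
    x = [0.0] * n
    for _ in range(K):
        g = grad(x)
        # argmax_{v in X} <g, v>: weight 1 on the `budget` largest positive coords
        top = sorted(range(n), key=lambda i: -g[i])[:budget]
        v = [1.0 if (i in top and g[i] > 0.0) else 0.0 for i in range(n)]
        x = [xi + vi / K for xi, vi in zip(x, v)]
    return x
```

In Algorithm 4 each linear step is produced by a PFTAL copy rather than by a direct linear maximization, which is what makes the online variant private.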
Theorem 9 (Informal).

Algorithm 4 is (ε,0)(\varepsilon,0)-DP with expected (11/e)(1-1/e)-regret

𝒪(T3/4log2.5Tε).\mathcal{O}\left(\frac{T^{3/4}\sqrt{\log^{2.5}T}}{\varepsilon}\right).

The big-𝒪\mathcal{O} term hides the dimension, bounds on the gradient, and the diameter of 𝒳\mathcal{X}, and only shows the dependence on TT and the privacy parameter ε\varepsilon. The proof appears in Appendix C.

References

  • Ahmed et al., (2012) Ahmed, A., Teo, C. H., Vishwanathan, S., and Smola, A. (2012). Fair and balanced: Learning to present news stories. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 333–342.
  • Alon and Spencer, (2004) Alon, N. and Spencer, J. H. (2004). The Probabilistic Method. John Wiley & Sons.
  • Bach, (2013) Bach, F. (2013). Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373.
  • Badanidiyuru et al., (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. (2014). Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 671–680.
  • Calinescu et al., (2011) Calinescu, G., Chekuri, C., Pál, M., and Vondrák, J. (2011). Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766.
  • Cardoso and Cummings, (2019) Cardoso, A. R. and Cummings, R. (2019). Differentially private online submodular minimization. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS ’19, pages 1650–1658.
  • Cesa-Bianchi and Lugosi, (2006) Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
  • Chapelle and Li, (2011) Chapelle, O. and Li, L. (2011). An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, NIPS ’11, pages 2249–2257.
  • Chatterjee et al., (2003) Chatterjee, P., Hoffman, D. L., and Novak, T. P. (2003). Modeling the clickstream: Implications for web-based advertising efforts. Marketing Science, 22(4):520–541.
  • (10) Chen, L., Harshaw, C., Hassani, H., and Karbasi, A. (2018a). Projection-free online optimization with stochastic gradient: From convexity to submodularity. arXiv preprint 1802.08183.
  • (11) Chen, L., Hassani, H., and Karbasi, A. (2018b). Online continuous submodular maximization. arXiv preprint 1802.06052.
  • Dwork et al., (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC ’06, pages 265–284.
  • (13) Dwork, C., Naor, M., Pitassi, T., and Rothblum, G. N. (2010a). Differential privacy under continual observation. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC ’10, pages 715–724.
  • Dwork and Roth, (2014) Dwork, C. and Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407.
  • (15) Dwork, C., Rothblum, G. N., and Vadhan, S. (2010b). Boosting and differential privacy. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science, FOCS ’10, pages 51–60.
  • Feige, (1998) Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652.
  • Fisher et al., (1978) Fisher, M. L., Nemhauser, G. L., and Wolsey, L. A. (1978). An analysis of approximations for maximizing submodular set functions—II. In Polyhedral Combinatorics, pages 73–87. Springer.
  • Freund and Schapire, (1997) Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
  • Gupta et al., (2010) Gupta, A., Ligett, K., McSherry, F., Roth, A., and Talwar, K. (2010). Differentially private combinatorial optimization. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 1106–1125.
  • Hassani et al., (2017) Hassani, H., Soltanolkotabi, M., and Karbasi, A. (2017). Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’17, pages 5841–5851.
  • Hazan, (2016) Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3–4):157–325.
  • Jain et al., (2012) Jain, P., Kothari, P., and Thakurta, A. (2012). Differentially private online learning. In Proceedings of the 25th Conference on Learning Theory, COLT ’12, pages 24.1–24.34.
  • Krause and Golovin, (2014) Krause, A. and Golovin, D. (2014). Submodular function maximization, pages 71–104. Cambridge University Press.
  • McSherry and Talwar, (2007) McSherry, F. and Talwar, K. (2007). Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103.
  • Mirrokni et al., (2008) Mirrokni, V., Schapira, M., and Vondrák, J. (2008). Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM Conference on Electronic Commerce, EC ’08, pages 70–77.
  • Mitrovic et al., (2017) Mitrovic, M., Bun, M., Krause, A., and Karbasi, A. (2017). Differentially private submodular maximization: Data summarization in disguise. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 2478–2487.
  • Niazadeh et al., (2018) Niazadeh, R., Roughgarden, T., and Wang, J. (2018). Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’18, pages 9594–9604.
  • Rafiey and Yoshida, (2020) Rafiey, A. and Yoshida, Y. (2020). Fast and private submodular and kk-submodular functions maximization with matroid constraints. arXiv preprint 2006.15744.
  • Schrijver, (2003) Schrijver, A. (2003). Combinatorial optimization: Polyhedra and efficiency, volume 24. Springer Science & Business Media.
  • Shalev-Shwartz, (2012) Shalev-Shwartz, S. (2012). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194.
  • Streeter and Golovin, (2009) Streeter, M. and Golovin, D. (2009). An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1577–1584.
  • Streeter et al., (2009) Streeter, M., Golovin, D., and Krause, A. (2009). Online learning of assignments. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1794–1802.
  • Tang et al., (2014) Tang, L., Jiang, Y., Li, L., and Li, T. (2014). Ensemble contextual bandits for personalized recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 73–80.
  • Thakurta and Smith, (2013) Thakurta, A. G. and Smith, A. (2013). (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Advances in Neural Information Processing Systems, NIPS ’13, pages 2733–2741.
  • Williamson and Shmoys, (2011) Williamson, D. P. and Shmoys, D. B. (2011). The design of approximation algorithms. Cambridge University Press.
  • Zhang et al., (2014) Zhang, B., Wang, N., and Jin, H. (2014). Privacy concerns in online recommender systems: Influences of control and user data input. In Proceedings of the 10th Symposium on Usable Privacy and Security, SOUPS ’14, pages 159–173.
  • Zhang et al., (2020) Zhang, M., Shen, Z., Mokhtari, A., Hassani, H., and Karbasi, A. (2020). One sample stochastic Frank-Wolfe. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS ’20, pages 4012–4023.
  • Zinkevich, (2003) Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, ICML ’03, pages 928–936.

Appendix A Appendix

A.1 Proof of Lemma 1

See Lemma 1.

Proof.

We prove the lemma by induction on ii. The base case of i=1i=1 follows from Proposition 2. For the inductive step, assume the result holds for some i1i\geq 1; we now prove that it also holds for i+1i+1. That is, we aim to show that (i+1,,1):TUT××UT(\mathcal{E}_{i+1},\ldots,\mathcal{E}_{1}):\mathcal{F}^{T}\to U^{T}\times\cdots\times U^{T} is ((i+1)ε,(i+1)δ)((i+1)\varepsilon^{\prime},(i+1)\delta^{\prime})-DP, where ε=ε/k\varepsilon^{\prime}=\varepsilon/k and δ=δ/k\delta^{\prime}=\delta/k. Let \wedge denote the minimum operation, used below to cap probabilities at one, and recall that Si\mathcal{M}^{S^{i}} describes the behavior of the (i+1)(i+1)-th expert across all TT rounds.

Consider neighboring databases FF and FF^{\prime}. Pick any set SUTS\subseteq U^{T} and a fixed Si=(ai,,a1)(UT)iS^{i}=(a^{i},\ldots,a^{1})\in(U^{T})^{i}, then

Pr(i+1(F)S(i,,1)(F)=Si)\displaystyle\operatorname{\Pr}(\mathcal{E}_{i+1}(F)\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
=Pr(Si(F)S)\displaystyle=\operatorname{\Pr}(\mathcal{M}^{S^{i}}(F)\in S)
(eεPr(Si(F)S))1+δ\displaystyle\leq(e^{\varepsilon^{\prime}}\operatorname{\Pr}(\mathcal{M}^{S^{i}}(F^{\prime})\in S))\wedge 1+\delta^{\prime} ((ε,δ)(\varepsilon^{\prime},\delta^{\prime})-DP of Si\mathcal{M}^{S^{i}})
=(eεPr(i+1(F)S(i,,1)(F)=Si))1+δ.\displaystyle=(e^{\varepsilon^{\prime}}\operatorname{\Pr}(\mathcal{E}_{i+1}(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1+\delta^{\prime}.

This is true as long as (i,,1)(F)=Si(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i} and (i,,1)(F)=Si(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i} are non-zero probability events, which is guaranteed since the Hedge algorithm places positive probability on every outcome.

We can write

Pr((i,,1)(F)=Si)=eiεPr((i,,1)(F)=Si)+μ(Si),\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})=e^{i\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})+\mu(S^{i}),

where μ(Si)=Pr((i,,1)(F)=Si)eiεPr((i,,1)(F)=Si)\mu(S^{i})=\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})-e^{i\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}). We have μ(𝒮)iδ\mu(\mathcal{S})\leq i\delta^{\prime} for any 𝒮(UT)i\mathcal{S}\subseteq(U^{T})^{i} since (i,,1)(\mathcal{E}_{i},\ldots,\mathcal{E}_{1}) is (iε,iδ)(i\varepsilon^{\prime},i\delta^{\prime})-DP by the inductive hypothesis.

Now, consider any set 𝒮(UT)i+1\mathcal{S}\subseteq(U^{T})^{i+1}. Then,

Pr((i+1,i,,1)(F)S)\displaystyle\operatorname{\Pr}((\mathcal{E}_{i+1},\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)\in S)
=Si𝒮Pr((i+1,Si)(F)S(i,,1)(F)=Si)Pr((i,,1)(F)=Si)\displaystyle=\sum_{S^{i}\in\mathcal{S}^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F)\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
Si𝒮((eεPr((i+1,Si)(F)S(i,,1)(F)=Si))1+δ)Pr((i,,1)(F)=Si)\displaystyle\leq\sum_{S^{i}\in\mathcal{S}^{\prime}}\left((e^{\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1+\delta^{\prime}\right)\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
Si𝒮((eεPr((i+1,Si)(F)S(i,,1)(F)=Si))1)(eiεPr((i,,1)(F)=Si)+μ(Si))\displaystyle\leq\sum_{S^{i}\in\mathcal{S}^{\prime}}\left((e^{\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}))\wedge 1\right)\left(e^{i\varepsilon^{\prime}}\operatorname{\Pr}({(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i}})+\mu(S^{i})\right)
+Si𝒮δPr((i,,1)(F)=Si)\displaystyle\qquad+\sum_{S^{i}\in\mathcal{S}^{\prime}}\delta^{\prime}\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F)=S^{i})
e(i+1)εSi𝒮Pr((i+1,Si)(F)S(i,,1)(F)=Si)Pr((i,,1)(F)=Si)+μ(𝒮+)+δ\displaystyle\leq e^{(i+1)\varepsilon^{\prime}}\sum_{S^{i}\in\mathcal{S}^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},S^{i})(F^{\prime})\in S\mid(\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})\operatorname{\Pr}((\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})=S^{i})+\mu(\mathcal{S}_{+}^{\prime})+\delta^{\prime}
e(i+1)εPr((i+1,i,,1)(F)S)+(i+1)δ\displaystyle\leq e^{(i+1)\varepsilon^{\prime}}\operatorname{\Pr}((\mathcal{E}_{i+1},\mathcal{E}_{i},\ldots,\mathcal{E}_{1})(F^{\prime})\in S)+(i+1)\delta^{\prime}

where 𝒮={Si(UT)i:(ai+1,Si)𝒮 for some ai+1UT}\mathcal{S}^{\prime}=\{S^{i}\in(U^{T})^{i}:(a^{i+1},S^{i})\in\mathcal{S}\text{ for some }a^{i+1}\in U^{T}\} and 𝒮+\mathcal{S}_{+}^{\prime} is the set of elements Si𝒮S^{i}\in\mathcal{S}^{\prime} such that μ(Si)0\mu(S^{i})\geq 0. This concludes the proof.

\Box

A.2 Proof of Proposition 3

See Proposition 3.

Proof.

Fix the choices S1,,STS_{1},\ldots,S_{T} of the experts arbitrarily, and let rir_{i} be the overall regret experienced by i\mathcal{E}_{i}. That is,

ri=maxaU\displaystyle r_{i}=\max_{a\in U} t=1Tft(Sti1+a)ft(Sti1)t=1Tft(Sti1+ati)ft(Sti1).\displaystyle\sum_{t=1}^{T}f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1})-\sum_{t=1}^{T}f_{t}(S_{t}^{i-1}+a_{t}^{i})-f_{t}(S_{t}^{i-1}).

Define the new function F:2[T]×UF:2^{[T]\times U}\to\mathbb{R} as

F(A)=1Tt=1Tft(At),F(A)=\frac{1}{T}\sum_{t=1}^{T}f_{t}(A_{t}),

where At={xU:(t,x)A}A_{t}=\{x\in U:(t,x)\in A\}. Clearly, FF is submodular, nondecreasing and F()=0F(\emptyset)=0. Then,

riT=maxaUF(S~i1+a~)F(S~i1)(F(S~i)F(S~i1)),\frac{r_{i}}{T}=\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})-(F(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})),

where S~i=t=1T{t}×Si\widetilde{S}^{i}=\bigcup_{t=1}^{T}\{t\}\times S^{i}.

Let OPTUOPT\subseteq U be the optimal solution of max|S|kt=1Tft(S)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S) and consider its extension to [T]×U[T]\times U, i.e., OPT~=t=1T{t}×OPT\widetilde{OPT}=\bigcup_{t=1}^{T}\{t\}\times OPT.

Claim A.1.

For any i=1,,ki=1,\ldots,k, maxaUF(S~i1+a~)F(S~i1)F(OPT~)F(S~i1)k\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}.

Proof of Claim A.1.
F(OPT~)F(S~i1)\displaystyle F(\widetilde{OPT})-F(\widetilde{S}^{i-1})
F(S~i1+OPT~)F(S~i1)\displaystyle\leq F(\widetilde{S}^{i-1}+\widetilde{OPT})-F(\widetilde{S}^{i-1})
a~OPT~S~i1F(S~i1+a~)F(S~i1)\displaystyle\leq\sum_{\widetilde{a}\in\widetilde{OPT}\setminus\widetilde{S}^{i-1}}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})
k(maxaUF(S~i1+a~)F(S~i1)).\displaystyle\leq k\cdot\left(\max_{a\in U}F(\widetilde{S}^{i-1}+\widetilde{a})-F(\widetilde{S}^{i-1})\right).

\Box

Using this claim, we see that

F(S~i)F(S~i1)F(OPT~)F(S~i1)kriT.F(\widetilde{S}^{i})-F(\widetilde{S}^{i-1})\geq\frac{F(\widetilde{OPT})-F(\widetilde{S}^{i-1})}{k}-\frac{r_{i}}{T}.

Unrolling the recursion, we obtain

t=1Tft(St)(11e)t=1Tft(OPT)i=1kri.\sum_{t=1}^{T}f_{t}(S_{t})\geq\left(1-\frac{1}{e}\right)\sum_{t=1}^{T}f_{t}(OPT)-\sum_{i=1}^{k}r_{i}.

\Box
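The unrolling step can be checked numerically: ignoring the regret terms (ri=0r_{i}=0), iterating vi=vi1+(F(OPT~)vi1)/kv_{i}=v_{i-1}+(F(\widetilde{OPT})-v_{i-1})/k from v0=0v_{0}=0 leaves a gap of exactly (11/k)kF(OPT~)F(OPT~)/e(1-1/k)^{k}\cdot F(\widetilde{OPT})\leq F(\widetilde{OPT})/e after kk steps:

```python
def unroll(opt, k):
    """Iterate v_i = v_{i-1} + (opt - v_{i-1}) / k for i = 1..k, from v_0 = 0."""
    v = 0.0
    for _ in range(k):
        v += (opt - v) / k
    return v
```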

A.3 Proof of Lemma 2

See Lemma 2.

Proof.

Observe that at exploration time-steps τ\tau, i.e., when bτ=1b_{\tau}=1, Algorithm 3 plays a set of the form Sτ=Sτi1+aS_{\tau}=S_{\tau}^{i-1}+a. Right after this, the algorithm samples a new set Sτ+1S_{\tau+1} given by the Hedge algorithms and keeps playing this set until the next exploration time-step.

For the sake of analysis, we introduce the following sets. Let t0=0,t1,,tMt_{0}=0,t_{1},\ldots,t_{M} be the times when a new sample set for exploitation is obtained. Note that besides time t0t_{0}, all times t1,,tMt_{1},\ldots,t_{M} are exploration times. Now, let St=StiS_{t}^{\prime}=S_{t_{i}} for t=ti+1,,ti+1t=t_{i}+1,\ldots,t_{i+1}. Note that at times with bt=0b_{t}=0 we have St=StS_{t}^{\prime}=S_{t}; however, at times with bt=1b_{t}=1, StS_{t}^{\prime} is not necessarily the same as St=Sti1+aS_{t}=S_{t}^{i-1}+a. In other words, StS_{t}^{\prime} corresponds to the full exploitation scheme. Now, as in the full-information setting, we have

(11e)max|S|kt=1Tft(S)t=1Tft(St)i=1kri,\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\sum_{t=1}^{T}f_{t}(S_{t}^{\prime})\leq\sum_{i=1}^{k}r_{i},

where ri=maxaUt=1Tfti,at=1Tfti,atir_{i}=\max_{a\in U}\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}f_{t}^{i,a_{t}^{i}}. Thus

(11e)max|S|kt=1Tft(S)𝔼[t=1Tft(St)]\displaystyle\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]
i=1k𝔼[ri]+𝔼[t=1Tft(St)ft(St)]\displaystyle\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\operatorname{\mathbb{E}}\left[\sum_{t=1}^{T}f_{t}(S_{t}^{\prime})-f_{t}(S_{t})\right]
i=1k𝔼[ri]+γT,\displaystyle\leq\sum_{i=1}^{k}\operatorname{\mathbb{E}}[r_{i}]+\gamma T,

since only the exploration time-steps can contribute to the difference ft(St)ft(St)f_{t}(S_{t}^{\prime})-f_{t}(S_{t}), and there are γT\gamma T of them in expectation. \Box

A.4 Proof of Lemma 3

See Lemma 3.

Proof.

From the perspective of expert i\mathcal{E}_{i}, at every time-step tt, she sees the vector f^ti\widehat{f}_{t}^{i} such that

f^ti,a=ft(Sti1+a)𝟏{Explore at time t, pick i, pick a}\widehat{f}_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)\mathbf{1}_{\{\text{Explore at time $t$, pick }i,\text{ pick }a\}}

in its aa-th coordinate. Notice that this vector is 0 if no exploration occurs at time tt. The expert i\mathcal{E}_{i} samples a new element in UU only after exploration times. Observe that the feedback of i\mathcal{E}_{i} is independent of the choices made by i\mathcal{E}_{i}. Indeed, this feedback depends only on the set Sti1S_{t}^{i-1} constructed by 1,,i1\mathcal{E}_{1},\ldots,\mathcal{E}_{i-1} and the decision of the learner to explore, which is independent of the learning task. Therefore, the sequence f^i=(f^1i,,f^Ti)\widehat{f}^{i}=(\widehat{f}_{1}^{i},\ldots,\widehat{f}_{T}^{i}) can be considered oblivious for i\mathcal{E}_{i}, and we can apply the guarantee of Hedge to f^i\widehat{f}^{i}. That is, for any aUa\in U,

\[
\sum_{t=1}^{T}\widehat{f}_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\leq\eta\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}+\frac{\log|U|}{\eta},
\]

where $\mathbf{x}_{t}\in\Delta(U)$ is the (entrywise positive) distribution used by expert $\mathcal{E}_{i}$ in the Hedge algorithm and $\Delta(U)=\{\mathbf{x}\in\mathbb{R}^{U}:\|\mathbf{x}\|_{1}=1,\ \mathbf{x}\geq 0\}$ is the probability simplex over the elements of $U$. Notice that exploitation times appear in the summation with contribution $0$. This expression is not the same as the regret of $\mathcal{E}_{i}$, but we can relate the two quantities as follows. Conditioning on $S_{1}^{i-1},\ldots,S_{T}^{i-1}$, we obtain

\[
\mathbb{E}[\widehat{f}_{t}^{i,a}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=\frac{\gamma}{k|U|}f_{t}^{i,a}+\delta_{t}^{i},
\]

where $f_{t}^{i,a}=f_{t}(S_{t}^{i-1}+a)-f_{t}(S_{t}^{i-1})$ and $\delta_{t}^{i}=\frac{\gamma}{k|U|}f_{t}(S_{t}^{i-1})$. Notice that $S_{1}^{i-1},\ldots,S_{T}^{i-1}$ are independent of the actions taken by $\mathcal{E}_{i}$, so

\[
\mathbb{E}[\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=\frac{\gamma}{k|U|}\,\mathbb{E}[\mathbf{x}_{t}^{\top}f_{t}^{i}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]+\delta_{t}^{i}
\]

and

\begin{align*}
\mathbb{E}[\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]
&=\mathbb{E}\left[\sum_{a\in U}x_{t}(a)(\widehat{f}_{t}^{i,a})^{2}\ \middle|\ S_{1}^{i-1},\ldots,S_{T}^{i-1}\right]\\
&=\sum_{a\in U}\mathbb{E}[x_{t}(a)\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]\,\frac{\gamma}{k|U|}\,f_{t}(S_{t}^{i-1}+a)^{2}\\
&\leq\frac{\gamma}{k|U|},
\end{align*}
where the final inequality uses $f_{t}(S)\in[0,1]$ and $\sum_{a\in U}\mathbb{E}[x_{t}(a)\mid S_{1}^{i-1},\ldots,S_{T}^{i-1}]=1$.
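The conditional-expectation computation above can be checked by simulation. In the sketch below, $\gamma$, $k$, $|U|$, and the payoff value are made-up toy numbers, and the indicator is drawn exactly as in the display: the estimator is nonzero only when the round explores, the uniform expert choice picks expert $i$, and the uniform item choice picks item $a$.

```python
import random

rng = random.Random(0)
gamma, k, n = 0.2, 2, 4     # exploration rate, number of experts, |U| (toy values)
f_value = 0.7               # stands in for f_t(S_t^{i-1} + a), a payoff in [0, 1]

def fhat_draw():
    """One draw of the importance-weighted estimator for a fixed pair (i, a)."""
    explores = rng.random() < gamma
    picks_i = rng.randrange(k) == 0    # uniform expert choice hits expert i
    picks_a = rng.randrange(n) == 0    # uniform item choice hits item a
    return f_value if (explores and picks_i and picks_a) else 0.0

N = 200_000
empirical = sum(fhat_draw() for _ in range(N)) / N
predicted = gamma / (k * n) * f_value   # the gamma/(k|U|) scaling from the display
```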

Let $M$ be the number of times Algorithm 3 decides to explore; that is, $M$ is distributed as a sum of $T$ independent Bernoulli random variables with parameter $\gamma$. By standard concentration bounds,

\[
\Pr(M>2\gamma T)\leq e^{-8\gamma^{2}T}.
\]

Now, let $t_{1},\ldots,t_{M}$ be the times at which the algorithm decides to explore, and let $t_{0}=0$. For $j=1,\ldots,M$, we may assume that expert $\mathcal{E}_{i}$ releases the same vector $\mathbf{x}_{t}\in\Delta(U)$ during the time interval $[t_{j-1},t_{j})$, since she receives no feedback during those times. If we take $\eta=\frac{\varepsilon}{k\sqrt{32(2\gamma T)\log(k/\delta)}}$, then for any $a\in U$ we have

\begin{align*}
\frac{\gamma}{k|U|}\,\mathbb{E}\left[\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}f_{t}^{i}\right]
&=\mathbb{E}\left[\sum_{t=1}^{T}\widehat{f}_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}\widehat{f}_{t}^{i}\right]\\
&\leq\left(\eta\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{x}_{t}^{\top}(\widehat{f}_{t}^{i})^{2}\right]+\frac{\log|U|}{\eta}\right)+T\cdot e^{-8\gamma^{2}T}\\
&\leq\left(\eta\frac{\gamma}{k|U|}T+\frac{\log|U|}{\eta}\right)+T\cdot e^{-8\gamma^{2}T},
\end{align*}
where the first equality holds because the $\delta_{t}^{i}$ terms cancel, and the term $T\cdot e^{-8\gamma^{2}T}$ accounts for the low-probability event $\{M>2\gamma T\}$, on which we bound the regret by $T$.

Therefore,

\[
\mathbb{E}[r_{i}]=\mathbb{E}\left[\max_{a\in U}\sum_{t=1}^{T}f_{t}^{i,a}-\sum_{t=1}^{T}\mathbf{x}_{t}^{\top}f_{t}^{i}\right]\leq\frac{16k^{2}|U|\log|U|\sqrt{T\log(k/\delta)}}{\varepsilon\sqrt{\gamma}}+\frac{k|U|}{\gamma}\,T\cdot e^{-8\gamma^{2}T}.
\]

\Box
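The Hedge (exponential weights) guarantee invoked in this proof can be exercised on a synthetic gain sequence. The following minimal, non-private sketch uses made-up random payoffs in $[0,1]$ (action 0 made slightly better), and checks the gain-version bound $\eta\sum_{t}\mathbf{x}_{t}^{\top}f_{t}^{2}+\log|U|/\eta$ used above; the tuning of $\eta$ here is illustrative, not the paper's private choice.

```python
import math
import random

rng = random.Random(1)
n, T = 4, 300                            # |U| and horizon (toy values)
eta = math.sqrt(math.log(n) / T)         # illustrative, non-private tuning

w = [1.0] * n                            # exponential weights
alg_gain = 0.0
action_gain = [0.0] * n
quad = 0.0                               # accumulates x_t . f_t^2
for _ in range(T):
    total = sum(w)
    x = [wi / total for wi in w]
    f = [rng.random() for _ in range(n)]     # toy payoffs in [0, 1]
    f[0] = min(1.0, f[0] + 0.2)              # action 0 is slightly better
    alg_gain += sum(xi * fi for xi, fi in zip(x, f))
    quad += sum(xi * fi * fi for xi, fi in zip(x, f))
    for a in range(n):
        action_gain[a] += f[a]
        w[a] *= math.exp(eta * f[a])         # Hedge update on gains

regret = max(action_gain) - alg_gain
bound = eta * quad + math.log(n) / eta       # matches the displayed guarantee
```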

Appendix B Additional Results in Bandit Setting

B.1 $\mathcal{O}(T^{3/4})$ Regret Bound of the Direct Approach in the Bandit Setting

In the bandit setting, the direct approach to differential privacy corresponds to sampling a new set from the Hedge algorithms at each time-step. As in the full-information setting, a learning rate of $\eta=\frac{\varepsilon}{k\sqrt{32T\log(k/\delta)}}$ suffices to ensure $(\varepsilon,\delta)$-DP.

Similar to Lemma 3, in this setting we have

\[
\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\mathbb{E}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\sum_{i=1}^{k}\mathbb{E}[r_{i}]+\gamma T.
\]

Since
\begin{align*}
\mathbb{E}[r_{i}]&\leq\frac{k|U|}{\gamma}\left(\eta\frac{\gamma}{k|U|}T+\frac{\log|U|}{\eta}\right)\\
&=\frac{\varepsilon\sqrt{T}}{k\sqrt{32\log(k/\delta)}}+\frac{k^{2}|U|\log|U|\sqrt{32T\log(k/\delta)}}{\varepsilon\gamma},
\end{align*}

we have
\[
\left(1-\frac{1}{e}\right)\max_{|S|\leq k}\sum_{t=1}^{T}f_{t}(S)-\mathbb{E}\left[\sum_{t=1}^{T}f_{t}(S_{t})\right]\leq\frac{k^{3}|U|\log|U|\sqrt{32T\log(k/\delta)}}{\varepsilon\gamma}+\frac{\varepsilon\sqrt{T}}{\sqrt{32\log(k/\delta)}}+\gamma T.
\]

This last bound is minimized when $\gamma=\Theta(T^{-1/4})$, which gives a $(1-1/e)$-regret bound of $\mathcal{O}(T^{3/4})$.
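The optimization over $\gamma$ can be verified numerically. Writing the bound as $g(\gamma)=A\sqrt{T}/\gamma+\gamma T$, with all $k$, $|U|$, $\varepsilon$ factors absorbed into a constant $A$ (set to $1$ below purely for illustration), calculus gives $\gamma^{*}=\sqrt{A}\,T^{-1/4}$ and $g(\gamma^{*})=2\sqrt{A}\,T^{3/4}$; a grid search confirms the $T^{3/4}$ scaling.

```python
def g(gamma, T, A=1.0):
    # shape of the bound above: A * sqrt(T) / gamma + gamma * T
    return A * T ** 0.5 / gamma + gamma * T

results = {}
for T in (10_000, 1_000_000):
    grid = [i / 100_000 for i in range(1, 100_000)]   # gamma in (0, 1)
    gamma_star = min(grid, key=lambda gm: g(gm, T))
    # minimizer sits at ~T^{-1/4}; minimum value scales as 2 * T^{3/4}
    results[T] = (gamma_star, g(gamma_star, T))
```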

B.2 Trading Off the Privacy $\delta$-Term and Space

In this subsection, we show how to trade off the $\delta$-term $e^{-8T^{1/3}}$ against additional space. For each $t\in[T]$, select $t$ as an explore round independently with probability $\gamma$, and let $M$ be the number of selected time-steps; note that $\mathbb{E}[M]=\gamma T$. Now, run Algorithm 3 with $\eta=\frac{\varepsilon}{k\sqrt{32(M+1)\log(k/\delta)}}$, forcing the algorithm to explore at the $M$ sampled time-steps and to exploit at all remaining time-steps.
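A sketch of this modification, with illustrative parameter values (the variable names below are not from the paper's pseudocode): the explore schedule is drawn up front, so the realized $M$ is available when setting $\eta$, and the memory used is the stored schedule itself, of expected size $\gamma T$.

```python
import math
import random

rng = random.Random(0)
T, gamma = 10_000, 0.1          # horizon and exploration rate (toy values)
k, eps, delta = 5, 1.0, 1e-4    # cardinality and privacy parameters (toy values)

# Draw the entire explore schedule in advance; storing it takes Theta(M) space,
# which is Theta(gamma * T) in expectation.
explore_rounds = [t for t in range(T) if rng.random() < gamma]
M = len(explore_rounds)

# The learning rate uses the realized M instead of a high-probability bound on it.
eta = eps / (k * math.sqrt(32 * (M + 1) * math.log(k / delta)))
```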

In this case, following the proof of Lemma 3, we obtain:

\begin{align*}
\mathbb{E}[r_{i}]&\leq\frac{k|U|}{\gamma}\,\mathbb{E}\left[\eta M+\frac{\log|U|}{\eta}\right]\\
&\leq\frac{k|U|}{\gamma}\,\mathbb{E}\left[\frac{6k\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{M+1}\right]\\
&\leq\frac{k|U|}{\gamma}\cdot\frac{6k\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\mathbb{E}[M]+1} &&\text{(Jensen's inequality)}\\
&\leq\frac{8k^{2}|U|\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\frac{T}{\gamma}}.
\end{align*}

Using Lemma 2, we obtain the $(1-1/e)$-regret bound of

\[
\frac{8k^{3}|U|\log|U|\sqrt{\log(k/\delta)}}{\varepsilon}\sqrt{\frac{T}{\gamma}}+\gamma T.
\]

This is minimized at $\gamma=\Theta(T^{-1/3})$, yielding a regret bound of $\mathcal{O}(T^{2/3})$ with expected space $\Theta(T^{2/3})$.

Appendix C Extension to Continuous Functions

In this section we prove Theorem 9. Before doing so, we present some preliminaries on online convex optimization.

In online convex optimization (OCO), there is a compact convex set $\mathcal{X}\subseteq\mathbb{R}^{n}$ over which the learner makes decisions. At time-step $t$, a convex function $f_{t}:\mathcal{X}\to\mathbb{R}$ arrives. Without observing this function, the learner must select a point $\mathbf{x}_{t}\in\mathcal{X}$ based on the previous functions $f_{1},\ldots,f_{t-1}$. After the decision has been made, the learner incurs the cost $f_{t}(\mathbf{x}_{t})$ and gains oracle access to $\nabla f_{t}$. The learner's objective is to minimize the regret:

\[
\mathcal{R}_{T}=\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})-\min_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x}).
\]
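PFTAL itself is involved; as a minimal, non-private illustration of this protocol, here is projected online gradient descent on a toy one-dimensional instance with losses $f_t(x)=(x-z_t)^2$, where the targets $z_t$, the step sizes, and the decision set $[0,1]$ are arbitrary choices made for this sketch.

```python
import math

T = 500
zs = [0.5 + 0.1 * (-1) ** t for t in range(T)]   # targets alternate around 0.5

x = 0.0                                          # decision in X = [0, 1]
cum_loss = 0.0
for t, z in enumerate(zs, start=1):
    cum_loss += (x - z) ** 2                     # incur f_t(x_t)
    grad = 2 * (x - z)                           # oracle access to grad f_t
    x = min(1.0, max(0.0, x - 0.5 / math.sqrt(t) * grad))  # step + projection

# Regret against the best fixed point, found here by a fine grid search.
best_fixed = min(sum((u / 1000 - z) ** 2 for z in zs) for u in range(1001))
regret = cum_loss - best_fixed
```

The iterate hovers near the best fixed point $0.5$, and the regret stays far below the trivial bound of $T$ times the per-round loss range.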

Thakurta and Smith (2013) introduced PFTAL (Private Follow the Approximate Leader) to solve the OCO problem privately.

Theorem 10 (Thakurta and Smith (2013)).

PFTAL is $(\varepsilon,0)$-DP, and for any input stream of convex and $L$-Lipschitz functions $f_{1},\ldots,f_{T}$ it has expected regret

\[
\mathbb{E}\left[\mathcal{R}_{T}\right]\leq\mathcal{O}\left(\frac{\sqrt{n\log^{2.5}T}\left(L+\sqrt{\frac{n\log^{2.5}T}{\varepsilon T}}\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon}\sqrt{T}\right).
\]

As with the Hedge algorithm, we utilize PFTAL as a black box in Algorithm 4.

We now split the proof of Theorem 9 into two lemmas and prove each separately.

Lemma 4 (Privacy guarantee).

Algorithm 4 is $(\varepsilon,0)$-DP.

Lemma 5 (Regret guarantee).

Let $R=\sup_{\mathbf{x}\in\mathcal{X}}\|\mathbf{x}\|_{2}$, let $G$ be a bound on the gradient norms $\|\nabla f_{t}(\mathbf{x}_{t})\|_{2}$, and let $\beta$ be the smoothness parameter of $f_{1},\ldots,f_{T}$. Then Algorithm 4 has $(1-1/e)$-regret

\[
\mathbb{E}\left[\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\right]=\mathcal{O}\left(T^{3/4}\sqrt{\log^{2.5}T}\left(\frac{\sqrt{n}\left(G+\sqrt{\frac{n}{\varepsilon T^{3/4}}}\log^{2.5}T\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon}+\beta R^{2}\right)\right).
\]

Proof of Lemma 4

As in the analysis of Algorithm 2, we show that $(\mathcal{E}_{K-1},\ldots,\mathcal{E}_{0})$ is $(\varepsilon,0)$-DP. If each $\mathcal{E}_{k}$ were $(\varepsilon/K,0)$-DP, the result would follow immediately by simple composition. However, we cannot guarantee that each $\mathcal{E}_{k}$ is $(\varepsilon/K,0)$-DP: $\mathcal{E}_{k}$ receives as input the privatized output of $\mathcal{E}_{0},\ldots,\mathcal{E}_{k-1}$ through the linear function $\ell_{k}(\mathbf{v})=\nabla f_{t}(\mathbf{x}_{t}^{k})^{\top}\mathbf{v}$, where $\mathbf{x}_{t}^{k}$ is computed by $\mathcal{E}_{0},\ldots,\mathcal{E}_{k-1}$, while at the same time it accesses the function $f_{t}$ (and hence the database) again via the gradient $\nabla f_{t}$ inside this linear function. This breaks the privacy that a simple post-processing argument would otherwise provide, and therefore an alternative method is needed.

Instead of showing that each $\mathcal{E}_{k}$ is $(\varepsilon/K,0)$-DP, we show directly that the group $(\mathcal{E}_{K-1},\ldots,\mathcal{E}_{0})$ is $(\varepsilon,0)$-DP. The proof of the following lemma follows the same steps as the proof of Lemma 1. It is slightly simpler since there is no $\delta$-privacy term, but it requires some care since the distributions here are continuous.

Lemma 6.

For any $i\geq 1$, the group $(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0}):\mathcal{F}^{T}\to(\mathcal{X}^{T})^{i}$ is $(i\varepsilon/K,0)$-DP.

Proof.

We proceed by induction on $i$. The base case $i=1$ follows immediately from the privacy of PFTAL (Thakurta and Smith (2013)), since $\mathcal{E}_{0}$ is the only algorithm whose input is not perturbed by any other algorithm. For the inductive step, assume the result holds for some $i\geq 1$; we prove it for $i+1$.

Let $\mathbf{x}_{0}^{T},\ldots,\mathbf{x}_{i-1}^{T}\in\mathcal{X}^{T}$ and $\mathbf{X}_{i-1}=(\mathbf{x}_{i-1}^{T},\ldots,\mathbf{x}_{0}^{T})$. Then, for any $\mathbf{x}_{i}^{T}\in\mathcal{X}^{T}$ we have

\[
\Pr(\mathcal{E}_{i}(F)=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\leq e^{\varepsilon/K}\Pr(\mathcal{E}_{i}(F^{\prime})=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1})
\]

by the guarantee of PFTAL. Note that here $\Pr$ denotes a probability density rather than a CDF, since PFTAL utilizes Gaussian noise and hence its outputs have continuous distributions. With this, for $\mathbf{X}_{i}=(\mathbf{x}_{i}^{T},\ldots,\mathbf{x}_{0}^{T})$ we have

\begin{align*}
\Pr((\mathcal{E}_{i},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i})
&=\Pr(\mathcal{E}_{i}(F)=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\,\Pr((\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F)=\mathbf{X}_{i-1})\\
&\leq e^{\varepsilon/K}\Pr(\mathcal{E}_{i}(F^{\prime})=\mathbf{x}_{i}^{T}\mid(\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1})\cdot e^{i\varepsilon/K}\Pr((\mathcal{E}_{i-1},\ldots,\mathcal{E}_{0})(F^{\prime})=\mathbf{X}_{i-1}),
\end{align*}

where we utilized the inductive hypothesis and the previous inequality. This completes the proof. \Box

Proof of Lemma 5

Let $G=\sup_{t=1,\ldots,T,\ \mathbf{x}\in\mathcal{X}}\|\nabla f_{t}(\mathbf{x})\|_{2}$, and let $r_{i}$ be the regret experienced by algorithm $\mathcal{E}_{i}$ in Algorithm 4.

The following result appears in the proof of Theorem 1 in Chen et al. (2018b).

Lemma 7 (Chen et al. (2018b)).

Assume $f_{t}$ is monotone DR-submodular and $\beta$-smooth for every $t$. Then Algorithm 4 ensures

\[
\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\leq\frac{1}{K}\sum_{i=0}^{K-1}r_{i}+\frac{\beta R^{2}T}{2K},
\]

where $R=\sup_{\mathbf{x}\in\mathcal{X}}\|\mathbf{x}\|_{2}$ and $r_{i}$ is the regret of algorithm $\mathcal{E}_{i}$.

Using this result, we obtain

\begin{align*}
\mathbb{E}\left[\left(1-\frac{1}{e}\right)\max_{\mathbf{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\mathbf{x})-\sum_{t=1}^{T}f_{t}(\mathbf{x}_{t})\right]
&\leq\frac{1}{K}\sum_{i=0}^{K-1}\mathbb{E}[r_{i}]+\frac{\beta R^{2}T}{2K}\\
&\leq\mathcal{O}\left(\frac{\sqrt{n\log^{2.5}T}\left(G+\sqrt{\frac{n\log^{2.5}T}{\varepsilon T/K}}\operatorname{diam}\mathcal{X}\right)^{2}}{\varepsilon/K}\sqrt{T}+\frac{\beta R^{2}T}{2K}\right).
\end{align*}

The stated regret bound follows by setting $K=\left(\frac{T}{\log^{2.5}T}\right)^{1/4}$. \Box