Differentially Private Online Submodular Maximization
Abstract
In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of submodular functions over a common finite ground set V arrives online, and at each time-step the decision maker must choose a set of at most k elements of V before observing the function. The decision maker obtains a payoff equal to the function evaluated on the chosen set, and aims to learn a sequence of sets that achieves low expected regret.
In the full-information setting, we develop an (ε, δ)-DP algorithm whose expected (1 − 1/e)-regret is sublinear in the time horizon T. This algorithm contains k ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide a differentially private algorithm, with an exponentially small additional loss in the privacy parameter, whose expected (1 − 1/e)-regret is also sublinear in T.
Our algorithms contain k ordered experts that learn the best marginal item to select given the items chosen by their predecessors, while maintaining privacy of the functions. One challenge for privacy in this setting is that the payoff and feedback of expert i depend on the actions taken by her predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with this feedforward structure may be of independent interest.
1 Introduction
Ensuring users’ privacy has become a critical task in online learning algorithms. As an illustrative example, sponsored search engines aim to maximize the probability that displayed ads or products are clicked by incoming customers, but prospective customers do not want their privacy infringed after clicking on a product. Users visiting online retailer web-pages such as Amazon, Walmart or Target leave behind an abundance of sensitive personal information that can be used to predict their behaviors or preferences, potentially leading to catastrophic results (Zhang et al., 2014); see also https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html. In this work, we introduce the first algorithms for privacy-preserving online monotone submodular maximization under a cardinality constraint.
A submodular set function exhibits diminishing returns: adding an element to a larger set creates less additional value than adding it to any subset of that set. (See Definition 1 in Section 2 for a formal definition.) Submodular functions have found widespread application in economics, computer science and operations research (see, e.g., Bach, (2013) and Krause and Golovin, (2014)), and have recently gained attention as a modeling tool for data summarization and ad display (Ahmed et al., 2012; Streeter et al., 2009; Badanidiyuru et al., 2014). We additionally consider monotone submodular functions, where adding elements to a set can only increase the value of the function. Since unconstrained monotone submodular maximization is trivial (a monotone function is maximized by choosing the entire ground set V), we consider cardinality-constrained maximization, where the decision-maker solves max_{S ⊆ V, |S| ≤ k} f(S).
In the online learning setting, at each time-step t a learner must choose a set S_t of size at most k and receives payoff f_t(S_t) for a monotone submodular function f_t. Importantly, the learner does not know f_t before she chooses S_t, but this set can be chosen based on the previous functions f_1, …, f_{t−1}. Two types of informational feedback are commonly studied in the online learning literature. In the full-information setting, the learner gets full oracle access to the function f_t after choosing S_t, and thus is able to incorporate the entirety of previous functions into her future decisions. In the bandit setting, the learner only observes her own payoff f_t(S_t) as feedback.
Performance of an online learner is typically measured by the regret, which is the difference between the payoff of the best fixed decision in hindsight and the cumulative payoff obtained by the learner (Zinkevich, 2003; Hazan, 2016; Shalev-Shwartz, 2012). More precisely, the regret of a learner after T rounds is R(T) = max_{|S| ≤ k} Σ_{t=1}^{T} f_t(S) − Σ_{t=1}^{T} f_t(S_t). The aim often is to design algorithms with sublinear regret, i.e., R(T) = o(T), so that the average payoff over time of the algorithm is comparable with the best average fixed profit in hindsight. Offline monotone submodular maximization under a cardinality constraint is NP-hard to approximate within a factor better than 1 − 1/e (Feige, 1998; Mirrokni et al., 2008), so we instead measure the quality of our algorithms using the more restrictive notion of (1 − 1/e)-regret (Streeter and Golovin, 2009; Streeter et al., 2009):
R_{1−1/e}(T) = (1 − 1/e) · max_{S ⊆ V, |S| ≤ k} Σ_{t=1}^{T} f_t(S) − Σ_{t=1}^{T} f_t(S_t).   (1)
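To make this notion concrete, the following sketch (a hypothetical helper, not part of the paper's algorithms) evaluates the empirical (1 − 1/e)-regret of a played sequence by brute force; it is only practical for tiny ground sets.

```python
from itertools import combinations
import math

def one_minus_inv_e_regret(functions, played_sets, ground_set, k):
    """Brute-force (1 - 1/e)-regret of a played sequence, per Equation (1).

    functions:   list of set functions f_t (each maps a frozenset to a float)
    played_sets: list of the sets S_t actually played
    ground_set:  iterable of items
    k:           cardinality constraint
    """
    items = list(ground_set)
    # Payoff of the best fixed set of size at most k in hindsight
    # (exponential-time search; only feasible for tiny ground sets).
    best = max(
        sum(f(frozenset(S)) for f in functions)
        for r in range(k + 1)
        for S in combinations(items, r)
    )
    earned = sum(f(frozenset(S)) for f, S in zip(functions, played_sets))
    return (1 - 1 / math.e) * best - earned
```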
The privacy notion we consider in this work is differential privacy (Dwork et al., 2006), which enables accurate estimation of population-level statistics while ensuring little can be learned about the individuals in the database. Informally, a randomized algorithm is said to be differentially private if changing a single entry in the input database results in only a small distributional change in the outputs. (See Definition 2 in Section 2 for a formal definition.) This means that an adversary cannot information-theoretically infer whether or not a single individual participated in the database. Differentially private algorithms have been deployed by major organizations including Apple, Google, Microsoft, Uber, and the U.S. Census Bureau, and are seen as the gold standard in privacy-preserving data analysis. In this work, the input database to our learning algorithm consists of a stream of functions f_1, …, f_T, and each individual’s data corresponds to a single function f_t. Our privacy guarantees ensure that the stream of chosen sets S_1, …, S_T is differentially private with respect to this database of functions.
In both the full-information and bandit settings, we present differentially private online learning algorithms that achieve sublinear expected (1 − 1/e)-regret.
Motivating Example.
While there are countless examples of practical online submodular maximization problems using sensitive data, we offer this motivating example for concreteness. Consider an online product display model where a website has k display slots and wants to maximize the probability of any displayed product being clicked. Each customer t has a (privately known) probability p_{ti} of clicking a display for product i, independently of the other products displayed. Let f_t(S) denote the probability that customer t clicks on any product in a display set S. We can write this function in closed form as f_t(S) = 1 − Π_{i ∈ S} (1 − p_{ti}). Note that this function is submodular because adding products to the set exhibits diminishing returns in total click probability. Each customer’s click-probabilities contain sensitive information about his preferences or habits, and require formal privacy protections.
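For concreteness, the following sketch (with made-up click probabilities) implements this closed-form objective and numerically checks monotonicity and diminishing returns on a toy instance.

```python
def click_prob(S, p):
    """Probability that at least one product in S is clicked,
    assuming independent clicks with probabilities p[i]."""
    prob_no_click = 1.0
    for i in S:
        prob_no_click *= (1.0 - p[i])
    return 1.0 - prob_no_click

# Hypothetical click probabilities for four products.
p = {0: 0.3, 1: 0.1, 2: 0.5, 3: 0.2}

A, B = {0}, {0, 1}           # A is a subset of B
v = 2                        # an item outside B
gain_A = click_prob(A | {v}, p) - click_prob(A, p)
gain_B = click_prob(B | {v}, p) - click_prob(B, p)
assert gain_A >= gain_B                       # diminishing returns
assert click_prob(B, p) >= click_prob(A, p)   # monotonicity
```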
1.1 Our Results
Our main results are differentially private algorithms for online submodular maximization under a cardinality constraint. We provide algorithms that achieve sublinear expected (1 − 1/e)-regret in both the full-information and bandit settings.
Our algorithms are based on the approach of Streeter and Golovin, (2009), who designed (non-private) online algorithms with low expected (1 − 1/e)-regret for submodular maximization. We adapt and extend their techniques to additionally satisfy differential privacy. Following the spirit of Streeter and Golovin, (2009), our algorithms run k ordered online learning algorithms, or experts, that together pick k items at every time-step and learn from their decisions over time. Roughly speaking, expert i learns how to choose an item that complements the decisions of the previous i − 1 experts. The expected (1 − 1/e)-regret can be bounded by the regret of these experts, so to obtain a low-regret algorithm that preserves privacy, we simply need to find no-regret experts that together preserve privacy. Ideally, we would like each expert to be differentially private on its own, so that simple composition and post-processing arguments would yield overall privacy guarantees. Unfortunately this is not possible for experts i ≥ 2, because the choices of all previous experts alter the distribution of payoffs for expert i.
Specifically, the i-th expert non-privately queries the function (i.e., accesses the database) at points that depend on the actions of the previous experts. A naive solution is to allow each expert to query the function at all possible values, so that privacy would follow by post-processing on the differentially private outputs of previous experts. However, this larger domain size requires large quantities of noise that would harm the experts’ no-regret guarantees. Effectively, it decouples the advice of the experts, so that experts are not learning from each other; this naturally helps privacy but harms learning. Instead, we restrict each expert to a domain whose size is defined by the actions of previous experts. This ensures no-regret learning, but post-processing no longer ensures privacy. We overcome this challenge by showing that the experts together are differentially private, and that only small quantities of noise are needed.
Theorem 1 below is an informal version of our main results in the full-information setting (Theorems 5 and 6 in Section 3).
Theorem 1 (Informal).
In the full-information setting, Algorithm 2 for online monotone k-cardinality-constrained submodular maximization is (ε, δ)-differentially private and guarantees expected (1 − 1/e)-regret sublinear in T.
In the bandit setting, each expert only receives its own payoff as feedback, and does not have oracle access to the entire function. For this setting, we modify the full-information algorithm by using a biased estimator of the marginal increments for other actions.
The algorithm also requires additional privacy considerations. The non-private approach of Streeter and Golovin, (2009) randomly decides in each round whether to explore or exploit. In exploit rounds, the experts sample a new set but play the current-optimal action, providing both learning and exploitation. Directly privatizing this algorithm incurs additional privacy loss from the exploit rounds, which leads to a weak bound on the expected (1 − 1/e)-regret, far from the best known non-private rate. Instead, we have the experts sample new sets only after an exploration round has occurred. The choice to explore is data-independent, so privacy is maintained by post-processing. If the exact number and timing of explore rounds were known in advance, this would yield an (ε, δ)-DP algorithm. However, this approach requires O(T) space, which is not appealing in practical settings where the time horizon T is large. Instead we allow explore-exploit decisions to be made online and obtain a high probability bound on the number of explore rounds based on the sampling parameter. At the expense of an exponentially small loss in the privacy parameter—resulting from the failure of the high probability bound—we obtain the asymptotically optimal expected (1 − 1/e)-regret.
Theorem 2 is an informal version of our main results in the more challenging bandit feedback setting (Theorems 7 and 8 in Section 4).
Theorem 2 (Informal).
In the bandit feedback setting, Algorithm 3 for online monotone k-cardinality-constrained submodular maximization is differentially private, up to an exponentially small additive loss in the privacy parameter, and guarantees expected (1 − 1/e)-regret sublinear in T.
The best known non-private expected (1 − 1/e)-regret grows as √T in the full-information setting and as T^{2/3} in the bandit setting (Streeter and Golovin, 2009). Comparing our expected (1 − 1/e)-regret bounds to these, we see that our bounds are asymptotically optimal in T, and have slight gaps in the remaining parameters. Typically, the dominating term is the time horizon T, so our results match the best expected (1 − 1/e)-regret asymptotically in T.
Additionally, we show that our algorithms can be extended to a continuous generalization of submodular functions, known as DR-submodular functions. We provide a differentially private online learning algorithm for DR-submodular maximization that achieves low expected regret. A brief overview of this extension is given in Section 5, with further details in the appendix.
1.2 Related Work
Online learning (Zinkevich,, 2003; Cesa-Bianchi and Lugosi,, 2006; Hazan,, 2016; Shalev-Shwartz,, 2012) has gained increasing attention for making decisions in dynamic environments when only partial information is available. Its applicability in ad placement (Chatterjee et al.,, 2003; Chapelle and Li,, 2011; Tang et al.,, 2014) has made this model attractive from a practical viewpoint.
Submodular optimization has been widely studied, due to the large number of important submodular functions, such as the cut function of a graph, the entropy of a set of random variables, and the rank function of a matroid, to name only a few. For more applications see (Schrijver, 2003; Williamson and Shmoys, 2011; Bach, 2013). While (unconstrained) submodular minimization can be solved with a polynomial number of oracle calls (Schrijver, 2003; Bach, 2013), submodular maximization is known to be NP-hard for general submodular functions. For monotone submodular functions under a cardinality constraint, it is impossible to find a polynomial time algorithm that achieves a fraction better than 1 − 1/e of the optimal solution unless P=NP (Feige, 1998), and this approximation factor is achieved by the greedy algorithm (Fisher et al., 1978), sketched below. For further results with more general constraints, we refer the reader to the survey (Krause and Golovin, 2014). In the online setting, Streeter and Golovin, (2009) and Streeter et al., (2009) were the first to study online monotone submodular maximization, respectively with cardinality/knapsack constraints and partition matroid constraints. Recently, continuous submodularity has gained attention in the continuous optimization community (Hassani et al., 2017; Niazadeh et al., 2018; Zhang et al., 2020). See Chen et al., (2018a) and Chen et al., (2018b) for online continuous submodular optimization algorithms.
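For reference, here is a minimal sketch of the classical greedy algorithm of Fisher et al., (1978) for the offline cardinality-constrained problem; the objective f is assumed to be given as a Python callable on sets.

```python
def greedy_max(f, ground_set, k):
    """Greedy (1 - 1/e)-approximation for max f(S) s.t. |S| <= k,
    for monotone submodular f (offline setting)."""
    S = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain.
        best_v, best_gain = None, 0.0
        for v in ground_set:
            if v in S:
                continue
            gain = f(S | {v}) - f(S)
            if gain > best_gain:
                best_v, best_gain = v, gain
        if best_v is None:   # no positive marginal gain remains
            break
        S.add(best_v)
    return S
```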
Differential privacy (Dwork et al., 2006) has become the gold standard for individual privacy, and there has been a large literature of differentially private algorithms developed for a broad set of analysis tasks. See Dwork and Roth, (2014) for a textbook treatment. Due to privacy concerns in practical applications of online learning, there has been growing interest in implementing well-known methods, such as experts algorithms and gradient-based optimization methods, in a differentially private way. See for instance (Jain et al., 2012; Thakurta and Smith, 2013).
Differential privacy and submodularity were first jointly considered in (Gupta et al., 2010). They studied the combinatorial public projects problem, where the objective function is a sum of monotone submodular functions, each representing an agent’s private valuation function, and a decision-maker must maximize this objective subject to a cardinality constraint. The authors designed a differentially private algorithm using the Exponential Mechanism of (McSherry and Talwar, 2007) as a private subroutine, and achieved a (1 − 1/e)-approximation to the optimal non-private solution, up to an additional additive error term. Later, Mitrovic et al., (2017) extended these results to monotone submodular functions under cardinality, matroid and more general constraints. Their methods also used the Exponential Mechanism to ensure differential privacy. See also recent work by Rafiey and Yoshida, (2020).
In the online learning framework, Cardoso and Cummings, (2019) study online (unconstrained) differentially private submodular minimization. They use the Lovász extension of a set function as a convex proxy to apply known privacy tools that work in online convex optimization (Jain et al.,, 2012; Thakurta and Smith,, 2013). Since submodular minimization and maximization are fundamentally different technical problems, the techniques of Cardoso and Cummings, (2019) do not extend to our setting.
2 Preliminaries
In this section we review definitions and properties of submodular functions and differential privacy.
Definition 1 (Submodularity).
A set function f : 2^V → R is submodular if it satisfies the following diminishing returns property: for all A ⊆ B ⊆ V and v ∈ V \ B, f(A ∪ {v}) − f(A) ≥ f(B ∪ {v}) − f(B). Equivalently, f is submodular if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ V.
As is standard in the submodular maximization literature, we assume f(∅) = 0. In our motivating example, this means that if no items are shown to the incoming customer, then the probability of selecting an item is 0. We consider submodular functions over a finite ground set V. For the sake of simplicity, we will additionally assume that all functions take values in the interval [0, 1]. In this work, we additionally consider set functions that are monotone or non-decreasing, i.e., f(A) ≤ f(B) for all A ⊆ B ⊆ V.
In the problem of online monotone submodular maximization under a cardinality constraint, a sequence of monotone submodular functions f_1, …, f_T arrives in an online fashion. At every time-step t, the decision maker has to choose a subset S_t ⊆ V of size at most k before observing f_t. This decision must be based solely on previous observations. The decision maker receives a payoff f_t(S_t), and her goal is to minimize the expected (1 − 1/e)-regret, where the regret is defined in Equation (1) and the randomness is over the algorithm’s choices.
A fundamental tool in our analysis is the Hedge algorithm (Algorithm 1) of Freund and Schapire, (1997), which chooses an action from a finite set based on past payoffs from each action. The algorithm takes as input a learning rate η > 0 and a stream of linear payoff functions w_1, …, w_T, where the payoff of playing action a at time t is w_t(a).
In our setting, the learner must select a set of at most k items from the ground set V. The learner does this by implementing k ordered copies of the Hedge algorithm, each of which chooses one item, so the action space for each instantiation is the ground set V. The i-th copy of Hedge learns the item with the best marginal gain given the decisions made by the previous i − 1 Hedge algorithms.
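Since Algorithm 1 itself is not reproduced here, the following is a minimal sketch of the Hedge update we have in mind: multiplicative weights over a finite action set, with probabilities proportional to the exponential of the learning rate times the cumulative payoffs. The class name and interface are ours, chosen to match the later sketches.

```python
import math
import random

class Hedge:
    """Multiplicative-weights (Hedge) learner over a finite action set."""

    def __init__(self, actions, eta):
        self.actions = list(actions)
        self.eta = eta
        self.cum_payoff = {a: 0.0 for a in self.actions}

    def distribution(self):
        # Probability proportional to exp(eta * cumulative payoff);
        # subtracting the maximum is only for numerical stability.
        m = max(self.cum_payoff.values())
        weights = {a: math.exp(self.eta * (c - m))
                   for a, c in self.cum_payoff.items()}
        total = sum(weights.values())
        return {a: w / total for a, w in weights.items()}

    def sample(self):
        dist = self.distribution()
        r, acc = random.random(), 0.0
        for a, pr in dist.items():
            acc += pr
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, payoff_vector):
        # payoff_vector maps each action to its payoff this round.
        for a in self.actions:
            self.cum_payoff[a] += payoff_vector[a]
```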
The Hedge algorithm exhibits the following guarantee, which is useful for analyzing its regret, as well as the regret of our algorithms which instantiate Hedge.
Theorem 3 (Freund and Schapire, (1997)).
For any fixed action a ∈ V, the distributions x_1, …, x_T over V constructed by Algorithm 1 satisfy
Σ_{t=1}^{T} ⟨x_t, w_t⟩ ≥ Σ_{t=1}^{T} w_t(a) − (ln |V|)/η − η Σ_{t=1}^{T} ⟨x_t, w_t²⟩,
where w_t² is the vector w_t with each coordinate squared.
For the privacy considerations of this work, we view the input database as the ordered input sequence of submodular functions F = (f_1, …, f_T) and the algorithm’s output as the sequence of chosen sets (S_1, …, S_T). We say that two sequences of functions F and F′ are neighboring if f_t ≠ f′_t for at most one index t.
Definition 2 (Differential Privacy (Dwork et al.,, 2006)).
An online learning algorithm A is (ε, δ)-differentially private if for any neighboring function databases F and F′, and any event E in the output space, Pr[A(F) ∈ E] ≤ e^{ε} · Pr[A(F′) ∈ E] + δ.
Differential privacy is robust to post-processing, meaning that any function of a differentially private output maintains the same privacy guarantee.
Proposition 1 (Post-Processing (Dwork et al.,, 2006)).
Let A be an (ε, δ)-DP algorithm and let g be an arbitrary (possibly randomized) function of its output. Then, g ∘ A is also (ε, δ)-DP.
Differentially private algorithms also compose, and the privacy guarantees degrade gracefully as additional DP computations are performed. This enables modular algorithm design using simple differentially private building blocks. Basic Composition (Dwork et al., 2006) says that one can simply add up the privacy parameters used in an algorithm’s subroutines to get the overall privacy guarantee. The following Advanced Composition theorem provides even tighter bounds.
Theorem 4 (Advanced Composition (Dwork et al., 2010b, )).
Let A_1, …, A_n each be (ε, δ)-DP algorithms. Then, their composition (A_1, …, A_n) is (ε′, nδ + δ′)-DP for ε′ = ε√(2n ln(1/δ′)) + nε(e^{ε} − 1) and any δ′ > 0.
Our algorithms rely on the Exponential Mechanism (EM) introduced by McSherry and Talwar, (2007). The EM takes in a database D, a finite action set A, and a quality score q, where q(D, a) assigns a numeric score to the quality of outputting a on input database D. The sensitivity of the quality score, denoted Δq, is the maximum change in the value of q across neighboring databases: Δq = max_{a ∈ A} max_{neighboring D, D′} |q(D, a) − q(D′, a)|. Given these inputs, the EM outputs a ∈ A with probability proportional to exp(ε · q(D, a) / (2Δq)). The Exponential Mechanism is ε-DP (McSherry and Talwar, 2007).
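A minimal sketch of the Exponential Mechanism as described above; the quality scores are supplied as a dictionary from actions to q(D, a), and the sensitivity is assumed to be known.

```python
import math
import random

def exponential_mechanism(quality, sensitivity, epsilon):
    """Sample an action with probability proportional to
    exp(epsilon * q(D, a) / (2 * sensitivity))."""
    m = max(quality.values())  # shift for numerical stability
    weights = {a: math.exp(epsilon * (q - m) / (2 * sensitivity))
               for a, q in quality.items()}
    total = sum(weights.values())
    r, acc = random.random() * total, 0.0
    for a, w in weights.items():
        acc += w
        if r <= acc:
            return a
    return a  # fallback for floating-point rounding
```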
As noted by Jain et al., (2012) and Dwork et al., (2010a), the Hedge algorithm can be converted into a DP algorithm using advanced composition and the EM.
Proposition 2.
If the learning rate is set to η = O(ε / √(T log(1/δ))), then Hedge (Algorithm 1) is (ε, δ)-DP.
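The connection behind Proposition 2 can be seen directly from the sketches above: each Hedge sampling step draws an action with probability proportional to exp(η × cumulative payoff), which is an Exponential Mechanism draw with quality score equal to the cumulative payoff (sensitivity 1 when payoffs lie in [0, 1]) and per-step privacy parameter 2η; advanced composition over the T sampling steps then yields a guarantee of the stated form.

```python
def hedge_step_as_em(hedge, epsilon_step):
    """One Hedge sampling step expressed as an Exponential Mechanism draw
    (a sketch reusing the Hedge and exponential_mechanism sketches above)."""
    quality = dict(hedge.cum_payoff)   # quality score q(D, a): cumulative payoff
    # Payoffs in [0, 1] give sensitivity 1; epsilon_step plays the role of 2 * eta.
    return exponential_mechanism(quality, sensitivity=1.0, epsilon=epsilon_step)
```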
3 Full Information Setting
In this section, we introduce our first algorithm for online submodular maximization under a cardinality constraint. It is both differentially private and achieves the best known expected (1 − 1/e)-regret in terms of the time horizon T. For cardinality constraint k, the learner implements k ordered copies of the Hedge algorithm. Each copy is in charge of learning the marginal gain that complements the choices of the previous Hedge algorithms. At time-step t, each Hedge algorithm selects an element of V and the learner gathers these choices to play the corresponding set. When she obtains oracle access to the submodular function, for each copy i she constructs a vector whose coordinates are the marginal gains of adding each item to the choices made by the previous i − 1 Hedge algorithms. Finally, she feeds this vector back to Hedge algorithm i. A formal description of this procedure is presented in Algorithm 2.
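A minimal sketch of one round of this procedure, reusing the Hedge sketch from Section 2; the function names and the way duplicate choices are handled are our illustrative assumptions rather than the paper's exact pseudocode.

```python
def full_information_round(experts, f, ground_set):
    """One round of the full-information procedure (sketch): k ordered Hedge
    experts jointly build a set, then each receives its marginal-gain vector.

    experts:    list of k Hedge instances over the ground set
    f:          oracle for this round's monotone submodular function
    ground_set: the universe of items
    """
    chosen, prefix_sets = [], []
    S = set()
    for expert in experts:
        prefix_sets.append(set(S))   # items chosen by experts 1..i-1
        v = expert.sample()
        chosen.append(v)
        S.add(v)                      # duplicates simply do not enlarge the set

    payoff = f(S)                     # the learner's payoff this round

    # Full information: feed expert i the marginal gains of every item
    # relative to the items chosen by experts 1..i-1.
    for expert, prefix in zip(experts, prefix_sets):
        base = f(prefix)
        feedback = {v: f(prefix | {v}) - base for v in ground_set}
        expert.update(feedback)
    return S, payoff
```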
To ensure differential privacy, it would be enough to show that each Hedge copy is differentially private with suitably scaled parameters. Indeed, if the sequence constructed by each Hedge algorithm were DP, then by Basic Composition and post-processing, the sequence of played sets S_1, …, S_T would be DP with parameters summed across the k copies. However, for i ≥ 2, the output of expert i depends on the choices made by algorithms 1, …, i − 1. Moreover, algorithm i by itself again accesses the database of functions, hence ruling out a post-processing argument. Despite this, we show that all k experts together are (ε, δ)-DP even though individually we cannot ensure each one preserves DP.
It is worth noting that the Hedge algorithms in Algorithm 2 can be replaced by any other no-regret DP method that selects items over V, and the same proof structure would follow—although the regret bound would depend on the choice of no-regret algorithm. For instance, if we utilize the private experts method of Thakurta and Smith, (2013) instead of the Hedge algorithm, Algorithm 2 would remain differentially private, with a regret bound inherited from that method's guarantee.
Theorem 5.
Algorithm 2 is (ε, δ)-differentially private.
Theorem 6.
Algorithm 2 has expected (1 − 1/e)-regret sublinear in T.
Proof of Theorem 5
The output of Algorithm 2 is the stream of sets S_1, …, S_T. Before showing that this output preserves privacy, we deal with a simpler case from which we can deduce an inductive argument.
Note that the first expert receives as feedback the singleton values of the functions at each time step. By Proposition 2, the first expert is differentially private for an appropriately chosen learning rate. On the other hand, the second expert receives as feedback the marginal gains relative to the item chosen by the first expert at each time-step. Therefore, the output of the second expert depends on the choices of the first expert; hence, conditioning on these choices, the second expert should also be differentially private. We generalize and formalize this in the next few paragraphs.
Consider the following family of mechanisms: for each expert i, and for each fixed realization of the choices of experts 1, …, i − 1, consider the Exponential Mechanism that at each time-step outputs an item with probability proportional to the exponential of the learning rate times the cumulative marginal gains observed so far. Each of these mechanisms is differentially private by Proposition 2, and therefore, by Advanced Composition and our choice of learning rate, their composition over time-steps is differentially private. Note that for fixed choices of experts 1, …, i − 1, the conditional distribution of expert i's actions coincides with the output of such a composed mechanism, and the latter describes the output of a differentially private algorithm. This formalizes the idea that expert i is differentially private once the choices of experts 1, …, i − 1 are fixed. We utilize this idea to show that all k experts together are (ε, δ)-DP. This is formally presented in Lemma 1. The proof of this result (formally given in Appendix A.1) is an inductive argument that takes advantage of the DP guarantee of these conditional mechanisms.
Lemma 1.
For any i ∈ {1, …, k}, the composition of the first i Hedge algorithms is differentially private.
Proof of Theorem 6
The key idea is to bound the (1 − 1/e)-regret of Algorithm 2 by the regret incurred by the k Hedge algorithms. We formalize this in Proposition 3 below. With this bound, we can utilize the regret bound of the Hedge algorithm and conclude the proof. The regret incurred by Hedge algorithm i is the gap between the cumulative payoff of the best fixed item in hindsight and its own cumulative payoff, measured with respect to the marginal-gain vectors it receives as feedback.
Proposition 3.
The (1 − 1/e)-regret of Algorithm 2 is bounded by the sum of the expected regrets of the k Hedge algorithms.
While a full proof of Proposition 3 is deferred to Appendix A.2, we describe the key idea here. To bound the (1 − 1/e)-regret, we rewrite the regret in terms of the average of the observed functions, which is again monotone submodular. We then show a recursive inequality: the value collected by the first i experts is at least the value collected by the first i − 1 experts plus a 1/k fraction of the remaining gap to the optimum, minus the regret of expert i. Upon unrolling this recursion, we obtain the result.
To finish the proof of Theorem 6 we need to bound the overall regret of all k Hedge algorithms. Observe that once we have fixed the choices of experts 1, …, i − 1, the feedback of expert i is completely determined, since its marginal-gain vectors depend only on experts 1, …, i − 1. Therefore, we can apply the Hedge regret guarantee to each expert conditionally on its predecessors. Integrating over the randomness of the previous experts, we obtain the corresponding bound in expectation, and the result follows with our choice of learning rate.
4 Bandit Setting
In the bandit case, the algorithm only receives as feedback the value f_t(S_t). Given this restricted information, the algorithm must trade off exploration of the function with exploiting current knowledge. As in (Streeter and Golovin, 2009), our algorithm controls this tradeoff using a parameter γ ∈ (0, 1), and randomly explores in each time-step independently with probability γ.
The non-private approach of Streeter and Golovin, (2009) obtains expected (1 − 1/e)-regret of order T^{2/3}, and works as follows: In exploit rounds (probability 1 − γ), play the experts’ sampled choice and feed back the all-zeros vector to each expert. In explore rounds (probability γ), select an expert position and an item uniformly at random. Play the set formed by the choices of the preceding experts together with the random item, observe the resulting payoff, give this value to the selected expert, and feed back zeros to the remaining experts.
As we show in Appendix B.1, directly privatizing this algorithm using the Hedge method from the full-information setting results in an expected (1 − 1/e)-regret that is far from the best known non-private rate. The problem with this naive approach is that a new sample is obtained via the Hedge algorithms at every time-step, including exploit steps, so ensuring differential privacy requires a very small learning rate.
We improve upon this by calling the Hedge algorithm only after an exploration time-step has occurred, and new information is available. The learner continues playing this same set until the next exploration round, and privacy of these exploitation rounds follows from post-processing. This dramatically reduces the number of rounds that access the dataset, and reduces the overall amount of noise required for privacy.
If the exact number of exploration rounds were known, it could be plugged into the learning rate to achieve (ε, δ)-DP. In the non-private setting, a doubling trick (see, e.g., Shalev-Shwartz, (2012)) can be employed to find the right learning rate by calling the algorithm multiple times, doubling the horizon estimate and thus rescaling the learning rate on each iteration. Unfortunately, this doubling trick does not work in the private setting due to the direct non-linear connection between the privacy parameter, the time horizon and the learning rate, as specified in Proposition 2. Instead we use concentration inequalities (Alon and Spencer, 2004) to ensure that the number of exploration rounds does not exceed a constant factor times its expectation γT, except with exponentially small probability. With this, we can select a fixed learning rate and guarantee the asymptotically optimal expected (1 − 1/e)-regret, at the cost of an exponentially small loss in the privacy parameter.
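For instance, one standard multiplicative Chernoff bound (Alon and Spencer, 2004) in this setting reads as follows: if N denotes the number of exploration rounds among T time-steps, each chosen independently with probability γ, then

Pr[N > 2γT] ≤ exp(−γT / 3),

so calibrating the learning rate to 2γT exploration rounds fails only with exponentially small probability.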
We remark in Appendix B.2 that this additional loss in the δ term can be avoided by pre-sampling the exploration rounds, but this requires O(T) space, which may be unacceptable for large T.
Algorithm 3 presents the space-efficient approach. There, the feedback vector given to each expert is the biased estimator of the marginal gains constructed from the explored item and the observed payoff.
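A minimal sketch of one round of this procedure, again reusing the Hedge sketch from Section 2; the single-coordinate feedback used here is a crude stand-in for the paper's biased estimator, and the state handling and names are our illustrative assumptions.

```python
import random

def bandit_round(state, experts, f, ground_set, gamma):
    """One round of the bandit procedure (sketch). `state['chosen']` is the
    list of items currently being exploited (one per expert); `f` may only
    be evaluated on the set actually played."""
    if random.random() < gamma:
        # Explore: pick a random expert position i and a random item v,
        # and play the choices of experts 1..i-1 together with v.
        i = random.randrange(len(experts))
        v = random.choice(list(ground_set))
        played = set(state['chosen'][:i]) | {v}
        value = f(played)                       # the only feedback observed
        # Crude stand-in for the paper's biased estimator: expert i receives
        # the observed value on coordinate v; all other experts receive zeros.
        feedback = {u: 0.0 for u in ground_set}
        feedback[v] = value
        for j, expert in enumerate(experts):
            expert.update(feedback if j == i else {u: 0.0 for u in ground_set})
        # New information arrived: only now do the experts resample the set.
        state['chosen'] = [e.sample() for e in experts]
    else:
        # Exploit: replay the current set; no new information is released.
        played = set(state['chosen'])
        value = f(played)
    return played, value
```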
Theorem 7.
Algorithm 3 is (ε, δ′)-DP, where δ′ exceeds δ only by an exponentially small term.
Theorem 8.
Algorithm 3 has expected (1 − 1/e)-regret sublinear in T.
Proof of Theorem 7
Observe that the algorithm only releases new information right after an exploration time-step. The number of exploration time-steps is distributed as the sum of T independent Bernoulli random variables with parameter γ. Conditioned on the event that this number does not exceed the threshold used to calibrate the learning rate, the outputs released after exploration time-steps are (ε, δ)-DP by Theorem 5. Conditioning again on this event, the entire output is (ε, δ)-DP, since extending the chosen sets to the exploitation time-steps is post-processing of the previous output. The conditioning event holds except with exponentially small probability, so for any event in the output space the privacy inequality holds with δ increased by this exponentially small amount. The result now follows by plugging in the value of γ used in Algorithm 3.
Proof of Theorem 8
Theorem 8 requires the following two lemmas, proved respectively in Appendices A.3 and A.4. The first lemma says that the (1 − 1/e)-regret experienced by the learner is bounded by the regret experienced by the experts plus an additional error introduced during the exploration times. The second lemma bounds the regret experienced by the experts under the biased estimator.
Lemma 2.
If R_i denotes the regret experienced by expert i in Algorithm 3, then the expected (1 − 1/e)-regret of the learner is bounded by the sum of the E[R_i] plus an additional error of order γT due to the exploration rounds.
Lemma 3.
If each expert is a Hedge algorithm with learning rate η, then its expected regret under the biased estimator admits a bound analogous to Theorem 3, adjusted by the exploration probability γ.
Combining these two results with the parameter choices in Algorithm 3 yields the claimed bound.
5 Extension to Continuous Functions
We sketch an extension of our methodology for (continuous) DR-submodular functions (Hassani et al.,, 2017; Niazadeh et al.,, 2018). Further details can be found in Appendix C.
Let X = X_1 × ⋯ × X_n, where each X_j is a closed convex subset of R. A differentiable function F : X → R is called DR-submodular if ∇F(x) ≥ ∇F(y) coordinate-wise for all x ≤ y. DR-submodular functions do not fit completely in the context of convex functions. For instance, the multilinear extension of a submodular function (Calinescu et al., 2011) is DR-submodular. The function is said to be β-smooth if its gradient is β-Lipschitz. In the online DR-submodular maximization problem, at each time-step t, a β-smooth DR-submodular function F_t arrives and, without observing the function, the learner selects a point x_t ∈ X based on the previously observed functions. She gets the value F_t(x_t) and also oracle access to ∇F_t. The learner’s goal is to minimize the (1 − 1/e)-regret.
Online DR-submodular problems have been extensively studied in the full information setting—see for instance (Chen et al., 2018b; Chen et al., 2018a; Niazadeh et al., 2018). Similarly to the discrete submodular case, most of these methods implement ordered online algorithms for optimizing linear functions over the feasible region. Each such algorithm computes a direction of maximum increase from the point built by its predecessors. The learner averages these directions to obtain a new point to play in the region. This is the continuous analogue of the Hedge approach.
We show in Algorithm 4 and Theorem 9 that a simple modification transforms the continuous method of Chen et al., (2018b) into a differentially private one. For this, we utilize the Private Follow the Approximate Leader (PFTAL) framework of Thakurta and Smith, (2013) as a black-box. PFTAL is an online convex optimization algorithm for minimizing Lipschitz convex functions over a compact convex region. In a few words, their algorithm is differentially private and achieves expected regret growing as √T, up to logarithmic factors and the dependence on the privacy parameters.
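A minimal sketch of one round of this Frank-Wolfe-style meta-procedure, with generic online linear optimizers as placeholders; in Algorithm 4 these placeholders would be instances of the PFTAL routine, whose predict/update interface we only assume here, and the iterate is assumed to start at the origin.

```python
def meta_frank_wolfe_round(linear_opts, grad_oracle, dim):
    """One round of the continuous meta-procedure (sketch).

    linear_opts: list of K online linear optimizers over the feasible region,
                 each exposing predict() -> vector and update(gradient_vector);
                 stand-ins for the private PFTAL subroutine.
    grad_oracle: callable x -> gradient of this round's DR-submodular function.
    dim:         ambient dimension.
    """
    K = len(linear_opts)
    x = [0.0] * dim
    trajectory = []                 # iterates at which gradients will be taken
    for opt in linear_opts:
        trajectory.append(list(x))
        v = opt.predict()           # direction proposed by this optimizer
        x = [xi + vi / K for xi, vi in zip(x, v)]   # Frank-Wolfe averaging
    # Play x, then feed each optimizer the gradient at its own iterate.
    for opt, point in zip(linear_opts, trajectory):
        opt.update(grad_oracle(point))
    return x
```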
Theorem 9 (Informal).
Algorithm 4 is (ε, δ)-DP with expected (1 − 1/e)-regret sublinear in T.
The hidden constants depend on the dimension, bounds on the gradients, and the diameter of the feasible region; the stated bound only shows the dependence on T and the privacy parameter. The proof appears in Appendix C.
References
- Ahmed et al., (2012) Ahmed, A., Teo, C. H., Vishwanathan, S., and Smola, A. (2012). Fair and balanced: Learning to present news stories. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 333–342.
- Alon and Spencer, (2004) Alon, N. and Spencer, J. H. (2004). The Probabilistic Method. John Wiley & Sons.
- Bach, (2013) Bach, F. (2013). Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373.
- Badanidiyuru et al., (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. (2014). Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 671–680.
- Calinescu et al., (2011) Calinescu, G., Chekuri, C., Pál, M., and Vondrák, J. (2011). Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766.
- Cardoso and Cummings, (2019) Cardoso, A. R. and Cummings, R. (2019). Differentially private online submodular minimization. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS ’19, pages 1650–1658.
- Cesa-Bianchi and Lugosi, (2006) Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
- Chapelle and Li, (2011) Chapelle, O. and Li, L. (2011). An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, NIPS ’11, pages 2249–2257.
- Chatterjee et al., (2003) Chatterjee, P., Hoffman, D. L., and Novak, T. P. (2003). Modeling the clickstream: Implications for web-based advertising efforts. Marketing Science, 22(4):520–541.
- Chen et al., (2018a) Chen, L., Harshaw, C., Hassani, H., and Karbasi, A. (2018a). Projection-free online optimization with stochastic gradient: From convexity to submodularity. arXiv preprint 1802.08183.
- Chen et al., (2018b) Chen, L., Hassani, H., and Karbasi, A. (2018b). Online continuous submodular maximization. arXiv preprint 1802.06052.
- Dwork et al., (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC ’06, pages 265–284.
- Dwork et al., (2010a) Dwork, C., Naor, M., Pitassi, T., and Rothblum, G. N. (2010a). Differential privacy under continual observation. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC ’10, pages 715–724.
- Dwork and Roth, (2014) Dwork, C. and Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407.
- Dwork et al., (2010b) Dwork, C., Rothblum, G. N., and Vadhan, S. (2010b). Boosting and differential privacy. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science, FOCS ’10, pages 51–60.
- Feige, (1998) Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652.
- Fisher et al., (1978) Fisher, M. L., Nemhauser, G. L., and Wolsey, L. A. (1978). An analysis of approximations for maximizing submodular set functions—II. In Polyhedral Combinatorics, pages 73–87. Springer.
- Freund and Schapire, (1997) Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
- Gupta et al., (2010) Gupta, A., Ligett, K., McSherry, F., Roth, A., and Talwar, K. (2010). Differentially private combinatorial optimization. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 1106–1125.
- Hassani et al., (2017) Hassani, H., Soltanolkotabi, M., and Karbasi, A. (2017). Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’17, pages 5841–5851.
- Hazan, (2016) Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3–4):157–325.
- Jain et al., (2012) Jain, P., Kothari, P., and Thakurta, A. (2012). Differentially private online learning. In Proceedings of the 25th Conference on Learning Theory, COLT ’12, pages 24.1–24.34.
- Krause and Golovin, (2014) Krause, A. and Golovin, D. (2014). Submodular function maximization, pages 71–104. Cambridge University Press.
- McSherry and Talwar, (2007) McSherry, F. and Talwar, K. (2007). Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103.
- Mirrokni et al., (2008) Mirrokni, V., Schapira, M., and Vondrák, J. (2008). Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM Conference on Electronic Commerce, EC ’08, pages 70–77.
- Mitrovic et al., (2017) Mitrovic, M., Bun, M., Krause, A., and Karbasi, A. (2017). Differentially private submodular maximization: Data summarization in disguise. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 2478–2487.
- Niazadeh et al., (2018) Niazadeh, R., Roughgarden, T., and Wang, J. (2018). Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization. In Advances in Neural Information Processing Systems, NIPS ’18, pages 9594–9604.
- Rafiey and Yoshida, (2020) Rafiey, A. and Yoshida, Y. (2020). Fast and private submodular and k-submodular functions maximization with matroid constraints. arXiv preprint 2006.15744.
- Schrijver, (2003) Schrijver, A. (2003). Combinatorial optimization: Polyhedra and efficiency, volume 24. Springer Science & Business Media.
- Shalev-Shwartz, (2012) Shalev-Shwartz, S. (2012). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194.
- Streeter and Golovin, (2009) Streeter, M. and Golovin, D. (2009). An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1577–1584.
- Streeter et al., (2009) Streeter, M., Golovin, D., and Krause, A. (2009). Online learning of assignments. In Advances in Neural Information Processing Systems, NIPS ’09, pages 1794–1802.
- Tang et al., (2014) Tang, L., Jiang, Y., Li, L., and Li, T. (2014). Ensemble contextual bandits for personalized recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 73–80.
- Thakurta and Smith, (2013) Thakurta, A. G. and Smith, A. (2013). (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Advances in Neural Information Processing Systems, NIPS ’13, pages 2733–2741.
- Williamson and Shmoys, (2011) Williamson, D. P. and Shmoys, D. B. (2011). The design of approximation algorithms. Cambridge University Press.
- Zhang et al., (2014) Zhang, B., Wang, N., and Jin, H. (2014). Privacy concerns in online recommender systems: Influences of control and user data input. In Proceedings of the 10th Symposium on Usable Privacy and Security, SOUPS ’14, pages 159–173.
- Zhang et al., (2020) Zhang, M., Shen, Z., Mokhtari, A., Hassani, H., and Karbasi, A. (2020). One sample stochastic Frank-Wolfe. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS ’20, pages 4012–4023.
- Zinkevich, (2003) Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, ICML ’03, pages 928–936.
Appendix A Appendix
A.1 Proof of Lemma 1
(Restatement of Lemma 1.)
Proof.
We prove the lemma by induction on the number of experts i. The base case i = 1 follows from Proposition 2. For the inductive step, assume the result is true for some i ≥ 1; we now prove that it also holds for i + 1. That is, we aim to show that the composition of the first i + 1 Hedge algorithms is differentially private, where the output under consideration is the behavior of each of these experts across all rounds.
Consider two neighboring databases and pick any realization of the output of the (i + 1)-th expert, with the choices of the first i experts held fixed. By the differential privacy of the (i + 1)-th expert conditioned on those fixed choices, the probability of this realization changes by at most the prescribed multiplicative and additive factors when the database is replaced by its neighbor. This is true as long as the conditioning events have non-zero probability, which is ensured since the Hedge algorithm places positive probability on all events.
We can write the joint probability of the first i + 1 experts' outputs as the conditional probability of the (i + 1)-th expert's output given the first i experts' outputs, multiplied by the probability of the first i experts' outputs. The latter factor changes by at most the allowed privacy factors for any realization, since the composition of the first i experts is differentially private by the inductive hypothesis.
Now, consider any set of outputs. Summing the previous bounds over the realizations of the first i experts yields the desired privacy inequality for the composition of the first i + 1 experts. This concludes the proof.
A.2 Proof of Proposition 3
(Restatement of Proposition 3.)
Proof.
Fix the choices of the experts arbitrarily, and let r_i denote the overall regret experienced by expert i, that is, the gap between the cumulative payoff of the best fixed item in hindsight and expert i's cumulative payoff.
Define the new function as
where . Clearly, is submodular, nondecreasing and . Then,
where .
Let be the optimal solution of and consider its extension to , i.e., .
Claim A.1.
For any , .
Proof of Claim A.1.
Using this claim, we can see,
Unrolling the recursion, we obtain
A.3 Proof of Lemma 2
(Restatement of Lemma 2.)
Proof.
Observe that at exploration time-steps, i.e., when the algorithm decides to explore, Algorithm 3 plays the corresponding exploration set. Right after this, the algorithm samples a new set given by the Hedge algorithms and keeps playing this set until the next exploration time-step.
For the sake of analysis, we introduce the following set. Let be the times when a new sample set for exploitation is obtained. Note that besides time , all times are exploration times. Now, let for . Note that for times , then ; however, for times , then is not necessarily the same as . In other words, corresponds to the real full exploitation scheme. Now, as in the full information setting, we have
where . Thus
since in the end, only the exploration times can contribute to the difference, and there are γT of those in expectation.
A.4 Proof of Lemma 3
(Restatement of Lemma 3.)
Proof.
From the perspective of expert , at every time-step , she sees the vector such that
in its corresponding coordinate. Notice that this vector is all zeros if no exploration occurs at time t. The expert samples a new element only after exploration times. Observe that the feedback of expert i is independent of the choices made by expert i itself. Indeed, this feedback depends only on the set constructed by the preceding experts and on the decision of the learner to explore, which is independent of the learning task. Therefore, the feedback sequence can be considered oblivious for expert i, and we can apply the guarantee of Hedge to it. That is, for any fixed comparator,
where the distribution in question is the (everywhere non-zero) distribution used by expert i in the Hedge algorithm, and the comparison ranges over the probability simplex on the ground set. Notice that exploitation times appear in the summation with zero contribution. This expression is not the same as the regret of expert i, but we can relate these quantities as follows. Conditioned on the exploration decisions, we obtain
where and . Notice that are independent of actions taken by , so
and
Let N be the number of times Algorithm 3 decides to explore. That is, N is distributed as the sum of T independent Bernoulli random variables with parameter γ. By concentration bounds,
Now, let be the times the algorithm decides to explore and let . For , we can assume that expert releases the same vector during the time interval since she does not get any feedback during those times. If we consider , then for any we have
Therefore,
Appendix B Additional Results in Bandit Setting
B.1 Regret Bound of Direct Approach in Bandit Setting
In the bandit setting, the direct approach for differential privacy corresponds to sampling a new set from the Hedge algorithms at each time-step. As in the full-information setting, to ensure differential privacy, a sufficiently small learning rate is enough.
Similar to Lemma 3, in this setting we have
Since,
then we have,
This last bound is minimized by the appropriate choice of γ, which gives the stated (1 − 1/e)-regret bound.
B.2 Trading Off Privacy -Term and Space
In this subsection, we show how to trade off the additional δ term against extra space. For each time-step t, select t as an explore round independently with probability γ. Let N be the number of time-steps selected; note that E[N] = γT. Now, run Algorithm 3 with its learning rate calibrated to N, force the algorithm to explore at the sampled time-steps, and utilize the rest of the time-steps to exploit.
Appendix C Extension to Continuous Functions
In this section we prove Theorem 9. Before this, we present some preliminaries in online convex optimization.
In online convex optimization (OCO), there is a compact convex set where the learner makes decisions. At time-step t, a convex function arrives. Without observing this function, the learner has to select a point based on the previous functions. After the decision has been made, the learner receives the corresponding cost and gains oracle access to the function. The learner’s objective is to minimize the regret:
Thakurta and Smith, (2013) introduced PFTAL (Private Follow the Approximate Leader) to privately solve the OCO problem.
Theorem 10 (Thakurta and Smith, (2013)).
PFTAL is differentially private and, for any input stream of convex and Lipschitz functions, has expected regret growing as √T, up to logarithmic factors and the dependence on the privacy parameters.
Similar to the Hedge algorithm, we utilize PFTAL as a black-box in Algorithm 4.
Now, we present the proof of Theorem 9 in two parts, and prove each separately.
Lemma 4 (Privacy guarantee).
Algorithm 4 is (ε, δ)-DP.
Lemma 5 (Regret guarantee).
Let the diameter of the feasible region, a bound on the gradients, and the smoothness parameter of the arriving functions be fixed. Then Algorithm 4 has expected (1 − 1/e)-regret sublinear in T, with constants depending on these quantities and the privacy parameters.
Proof of Lemma 4
As with the analysis of Algorithm 2, we show that the group of online linear optimizers is jointly differentially private. If each one were DP on its own, then the result would immediately follow by simple composition. However, we cannot guarantee this, since each optimizer obtains as input the privatized output of its predecessors through the linear function it optimizes, while at the same time it accesses the function (and so the database) again via the gradient appearing in that linear function. This clearly breaks the privacy that could have been obtained via a simple post-processing argument, and therefore an alternative method is needed.
We do not show that each optimizer is DP on its own, but rather that the group together is DP. The proof of the following lemma follows the same steps as the proof of Lemma 1. The proof is slightly simpler since one of the privacy terms is absent, but it requires some care since the distributions are continuous in this case.
Lemma 6.
For any j, the group of the first j online linear optimizers is differentially private.
Proof.
We proceed by induction on the number of optimizers. The base case follows immediately from the privacy of PFTAL in Thakurta and Smith, (2013), because the first optimizer is the only one whose distribution is not perturbed by any other algorithm. For the inductive step, assume the result is true for some number of optimizers and let us prove it for the next one.
Let and . Then, for any we have
by the guarantee of PFTAL. Note that we are referring to the probability density function and not the CDF of the distribution; this is because PFTAL utilizes Gaussian noise. With this, we have,
where we utilized induction and the previous inequality. This completes the proof.
Proof of Lemma 5
Let . Let be the regret experienced by algorithm in Algorithm 4.
The following result appears in the proof of Theorem 1 in Chen et al., (2018b).
Lemma 7 (Chen et al., 2018b ).
Assume is monotone DR-submodular and -smooth for every . Then Algorithm 4 ensures
where and is the regret of algorithm .
Using this result, we obtain
We obtain the stated regret bound by choosing the parameters appropriately.