Differentially Private Generalized Linear Models Revisited
Abstract
We study the problem of $(\epsilon,\delta)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of , where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\|w^*\|$, our bound is essentially tight in all parameters. In particular, we show a lower bound of . We also revisit the previously studied case of Lipschitz losses [SSTT21]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) , where rank denotes the rank of the design matrix. This improves over existing work in the high-privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a priori knowledge of $\|w^*\|$.
1 Adapting to $\|w^*\|$
Our method for privately adapting to is given in Algorithm 1. We start by giving a high-level overview and defining some necessary preliminaries. The algorithm works in the following manner. First, we define a number of “guesses” for , where . Then, given black-box access to a DP optimization algorithm , Algorithm 1 generates candidate vectors using , training set , and the guesses . We assume satisfies the following accuracy assumption for some confidence parameter .

{assumption} There exists a function such that for any , whenever , w.p. at least under the randomness of and it holds that .

After generating the candidate vectors, the goal is to pick the guess with the smallest excess population risk in a differentially private manner using a validation set . The following assumption on allows us both to ensure the privacy of the model selection algorithm and to verify that provides a tight estimate of .

{assumption} There exists a function such that for any dataset and
Specifically, our strategy will be to use the Generalized Exponential Mechanism, , of [RS15] in conjunction with a penalized score function. Roughly, this penalty ensures that the looser guarantees on the population loss estimate when is large do not interfere with the loss estimates at smaller values of . We provide the relevant details for in Appendix D.1. We now state our result.
[t] Private Grid Search {algorithmic}[1] \REQUIRE Dataset , grid parameter , optimization algorithm , privacy parameters
Partition into two disjoint sets, and , of size
Set as the output of run with privacy parameter , confidence parameter , and sensitivity/score pairs ,
Output
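To make the structure above concrete, the following is a minimal Python sketch of the grid-search pipeline. The interface `base_opt(S1, B, eps, delta)` for the black-box optimizer of Assumption 1, the sensitivity bound `sens(B)`, and the penalty `tau(B)` are illustrative assumptions, and the private selection step is a simple report-noisy-min stand-in rather than the Generalized Exponential Mechanism used in Algorithm 1.

```python
import numpy as np


def private_grid_search(S, K, base_opt, loss, eps, delta, beta, rng=None):
    """Sketch of the structure of Algorithm 1 (Private Grid Search).

    Illustrative assumptions (not from the paper's pseudocode): the interface
    base_opt(S1, B, eps, delta) for the black-box DP optimizer of Assumption 1,
    the sensitivity bound sens(B), and the penalty tau(B).  The selection step
    is a report-noisy-min stand-in for the Generalized Exponential Mechanism
    of [RS15], which additionally adapts to per-candidate sensitivities.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(S)
    S1, S2 = S[: n // 2], S[n // 2:]            # disjoint train / validation split

    Bs = [2.0 ** j for j in range(K)]           # geometric grid of guesses for ||w*||

    # Candidate predictors from the black-box DP optimizer (private w.r.t. S1).
    ws = [base_opt(S1, B, eps, delta) for B in Bs]

    def sens(B):                                # assumed per-score sensitivity
        return B ** 2 / len(S2)

    def tau(B):                                 # illustrative penalty: keeps the looser
        return sens(B) * np.log(4 * K / beta)   # estimates at large B from dominating

    # Penalized validation scores on S2.
    scores = [np.mean([loss(w, z) for z in S2]) + tau(B) for w, B in zip(ws, Bs)]

    # Report-noisy-min selection, calibrated to the worst-case sensitivity.
    worst = max(sens(B) for B in Bs)
    noisy = [s + rng.laplace(scale=2.0 * worst / eps) for s in scores]
    return ws[int(np.argmin(noisy))]
```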
Let be a smooth non-negative loss function such that for any . Let . Let satisfy . Let be an -DP algorithm satisfying Assumption 1. Then Algorithm 1 is -DP. Further, if satisfies Assumption 1 and then Algorithm 1 outputs s.t. with probability at least ,
\begin{align*}
\excessrisk(\bar{w};\cD) \leq \min\left\{\ybound^2,\ \err(2\max\bc{\norm{w^*},1}) + \frac{4\ybound^2\log(4K/\beta)}{n} + \frac{5\Delta(2\max\bc{\mnorm,1})\log(4K/\beta)}{n\epsilon}\right\}.
\end{align*}
We note that we develop a generic confidence-boosting approach to obtain high-probability guarantees for our previously described algorithms (see the Confidence Boosting paragraph below and Appendix E), and thus obtaining algorithms which satisfy Assumption 1 is straightforward. We provide more details on how our algorithms satisfy Assumption 1 in Appendix D.4.
The following theorem details the guarantees implied by this method for output perturbation with boosting (see Theorem LABEL:thm:boosted-smooth-output-perturbation in Appendix E). Full details are in Appendix D.3.
{theorem}
Let and be the algorithm formed by running LABEL:alg:constrained-reg-erm-output-perturbation with boosting and privacy parameters , .
Then there exists a setting of such that , and Algorithm 1 run with and is -DP and, when given , satisfies the following w.p. at least (letting ):
\begin{align*}
\excessrisk(\out) = \tilde{O}\left(\min\left\{\ybound^2,\ \frac{\br{HB^*\norm{\cX}}^{4/3}\norm{\cY}^{2/3} + \br{HB^*\norm{\cX}}^{2}}{(n\epsilon)^{2/3}} + \frac{HB^*\norm{\cX}\max\bc{\norm{\cY},1} + \norm{\cY}^2}{n} + \frac{\ybound^2 + H(B^*\xbound)^2}{n\epsilon}\right\}\right).
\end{align*}
Confidence Boosting:
We give an algorithm to boost the confidence of unconstrained, smooth DP-SCO (with possibly non-Lipschitz losses). We split the dataset into chunks, run an -DP algorithm over the chunks to obtain models, and then use the Report Noisy Max mechanism to select a model with approximately the least empirical risk; a brief sketch follows below. We show that this achieves the optimal rate of , whereas the previous high-probability result of [SST10] had an additional term and was limited to GLMs. The key idea is that non-negativity, convexity, smoothness, and the loss being bounded at zero together enable strong bounds on the variance of the loss, and consequently give stronger concentration bounds. The details are deferred to Appendix E.
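The following Python sketch illustrates this reduction under illustrative assumptions: `base_opt(chunk)` stands in for the black-box DP optimizer run on each chunk, `loss(w, z)` is the per-sample loss, and `score_sens` is an assumed upper bound on the sensitivity of each candidate's empirical-risk score; the exact scoring set, noise calibration, and privacy accounting follow Appendix E.

```python
import numpy as np


def boosted_dp_sco(S, base_opt, loss, k, eps, score_sens, rng=None):
    """Sketch of the confidence-boosting reduction described above.

    Illustrative assumptions (not the exact pseudocode of Appendix E):
    base_opt(chunk) is the black-box DP optimizer run on one chunk,
    loss(w, z) is the per-sample loss, and score_sens upper-bounds the
    sensitivity of each candidate's empirical-risk score.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(S)
    m = n // k
    chunks = [S[i * m:(i + 1) * m] for i in range(k)]

    # One candidate model per disjoint chunk; each run only touches its chunk.
    candidates = [base_opt(chunk) for chunk in chunks]

    # Report Noisy Max (here, a noisy min over empirical risks): perturb each
    # candidate's empirical risk with Laplace noise and return the apparent best.
    risks = [np.mean([loss(w, z) for z in S]) for w in candidates]
    noisy = [r + rng.laplace(scale=2.0 * score_sens / eps) for r in risks]
    return candidates[int(np.argmin(noisy))]
```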
Acknowledgements
RA and EU are supported, in part, by NSF BIGDATA award IIS-1838139 and NSF CAREER award IIS-1943251. RB’s and MM’s research is supported by NSF Award AF-1908281 and Google Faculty Research Award. CG’s research is partially supported by INRIA Associate Teams project, FONDECYT 1210362 grant, and National Center for Artificial Intelligence CENIA FB210017, Basal ANID. Part of this work was done while CG was at the University of Twente.
Appendix A Missing Proofs from Section LABEL:sec:smooth_upper (Smooth Upper Bounds)
Appendix B Missing Proofs from Section LABEL:sec:lip_upper (Lipschitz Upper Bounds)
Appendix C Missing proofs from Section LABEL:sec:lip_lower (Lipschitz Lower Bounds)
Appendix D Missing Details for Section 1 (Adapting to \texorpdfstring)
D.1 Generalized Exponential Mechanism
[RS15] Let and . Let be functions s.t. for any adjacent datasets it holds that . There exists an algorithm, , such that when given sensitivity-score pairs , privacy parameter , and confidence parameter , it outputs such that with probability at least it satisfies .
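To make the role of this primitive concrete, the following is a simplified Python stand-in that runs the standard exponential mechanism with the worst-case sensitivity; it is not the GEM of [RS15], whose guarantee above depends only on the sensitivities of near-optimal candidates.

```python
import numpy as np


def exp_mech_select(scores, sensitivities, eps, rng=None):
    """Illustrative stand-in for the Generalized Exponential Mechanism of [RS15].

    Runs the standard exponential mechanism using the worst-case sensitivity
    max_k Delta_k; the GEM improves on this by paying only for the sensitivities
    of near-optimal candidates.  Lower scores are treated as better, matching
    the loss-minimization use in Algorithm 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    delta_max = float(max(sensitivities))

    # P(k) proportional to exp(-eps * score_k / (2 * Delta_max)).
    logits = -eps * scores / (2.0 * delta_max)
    logits -= logits.max()                      # numerical stabilization
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(scores), p=probs))
```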
D.2 Proof of Theorem 1
Note that by the assumptions on , the process of generating is . Furthermore, by Assumption 1, with probability at least the sensitivity values passed to indeed bound the sensitivity of the corresponding scores. Thus, by the privacy guarantees of and composition, we have that the entire algorithm is -DP.
We now prove accuracy. In order to do so, we first prove that, with high probability, every is an upper bound on the true population loss of . Specifically, define (i.e., the term added to each in Algorithm 1). Note that it suffices to prove
(1)
Fix some . Note that the non-negativity of the loss implies that . The excess risk assumption then implies that , which in turn bounds the variance. Further, with probability at least it holds for all that . Thus, by Bernstein's inequality, we have
Thus it suffices to set to ensure . Taking a union bound over all establishes \eqref{eq:ploss-est-bound}. We now condition on this event for the rest of the proof.
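For reference, the generic form of Bernstein's inequality invoked above, stated for i.i.d. random variables $Z_1,\dots,Z_m$ with $|Z_i - \mathbb{E}[Z_1]| \leq M$ almost surely and $\mathrm{Var}(Z_1) \leq \sigma^2$, is the standard one below; the variance and range bounds plugged in are those derived in the preceding paragraph, and the constants here are the textbook ones.
\begin{align*}
\mathbb{P}\left[\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mathbb{E}[Z_1]\right| > t\right] \;\leq\; 2\exp\left(\frac{-m t^2}{2\sigma^2 + \tfrac{2}{3}Mt}\right).
\end{align*}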
Now consider the case where and . Note that the unconstrained minimizer is the constrained minimizer with respect to any for .
With this in mind, let (i.e. the index of the smallest ball containing ).
In the following, we condition on the event that the parameter vector satisfies excess population risk at most . We note by Assumption 1 that this (in addition to the event given in \eqref{eq:ploss-est-bound}) happens with probability at least .
By the guarantees of , with probability at least we (additionally) have
\begin{align*}
L(w_{j^*};\cD) \leq L(w_{j^*};S_2) + \tau_{j^*} &\leq \min_{j\in[K]}\bc{L(w_j;S_2) + \tau_j + \frac{4\Delta(B_j)\log(4K/\beta)}{n\epsilon}} \\
&\leq L(w_{j'};S_2) + \tau_{j'} + \frac{4\Delta(B_{j'})\log(4K/\beta)}{n\epsilon} \\
&\leq L(w_{j'};\cD) + 2\tau_{j'} + \frac{4\Delta(B_{j'})\log(4K/\beta)}{n\epsilon}.
\end{align*}
Since we have
\begin{align*}
L(w_{j^*};\cD) - L(w^*;\cD) &\leq L(w_{j'};\cD) - L(w^*;\cD) + 2\tau_{j'} + \frac{4\Delta(B_{j'})\log(4K/\beta)}{n\epsilon} \\
&\leq \err(2\max\bc{\mnorm,1}) + 2\tau_{j'} + \frac{4\Delta(\max\bc{2\mnorm,1})\log(4K/\beta)}{n\epsilon} \\
&\leq \err(2\max\bc{\mnorm,1}) + \frac{4\ybound^2\log(4K/\beta)}{n} + \frac{5\Delta(\max\bc{2\mnorm,1})\log(4K/\beta)}{n\epsilon}
\end{align*}
where the second inequality follows from the assumption that .
Now note that, by the assumption that , whenever it holds that . However, since the sensitivity/score pair is passed to , the guarantees of ensure that the excess risk of the output is at most .
D.3 Proof of Theorem 1
Let denote the output of the regularized output perturbation method with boosting, with noise and privacy parameters and . We have by Theorem LABEL:thm:boosted-smooth-output-perturbation that with probability at least ,
\begin{align*}
L(\out;\cD) - L(w^*;\cD) = \tilde{O}\left(\frac{HB\norm{\cX}\norm{\cY} + \norm{\cY}^2}{n} + \frac{\br{\br{HB\norm{\cX}}^{4/3}\norm{\cY}^{2/3} + \br{HB\norm{\cX}}^{2}}}{(n\epsilon)^{2/3}} + \frac{\br{\ybound^2 + HB^2\xbound^2}}{n\epsilon} + \frac{\br{\ybound + HB\xbound}}{n}\right).
\end{align*}
Note that this is no smaller than when , and thus it suffices to set to satisfy the condition of the theorem statement.
Let denote the noise level used for when the guess for is . To establish Assumption 1, by Lemma D.4 we have that this assumption is satisfied with . In particular, we note that for the setting of implied by Theorem LABEL:thm:boosted-smooth-output-perturbation and the setting of , we have for all that . Thus . The result then follows from Theorem 1.
D.4 Stability Results for Assumption 1
Algorithm LABEL:alg:noisySGD run with constraint set satisfies Assumption 1 with . The proof is straightforward using Lemma LABEL:lem:smooth_loss_bound (provided in the Appendix). For the output perturbation method, we can obtain similar guarantees. Here, however, we must account for the fact that the output may not lie in the constraint set. We also remark that the JL-based method (Algorithm LABEL:alg:jl-constrained-dp-erm) enjoys this same bound; however, in this case one must apply the norm adaptation method to the intermediate vector , as may have large norm.

{lemma} Algorithm LABEL:alg:constrained-reg-erm-output-perturbation run with parameter and satisfies Assumption 1 with

{proof} Note that since and differ in only one point, it suffices to show that for any that and similarly for . Let and let where . We have by previous analysis that . Since is distributed as a zero-mean Gaussian with variance at most , we have . Setting we obtain . Thus with probability at least it holds that .
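As a concrete illustration of the output perturbation mechanism discussed in this subsection, the following is a minimal Python sketch for regularized least squares; the closed-form solve, the noise scale `sigma`, and the absence of a projection step are illustrative assumptions rather than the exact setting of Algorithm LABEL:alg:constrained-reg-erm-output-perturbation.

```python
import numpy as np


def output_perturbation_ridge(X, y, lam, sigma, rng=None):
    """Sketch of Gaussian output perturbation for regularized least squares.

    lam is the ridge regularization strength and sigma is an assumed bound on
    the Gaussian noise standard deviation, calibrated offline to the L2
    sensitivity of the regularized minimizer; deriving sigma from (eps, delta)
    and the data bounds follows the referenced analysis and is not reproduced.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Non-private regularized ERM: argmin_w (1/n) * ||X w - y||^2 + lam * ||w||^2.
    w_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

    # Gaussian output perturbation; note the released vector may leave the
    # constraint set, which is exactly the issue handled in the lemma above.
    return w_hat + rng.normal(scale=sigma, size=d)
```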
Appendix E Missing Details for Boosting
[h] Boosting {algorithmic}[1] \REQUIRE Dataset , loss function , Algorithm , privacy parameters \STATE Split the dataset into equally sized chunks \STATE For each , \STATE \ENSURE
We state the result of the boosting procedure in a setup general enough to apply to our proposed algorithms. This leads to additional conditions on the base algorithm, since our proposed methods may not produce an output in the constraint set.
Let be a non-negative, smooth, convex loss function. Let . Let be an algorithm such that
1. satisfies -DP
2. For any fixed , is sub-Gaussian [vershynin2018high]:
3. For any fixed ,
4. Given a dataset of i.i.d. points,
Let and . \cref{alg:boosting} with Algorithm as input satisfies -DP. Given a dataset of samples, with probability at least , the excess risk of its output is bounded as
\begin{align*}
L(\hat{w};\cD) - L(w^*;\cD) \leq \tilde{O}\left(\err\br{\frac{n}{4\log(4/\beta)}, \frac{\epsilon}{2}, \gamma} + \frac{2\Delta(\gamma,\beta/2)}{n\epsilon} + \frac{2\lboundtwo}{n} + \frac{32\gamma\tilde{H}\ybound}{n} + \frac{16\ybound}{n} + \frac{128\tilde{H}\gamma^2}{n}\right).
\end{align*}
We first establish the following concentration bound for convex smooth non-negative functions.
Let be a convex smooth non-negative function.
Let be a dataset of i.i.d. samples.
Let be a random variable which is sub-Gaussian and independent of and let be such that . Then, with probability at least ,
\begin{align*}
\hat{L}(w;S) &\leq \br{1+T(n,\beta)}\,L(w;\cD) + U(n,\beta) \\
L(w;\cD) &\leq \br{1+T(n,\beta)}\,\hat{L}(w;S) + V(n,\beta)
\end{align*}
where , and .
With probability at least , for each . We condition on this event and apply Bernstein's inequality to the random variable :
\begin{align*}
\mathbb{P}\left[\abs{L(w;\cD)-\hat{L}(w;S)} > t\right] \leq \exp\left(\frac{-3nt^2}{6n\,\mathbb{E}\left[\br{L(w;\cD)-\hat{L}(w;S)}^2\right] + 2\lbound t}\right).
\end{align*}
This gives us that
\begin{align}
\abs{L(w;\cD)-\hat{L}(w;S)} \leq \frac{\lbound\log(2/\beta)}{n} + \sqrt{\mathbb{E}\br{L(w;\cD)-\hat{L}(w;S)}^2\log(2/\beta)}.
\end{align}
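For completeness, the inversion from the tail bound to the deviation bound uses the following generic fact, sketched here with textbook constants which may be stated slightly differently in the original (writing $\sigma_n^2 = \mathbb{E}\br{L(w;\cD)-\hat{L}(w;S)}^2$): if $\mathbb{P}[|Z| > t] \leq \exp\big(\frac{-3nt^2}{6n\sigma_n^2 + 2\lbound t}\big)$ for all $t > 0$, then with probability at least $1-\beta$,
\begin{align*}
|Z| \;\leq\; \sqrt{2\sigma_n^2\log(1/\beta)} + \frac{2\lbound\log(1/\beta)}{3n},
\end{align*}
since this choice of $t$ makes the exponent at least $\log(1/\beta)$.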
The term .
Now,
\begin{align*}
\mathbb{E}[(\ell(w;(x,y)))^2] &\leq 2\,\mathbb{E}[(\ell(w;(x,y)) - \ell(0;(x,y)))^2] + 2\,\mathbb{E}[(\ell(0;(x,y)))^2]
\end{align*}