
Private and Accurate Decentralized Optimization via
Encrypted and Structured Functional Perturbation

Yijie Zhou and Shi Pu *This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant No. 62003287), the Shenzhen Science and Technology Program (Grant No. RCYX20210609103229031 and No. GXWD20201231105722002-20200901175001001), and the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS) (Grant No. AC01202101108). Y. Zhou is with the School of Data Science, The Chinese University of Hong Kong, Shenzhen, China. S. Pu is with the School of Data Science, Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen, China ([email protected], [email protected]).
Abstract

We propose a decentralized optimization algorithm, termed EFPSN, that preserves the privacy of agents' cost functions without sacrificing accuracy. The algorithm adopts the Paillier cryptosystem to construct zero-sum functional perturbations. Then, based on the perturbed cost functions, any existing decentralized optimization algorithm can be utilized to obtain the accurate solution. We theoretically prove that EFPSN is (\epsilon,\delta)-differentially private and can achieve nearly perfect privacy under deliberate parameter settings. Numerical experiments further confirm the effectiveness of the algorithm.

I Introduction

The problem of optimizing a global objective function through the cooperation of multiple agents has gained increased attention in recent years [1, 2]. This is driven by the wide applicability of the problem to many engineering and scientific domains, ranging from cooperative control, distributed sensing, multi-agent systems, and sensor networks to large-scale machine learning; see, e.g., [3, 4, 5, 6].

In this paper, we consider a peer-to-peer network of n agents that solves the following optimization problem cooperatively:

\min_{x\in\mathcal{D}}F(x)=\frac{1}{n}\sum_{i\in\mathcal{N}}f_{i}(x), (1)

where x is the common parameter of all the agents, \mathcal{D} is the domain of x, and \mathcal{N} denotes the collection of all the agents.

Despite the enormous success of gradient-based distributed optimization algorithms, they all require agents to explicitly share optimization variables and/or gradient estimates in every iteration. This would become a problem in applications involving sensitive data. Zhu et al. [7] showed that obtaining private training data from publicly shared gradients is possible, and the recovery is pixelwise accurate for images and token-wise matching for texts. Consequently, we wish to solve (1) in a private way.

In the machine learning community, one common form of the individual function is f_{i}(x)\triangleq\mathbb{E}_{\xi_{i}\sim\mathcal{D}_{i}}[l_{i}(x,\xi_{i})], where \mathcal{D}_{i} is the local data distribution of agent i, and \xi_{i} is a data sample or a batch of data samples. In such a case, each function f_{i} contains information on the data distribution \mathcal{D}_{i}, which is usually sensitive. It is thus crucial to keep the objective functions private. More specifically, by privacy, we refer to keeping all f_{i} from being inferred by adversaries. In this work, we consider two types of adversaries:

  • The eavesdropper: an external adversary having access to all information transmitted through the communication channels within the network.

  • The honest-but-curious adversary: an external adversary that corrupts a subset of the agents. The adversary knows all the information of every corrupted agent i, including all the information within agent i, e.g., the individual function f_{i}, and all the information passed from its neighbors to agent i. However, each corrupted agent obeys the optimization protocol precisely.

Plenty of efforts have been reported to counteract potential privacy breaches in distributed optimization. Privacy-preserving algorithms either process the messages transmitted between agents or change the functions to be optimized; we refer to them as message-based and function-based methods, respectively. Differential Privacy (DP) [8, 9], a de facto standard for privacy preservation, has been introduced to the context of distributed optimization [10, 11, 12, 13]. The DP-based methods can be categorized into message-perturbing [10, 11, 12] and function-perturbing [13] ones. The former inject noise into the messages each agent sends, while the latter add functional noise to each agent's cost function f_{i}.

However, the direct combination of DP and distributed optimization suffers from an accuracy-privacy trade-off. Huang et al. [12] observed that, with the other parameters fixed, the accuracy of the obtained solution is on the order of O(\frac{1}{\epsilon^{2}}), where \epsilon is the privacy budget, which is inversely proportional to the privacy level. The papers [14, 15] adopted encryption techniques to preserve privacy in distributed optimization. Nevertheless, the heavy communication and computation overhead prevents these methods from being deployed in many real-world applications.

Adding structured noise [16] is a workaround for the accuracy-privacy trade-off and the heavy overhead mentioned above. The paper [16] constructed a set of zero-sum Gaussian noises to perturb the affine terms of the objective functions, which are assumed to be quadratic. Nonetheless, the method fails under an eavesdropping attack due to its plain, unencrypted communication. Besides, the privacy analysis in [16] is carried out under a self-defined privacy framework. We provide a categorization of the current privacy-preserving distributed optimization algorithms in Table I.

Paper index      | [10, 11, 12] | [13] | [14, 15] | [16] | Ours
Message-based    |      ✓       |      |    ✓     |      |
Function-based   |              |  ✓   |          |  ✓   |  ✓
Structured-noise |              |      |          |  ✓   |  ✓
Encryption       |              |      |    ✓     |      |  ✓
DP-based         |      ✓       |  ✓   |          |      |  ✓
Table I: Categorization of existing privacy-preserving distributed optimization algorithms.

In this paper, we deliberately integrate the encryption-based scheme and the structured noise method under the functional DP framework. The proposed new method, termed the Encrypted Functional Perturbation with Structured Noise algorithm (EFPSN), enjoys the benefits of several previous methods. In particular, EFPSN adopts the Paillier encryption scheme to construct zero-sum noises among the agents secretly. The noises are subsequently used to generate functional perturbations for each agent's cost function. Such a procedure differs from those in [14, 15] and bypasses the heavy communication and computation overhead caused by encryption at every iteration. Then, based on the perturbed cost functions, any existing decentralized optimization algorithm can be utilized to obtain the accurate solution to problem (1), thanks to the structured noises. We further theoretically prove that EFPSN is (\epsilon,\delta)-differentially private and can achieve nearly perfect privacy under deliberate parameter settings. In other words, EFPSN achieves nearly perfect privacy without sacrificing accuracy.

The rest of this paper is organized as follows: Section II specifies the notation and provides related background knowledge. We develop the Encrypted Functional Perturbation with Structured Noise algorithm in Section III. Privacy analysis under the DP framework is carried out in Section IV. Finally, simulation examples and conclusions are presented in Sections V and VI, respectively.

II Preliminaries

This section introduces the notation, graph-related concepts, and some background knowledge on Hilbert spaces and the Paillier cryptosystem, since generating functional perturbations relies on orthonormal systems in a Hilbert space, and the Paillier cryptosystem [17] is adopted to construct structured noises privately.

II-A Notation

We use \mathbb{R} to denote the set of real numbers and \mathbb{R}^{d} the Euclidean space of dimension d. The space of scalar-valued infinite sequences is denoted by \mathbb{R}^{\mathbb{N}}. Let \mathbb{Z},\mathbb{Z}_{>0} be the sets of integers and positive integers, respectively. Given w\in\mathbb{Z}_{>0} and \mathbb{Z}_{w}\triangleq\{0,1,\cdots,w\}, \mathbb{Z}_{w}^{\ast} denotes the set of positive integers that are smaller than w and coprime with w. Let l_{2}\subset\mathbb{R}^{\mathbb{N}} be the space of infinite square-summable sequences. For D\subseteq\mathbb{R}^{d}, L_{2}(D) denotes the set of square-integrable measurable functions over D. \mathbf{1} denotes a column vector with all entries equal to 1. A vector is viewed as a column vector unless otherwise stated. A^{T} denotes the transpose of the matrix A, and x^{T}y denotes the scalar product of two vectors x and y. We use \langle\cdot,\cdot\rangle to denote the inner product and ||\cdot|| to denote the Euclidean norm for a vector (induced Euclidean norm for a matrix). A square matrix A is column-stochastic when its elements in every column add up to one. A matrix A is said to be doubly stochastic when both A and A^{T} are column-stochastic.

We use \mathbb{P}\{\mathcal{A}\} to denote the probability of an event \mathcal{A}, \mathcal{P}_{X}(y) the probability density function of a random variable X evaluated at y, and \mathbb{E}[X|\mathcal{F}] the expectation of a random variable X conditioned on the sigma algebra \mathcal{F}, which will be omitted when clear from the context. For an encryption scheme, \text{En}(\cdot),\text{De}(\cdot) represent the encoder and decoder, respectively. Let \gcd,\text{lcm},\bmod be the greatest common divisor, the least common multiple, and the modulo operator, respectively. N(\mu,\sigma^{2}) is the (multivariate) Gaussian distribution with (vector) mean \mu and variance (covariance matrix) \sigma^{2}. N^{\dagger} represents the degenerate Gaussian distribution.

II-B Graph Related Concepts

We assume that the agents interact over an undirected graph, described by a matrix W\in\mathbb{R}^{n\times n}. More specifically, if agents i and j can communicate and interact with each other, then w_{ij}, the (i,j)-th entry of W, is positive; otherwise, w_{ij} equals zero. The neighbor set \mathcal{N}_{i} of agent i is defined as \{j\,|\,w_{ij}>0\}. Note that i\in\mathcal{N}_{i} always holds. Denote \mathcal{L} as the graph Laplacian induced by W. Let \mu_{1}\leq\mu_{2}\leq\cdots\leq\mu_{n} be the eigenvalues of \mathcal{L} and M be the unitary matrix satisfying \mathcal{L}=M\text{Diag}(\mu_{1},...,\mu_{n})M^{T}. Denote \underline{\mu}(\mathcal{L}) as the second smallest eigenvalue of \mathcal{L} and \bar{\mu}(\mathcal{L}) as the largest eigenvalue of \mathcal{L}.

II-C Hilbert Spaces

A Hilbert space \mathcal{H} is a complete inner-product space. A set \{e_{k}\}_{k\in\mathbb{N}}\subset\mathcal{H} is an orthonormal system if \langle e_{k},e_{j}\rangle=0 for k\neq j and \langle e_{k},e_{k}\rangle=||e_{k}||^{2}=1 for all k\in\mathbb{N}. If, in addition, the set of linear combinations of \{e_{k}\}_{k\in\mathbb{N}} is dense in \mathcal{H}, then \{e_{k}\}_{k\in\mathbb{N}} is an orthonormal basis. If \mathcal{H} is separable, then any orthonormal basis is countable, and we have

h=\sum_{k=0}^{\infty}\langle h,e_{k}\rangle e_{k}, (2)

for any h\in\mathcal{H}. Define the coefficient sequence \theta\in\mathbb{R}^{\mathbb{N}} by \theta_{k}=\langle h,e_{k}\rangle for k\in\mathbb{N}. Then \theta\in l_{2} and, by Parseval's identity, ||h||=||\theta||. Let \Phi:l_{2}\rightarrow\mathcal{H} be the linear bijection that maps the coefficient sequence \theta to h. For an arbitrary D\subseteq\mathbb{R}^{d}, L_{2}(D) is a Hilbert space, where the inner product is the integral of the product of functions. Moreover, L_{2}(D) is separable. In this paper, we denote by \{e_{k}\}_{k=0}^{\infty} an orthonormal basis for L_{2}(D) and by \Phi:l_{2}\rightarrow L_{2}(D) the corresponding linear bijection between coefficient sequences and functions.
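For intuition, Parseval's identity ||h||=||\theta|| can be checked numerically. The snippet below is our own illustration (not from the paper): it expands a test function in the Fourier cosine basis of L_{2}([0,1]) and compares the two norms.

```python
import numpy as np

# Expand h(x) = x(1 - x) in the orthonormal cosine basis of L2([0, 1]):
# e_0 = 1, e_k = sqrt(2) cos(k pi x), and check Parseval: ||h||^2 = ||theta||^2.
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(f * g) * dx        # Riemann-sum inner product

e = [np.ones_like(x)] + [np.sqrt(2) * np.cos(k * np.pi * x) for k in range(1, 200)]
h = x * (1 - x)
theta = np.array([inner(h, ek) for ek in e])   # theta_k = <h, e_k>

print(inner(h, h), (theta ** 2).sum())         # both approximately 1/30
```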

II-D Paillier Cryptosystem

The Paillier Cryptosystem is an algorithm for public key cryptography. The algorithm applies to the scenario of sending a private message over open and insecure communication links, and it consists of key generation, encryption, and decryption steps as follows:

  • Key Generation

    • The message receiver chooses two large prime numbers a and b randomly and independently of each other such that \gcd(ab,(a-1)(b-1))=1. This property is assured if both primes are of equal length [18].

    • Compute f=ab and \lambda=\text{lcm}(a-1,b-1).

    • Select a random integer g\in\mathbb{Z}_{f^{2}}^{\ast} such that the modular multiplicative inverse \mu=\left(\frac{(g^{\lambda}\bmod f^{2})-1}{f}\right)^{-1}\bmod f exists.

    • Let the public key be \bar{\mathcal{K}}=(f,g) and the private key be \tilde{\mathcal{K}}=(\lambda,\mu).

  • Encryption: To encrypt a plaintext \underline{p}\in\mathbb{Z}_{f}, the sender selects a random number r\in\mathbb{Z}_{f}^{\ast} and computes the ciphertext \underline{c}=\text{En}(\underline{p},\bar{\mathcal{K}},r)=g^{\underline{p}}\cdot r^{f}\bmod f^{2}.

  • Decryption: To decrypt the ciphertext \underline{c}\in\mathbb{Z}_{f^{2}}, the receiver computes the decrypted text \bar{\underline{p}}=\text{De}(\underline{c},\bar{\mathcal{K}},\tilde{\mathcal{K}})=\frac{(\underline{c}^{\lambda}\bmod f^{2})-1}{f}\cdot\mu\bmod f.

One notable homomorphic property of the Paillier encryption scheme is the following: given any \underline{p}_{1},\cdots,\underline{p}_{m}\in\mathbb{N}, if \sum_{l=1}^{m}\underline{p}_{l}\in\mathbb{Z}_{f}, then \text{De}(\prod_{l=1}^{m}\text{En}(\underline{p}_{l},\bar{\mathcal{K}},r_{l}),\bar{\mathcal{K}},\tilde{\mathcal{K}})=\sum_{l=1}^{m}\underline{p}_{l}.
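To make these steps concrete, here is a toy pure-Python sketch of the scheme (our own illustration: the key sizes are far too small for real use, and we fix g=f+1, a standard valid choice, rather than sampling g at random).

```python
import math
import random

def keygen(a=293, b=433):                       # small demo primes
    assert math.gcd(a * b, (a - 1) * (b - 1)) == 1
    f = a * b
    lam = (a - 1) * (b - 1) // math.gcd(a - 1, b - 1)   # lcm(a-1, b-1)
    g = f + 1                                   # fixed valid choice of g
    u = pow(g, lam, f * f)
    mu = pow((u - 1) // f, -1, f)               # inverse of L(g^lam mod f^2) mod f
    return (f, g), (lam, mu)

def encrypt(p, pub):
    f, g = pub
    r = random.randrange(1, f)
    while math.gcd(r, f) != 1:                  # r must lie in Z_f^*
        r = random.randrange(1, f)
    return pow(g, p, f * f) * pow(r, f, f * f) % (f * f)

def decrypt(c, pub, priv):
    f, _ = pub
    lam, mu = priv
    return (pow(c, lam, f * f) - 1) // f * mu % f

pub, priv = keygen()
c1, c2 = encrypt(20, pub), encrypt(22, pub)
# Homomorphic property: the product of ciphertexts decrypts to the sum.
assert decrypt(c1 * c2 % (pub[0] ** 2), pub, priv) == 42
```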

III Algorithm Design

In this section, we propose the Encrypted Functional Perturbation with Structured Noise algorithm (EFPSN for short) that solves problem (1) privately. Unlike the majority of privacy-preserving algorithms, EFPSN does not sacrifice accuracy for privacy.

To achieve privacy, EFPSN adds structured functional perturbations to the individual cost functions \{f_{i}\}_{i\in\mathcal{N}}. Specifically, the algorithm aims at making the functional perturbations zero-sum. EFPSN consists of two phases, as shown in Algorithm 1.

Algorithm 1 Encrypted Functional Perturbation with Structured Noise
Require: Cost functions \{f_{i}\}_{i\in\mathcal{N}}, noise precision order P, perturbation order K, and \{\sigma_{k}\}_{k\in\mathbb{Z}_{K}}
Ensure: x^{*}
Phase 1 – Masking cost functions
1: for i\in\mathcal{N} do
2:     Generate key pair (\bar{\mathcal{K}}_{i},\tilde{\mathcal{K}}_{i}) and r_{i} following the Paillier encryption scheme
3:     Share public key \bar{\mathcal{K}}_{i} with agents j\in\mathcal{N}_{i}
4: end for
5: for i\in\mathcal{N} do
6:     for (j,k)\in\mathcal{N}_{i}\times\mathbb{Z}_{K} do
7:         Generate random noise \eta_{ijk}\sim N(0,\sigma_{k}^{2})
8:         Calculate \underline{c}_{ijk}=\text{En}(\lfloor 10^{P}\eta_{ijk}\rfloor,\bar{\mathcal{K}}_{j},r_{i})
9:         Send \underline{c}_{ijk} to agent j
10:     end for
11:     for k\in\mathbb{Z}_{K} do
12:         Calculate \bar{\eta}_{ik}=\sum_{j\in\mathcal{N}_{i}}\eta_{ijk}-10^{-P}\,\text{De}(\prod_{j\in\mathcal{N}_{i}}\underline{c}_{jik},\bar{\mathcal{K}}_{i},\tilde{\mathcal{K}}_{i})
13:     end for
14:     \hat{f}_{i}=\Phi(\Phi^{-1}(f_{i})+\bar{\eta}_{i})=f_{i}+\Phi(\bar{\eta}_{i}), where \bar{\eta}_{i}=[\bar{\eta}_{i0},...,\bar{\eta}_{iK},0,...]\in\mathbb{R}^{\mathbb{N}}
15: end for
Phase 2 – Distributed optimization
16: Execute any distributed optimization algorithm on the masked functions \{\hat{f}_{i}\}_{i\in\mathcal{N}}

In Phase 1, the agents in the network cooperate to generate functional perturbations in a way that is immune to eavesdropping attacks and, to some extent, honest-but-curious attacks. First, they generate the keys and random numbers required by the Paillier encryption scheme. Then, each agent encrypts and sends its random noises to its neighbors, and the received ciphertexts are decrypted to construct the zero-sum perturbation. Due to encryption, the signals are transmitted privately and securely under eavesdropping. However, since Paillier encryption only works for integers, we set a precision order P and encrypt/decrypt the quantized noise \lfloor 10^{P}\eta_{ijk}\rfloor.

Subsequently, in Line 12, each agent i calculates \bar{\eta}_{ik} by subtracting the sum of the noises it receives from the sum of the noises it sends. Due to the homomorphic property of Paillier encryption, each agent only needs to decode once for each k\in\mathbb{Z}_{K}. This saves computation, especially when each agent has a large number of neighbors.

The Paillier encryption scheme guarantees privacy under eavesdropping attacks. In terms of honest-but-curious attacks, the noise coefficient sequence \bar{\eta}_{i} of each agent i\in\mathcal{N} remains unknown to the attacker as long as the attacker does not corrupt all of i's neighbors. Under such circumstances, the privacy of agent i is maintained.

It is worth noting that \sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{N}_{i}}(\eta_{ijk}-\eta_{jik})=0 for all k. Such a construction forces \lim_{P\rightarrow\infty}\sum_{i}\bar{\eta}_{ik}=0 to hold for all k. Therefore, we have generated a set of zero-sum signals \{\bar{\eta}_{ik}\}_{i\in\mathcal{N}} given large P.
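As a sanity check on this construction, the following sketch (our own illustration, not from the paper) simulates Phase 1 on a complete graph with plain floats standing in for the Paillier ciphertexts of the toy implementation above; only the received noises are quantized to 10^{-P} precision, mirroring Line 12 of Alg. 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, P = 5, 8, 12                            # agents, perturbation order, precision
sigma = 1.0 / np.sqrt(np.arange(1.0, K + 1))  # sigma_k^2 = gamma / k^p, gamma = p = 1
neighbors = {i: [j for j in range(n) if j != i] for i in range(n)}  # complete graph

# eta[i][j] holds the K noises agent i generates for neighbor j (one per order k)
eta = {i: {j: rng.normal(0.0, sigma) for j in neighbors[i]} for i in range(n)}

quant = lambda v: np.floor(10**P * v) / 10**P  # fixed-point encoding for encryption
eta_bar = {
    i: sum(eta[i][j] for j in neighbors[i])              # noises agent i sent
       - sum(quant(eta[j][i]) for j in neighbors[i])     # noises agent i received
    for i in range(n)
}

# Zero-sum up to the 10^{-P} quantization error (~1e-11 here for all orders k)
print(np.abs(sum(eta_bar.values())).max())
```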

Finally, in Line 14, the agents perturb their cost functions by adding \Phi(\bar{\eta}_{i}), where \Phi(\cdot) maps a sequence in l_{2} to a function in L_{2}(D). Such a construction depends on the chosen orthonormal system \{e_{k}\}_{k\in I}: given the orthonormal system \{e_{k}\}_{k\in I} and a sequence \bar{\eta}_{i}\triangleq\{\bar{\eta}_{ik}\}_{k\in I}, we have \Phi(\bar{\eta}_{i})=\sum_{k\in I}\bar{\eta}_{ik}e_{k}. The orthonormal system we use will be specified later.

Since \{\bar{\eta}_{i}\}_{i\in\mathcal{N}} is zero-sum when P\rightarrow\infty, we have

\lim_{P\rightarrow\infty}\sum_{i}\Phi(\bar{\eta}_{i}) =\lim_{P\rightarrow\infty}\sum_{i}\sum_{k}\bar{\eta}_{ik}e_{k} =\sum_{k}\lim_{P\rightarrow\infty}\left(\sum_{i}\bar{\eta}_{ik}\right)e_{k} =0. (3)

Therefore, the set of perturbing functions \{\Phi(\bar{\eta}_{i})\}_{i\in\mathcal{N}} is zero-sum when P\rightarrow\infty. Additionally, the decrypted text in Line 12 is of precision 10^{-P}. Consequently, the error introduced by finite P is dominated by the floating-point error once P is set to a moderately large value. Though \{\Phi(\bar{\eta}_{i})\}_{i\in\mathcal{N}} is zero-sum, each agent i gains privacy from its non-zero functional perturbation \Phi(\bar{\eta}_{i}).

In Phase 2, the agents may run any distributed optimization algorithm on \{\hat{f}_{i}\}_{i\in\mathcal{N}}. Since \sum_{i}\hat{f}_{i}(x)=\sum_{i}f_{i}(x), the obtained solution solves the original problem (1) when P\rightarrow\infty. Namely, EFPSN solves problem (1) without any accuracy degradation given a proper P.
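As one concrete instance of Phase 2, the sketch below (our own toy setup, not the paper's experiments) runs decentralized gradient descent on scalar quadratic costs f_{i}(x)=a_{i}(x-b_{i})^{2}/2 masked with a zero-sum linear perturbation; the ring topology, step size, and cost parameters are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, alpha = 5, 100_000, 0.002
W = 0.5 * np.eye(n)                 # doubly stochastic mixing matrix, ring graph
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25

a = rng.uniform(1.0, 2.0, n)        # local costs f_i(x) = a_i (x - b_i)^2 / 2
b = rng.uniform(-1.0, 1.0, n)
s = rng.normal(0.0, 1.0, n)
s -= s.mean()                       # zero-sum perturbation: fhat_i = f_i + s_i * x

x = np.zeros(n)                     # each agent's local decision variable
for _ in range(T):
    x = W @ x - alpha * (a * (x - b) + s)   # mix with neighbors, step on grad fhat_i

x_star = (a * b).sum() / a.sum()    # minimizer of the unperturbed average cost
print(x, x_star)   # iterates cluster near x_star; disagreement shrinks with alpha
```

The residual disagreement between the local iterates is the usual constant-step-size bias of decentralized gradient descent, which is exactly the effect discussed in Section V-B.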

Remark III.1.

EFPSN combines encryption, functional perturbation, and structured noise, and is superior to using any one of these techniques alone. In previous message-based methods, encryption at every iteration results in heavy communication and computation overhead, whereas the function-level encryption in EFPSN alleviates this burden: only insignificant communication and computation overhead is incurred in Phase 1. Regarding the existing function-based methods, the solution obtained after functional perturbation suffers from a privacy-related deviation from the solution to problem (1), which leads to a privacy-accuracy trade-off. For EFPSN, however, the optimization accuracy is independent of the privacy budget, as elaborated in the following sections. Moreover, using structured noise alone fails in the presence of eavesdropping, limiting its applicability.

IV Privacy Analysis

In this section, we analyze the privacy properties of EFPSN under the framework of differential privacy. In particular, we prove that the mechanism generating the masked functions is differentially private. Since DP is preserved under any post-processing, EFPSN as a whole remains differentially private.

First, we introduce the definition of 𝒱\mathcal{V}-adjacency, which was originally proposed in [13].

Definition 1 (\mathcal{V}-adjacency).

Given any normed vector space (\mathcal{V},||\cdot||_{\mathcal{V}}) with \mathcal{V}\subseteq L_{2}(D), two sets of functions F=\{f_{i}\}_{i\in\mathcal{N}},F^{\prime}=\{f^{\prime}_{i}\}_{i\in\mathcal{N}}\subset L_{2}(D) are \mathcal{V}-adjacent if there exists I\in\mathcal{N} such that

f_{i}=f_{i}^{\prime},\ i\neq I,\quad\text{and}\quad f_{I}-f_{I}^{\prime}\in\mathcal{V}. (4)

We adopt the standard (\epsilon,\delta)-DP definition under our functional setting.

Definition 2 ((\epsilon,\delta)-Differential Privacy).

Consider a random map \mathcal{M}:L_{2}(D)^{n}\rightarrow\mathcal{X} from the function space L_{2}(D)^{n} to an arbitrary set \mathcal{X}. Given \epsilon,\delta\geq 0, the map \mathcal{M} is (\epsilon,\delta)-differentially private if, for any two \mathcal{V}-adjacent sets of functions F=\{f_{i}\}_{i\in\mathcal{N}} and F^{\prime}=\{f^{\prime}_{i}\}_{i\in\mathcal{N}} and any measurable set \mathcal{O}\subseteq\mathcal{X}, one has

\mathbb{P}\{\mathcal{M}(F)\in\mathcal{O}\}\leq e^{\epsilon}\cdot\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\}+\delta. (5)

We choose our adjacency space \mathcal{V}_{q} as follows. Given q>1, consider the weight sequence \{k^{q}\}_{k=1}^{\infty} and define the adjacency vector space as the image of the resulting weighted l_{2} space under \Phi, i.e.,

\mathcal{V}_{q}=\Phi\left(\left\{\delta\in\mathbb{R}^{\mathbb{N}}\,\Big|\,\sum_{k=1}^{\infty}k^{2q}\delta_{k}^{4}<\infty\right\}\right), (6)

where \delta_{k} is the k-th element of \delta. The rationale for considering such a space will become clear from the analysis. Moreover,

||f||_{\mathcal{V}_{q}}\triangleq\left(\sum_{k=1}^{\infty}k^{2q}\delta_{k}^{4}\right)^{\frac{1}{4}},\quad\text{with }\delta=\Phi^{-1}(f), (7)

is a norm on \mathcal{V}_{q}.

Now we introduce our main theorem about the privacy-preserving property of EFPSN.

Theorem IV.1.

Given q>1, \gamma>0, p\in(1/2,\,q-1/2), the chosen \mathcal{V}_{q} space, and \sigma_{k}^{2}=\frac{\gamma}{k^{p}}, the mechanism in Alg. 1 is (\epsilon,\delta)-differentially private as the precision order P and the perturbation order K tend to infinity, with \epsilon=\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right) and \delta=e^{-\frac{R^{2}}{2}}, where A=\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}, \zeta(\cdot) is the Riemann zeta function, and R is an arbitrary positive real number.

Proof.

When the precision order P\rightarrow\infty, the encryption process is exact. For convenience, we therefore ignore the encryption process and the flooring operation in Alg. 1. Denote

\eta_{ik}\triangleq\sum_{j\in\mathcal{N}_{i}}\eta_{ijk}-\sum_{j\in\mathcal{N}_{i}}\eta_{jik},\qquad \eta_{i}\triangleq[\eta_{i0},...,\eta_{iK}]. (8)

In this case, the mechanism \mathcal{M} in Alg. 1 essentially adds functional perturbations constructed from a set of degenerate Gaussian noises. Specifically, given F=\{f_{i}\}_{i\in\mathcal{N}}, we have

\mathcal{M}(F)=\{f_{i}+\Phi(\eta_{i})\}_{i\in\mathcal{N}}, (9)

and

\boldsymbol{\eta}_{k}=[\eta_{1k},...,\eta_{nk}]^{T}\sim N^{\dagger}(0_{n},2\sigma_{k}^{2}\mathcal{L}),\ \forall k\in\mathbb{Z}_{K} (10)

are the zero-sum coefficients.

From [16], we have

\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y})=\begin{cases}\frac{1}{\sqrt{\det^{*}(4\pi\sigma_{k}^{2}\mathcal{L})}}\exp\left(-\frac{\boldsymbol{y}^{T}\mathcal{L}^{\dagger}\boldsymbol{y}}{4\sigma_{k}^{2}}\right), & \boldsymbol{y}^{T}\boldsymbol{1}=0,\\ 0, & \text{otherwise},\end{cases} (11)

where \mathcal{L}^{\dagger}=M\text{Diag}(0,1/\mu_{2},...,1/\mu_{n})M^{T} and \det^{*}(4\pi\sigma_{k}^{2}\mathcal{L})=(4\pi\sigma_{k}^{2})^{n-1}\prod_{i=2}^{n}\mu_{i}.

Let F^{\prime}=\{f_{i}^{\prime}\}_{i\in\mathcal{N}} be a set of functions \mathcal{V}-adjacent to F that differs only in the I-th element. Let \Psi^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n\times\mathbb{N}} be the map such that \Psi^{-1}(F)=\{\Phi^{-1}(f_{i})\}_{i\in\mathcal{N}}. Define \Phi_{0:k}^{-1}:L_{2}(D)\rightarrow\mathbb{R}^{k+1} and \Phi_{k}^{-1}:L_{2}(D)\rightarrow\mathbb{R} as the maps that return the first k+1 coefficients and the (k+1)-th coefficient of \Phi^{-1}(\cdot), respectively. Similarly, define \Psi_{0:k}^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n\times(k+1)} and \Psi_{k}^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n} as the maps that return the first k+1 columns and the (k+1)-th column of \Psi^{-1}(\cdot), respectively. For any \mathcal{O}\subseteq L_{2}(D)^{n}, \mathcal{O}_{i} is the i-th element of \mathcal{O}. Let \mathcal{O}_{i}-f_{i}\triangleq\{g_{i}\in L_{2}(D)\,|\,g_{i}+f_{i}\in\mathcal{O}_{i}\} and \mathcal{O}-F\triangleq\{\{g_{i}\}_{i\in\mathcal{N}}\subset L_{2}(D)^{n}\,|\,\{g_{i}+f_{i}\}_{i\in\mathcal{N}}\in\mathcal{O}\}.

We have

\mathbb{P}\{\mathcal{M}(F)\in\mathcal{O}\} =\mathbb{P}\{\{\eta_{i}\}_{i\in\mathcal{N}}\in\Psi^{-1}(\mathcal{O}-F)\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F)}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}, (12)

and

\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\} =\mathbb{P}\{\{\eta_{i}\}_{i\in\mathcal{N}}\in\Psi^{-1}(\mathcal{O}-F^{\prime})\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F^{\prime})}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}. (13)

By the linearity of \Phi, we have \Phi^{-1}(\mathcal{O}_{i}-f^{\prime}_{i})=\Phi^{-1}(\mathcal{O}_{i}-f_{i})+\xi_{i} for all \mathcal{O}\subseteq L_{2}(D)^{n} and i\in\mathcal{N}, where \xi_{i}\triangleq\Phi^{-1}(f_{i}-f^{\prime}_{i}). Denoting \boldsymbol{\xi}_{k}=[\xi_{1k},...,\xi_{nk}]^{T}, we have

\Psi_{k}^{-1}(\mathcal{O}-F^{\prime})=\Psi_{k}^{-1}(\mathcal{O}-F)+\boldsymbol{\xi}_{k}. (14)

Combining (13) and (14), we have

\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F)}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k}+\boldsymbol{\xi}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}. (15)

To prove that \mathcal{M} is (\epsilon,\delta)-DP, it suffices to show that the ratio of \prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}) over \prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k}) is bounded by e^{\epsilon} with probability at least 1-\delta.

We know that

\prod_{k=0}^{K}\frac{\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k})}{\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k})} =\exp\left\{\sum_{k=0}^{K}\frac{2\boldsymbol{\xi}_{k}^{T}\mathcal{L}^{\dagger}\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k}^{T}\mathcal{L}^{\dagger}\boldsymbol{\xi}_{k}}{4\sigma_{k}^{2}}\right\} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\sum_{k=0}^{K}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}+\sum_{k=0}^{K}\frac{||\boldsymbol{\xi}_{k}||^{2}}{4\sigma_{k}^{2}}\right)\right\} (16)

Denote

\textbf{Rat}\triangleq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}+\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{4\sigma_{k}^{2}}\right)\right\}.

We show that \textbf{Rat} is bounded with a certain probability.

Since \{f_{i}\}_{i\in\mathcal{N}} and \{f^{\prime}_{i}\}_{i\in\mathcal{N}} only differ in one element, \boldsymbol{\xi}_{k} has at most one non-zero element. Noting that \boldsymbol{\eta}_{k} is random and drawn from N^{\dagger}(0_{n},2\sigma_{k}^{2}\mathcal{L}), \frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}} is a univariate Gaussian random variable. From (10), each element of \boldsymbol{\eta}_{k} has variance at most 2\sigma_{k}^{2}\bar{\mu}(\mathcal{L}). Thus, \sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}} is a univariate Gaussian random variable with variance less than or equal to \frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}.

We further bound the summation term in this variance, following an argument similar to the proof of Theorem V.2 in [13]:

\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma_{k}^{2}} =\sum_{k=0}^{\infty}\frac{k^{q}||\boldsymbol{\xi}_{k}||^{2}}{k^{q}\sigma_{k}^{2}} \leq\left(\sum_{k=0}^{\infty}\frac{1}{(k^{q}\sigma_{k}^{2})^{2}}\right)^{\frac{1}{2}}\left(\sum_{k=0}^{\infty}(k^{q}||\boldsymbol{\xi}_{k}||^{2})^{2}\right)^{\frac{1}{2}} =\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}} \triangleq A. (17)

Let R be an arbitrary positive real number. When \sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}\leq R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}} holds, we have

\textbf{Rat} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}}+\frac{1}{4}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}\right)\right\} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right)\right\}. (18)

By adopting the Chernoff bound for Gaussian random variables, we have

\mathbb{P}\left\{\sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}\geq R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}}\right\} \leq e^{-\frac{R^{2}}{2}}. (19)

Namely, \textbf{Rat} is bounded by e^{\epsilon} with probability 1-\delta, where \epsilon=\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right), \delta=e^{-\frac{R^{2}}{2}}, and A=\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}.

Therefore, under proper parameter settings, the mechanism \mathcal{M} is (\epsilon,\delta)-DP with \epsilon,\delta defined above. ∎

Remark IV.2.

Several factors contribute to the privacy parameter \epsilon. First, the communication graph topology affects \bar{\mu}(\mathcal{L}) and \underline{\mu}(\mathcal{L}). Since \bar{\mu}(\mathcal{L}) is bounded by 2\triangle(G), where \triangle(G) is the maximum degree, and \underline{\mu}(\mathcal{L}) is the algebraic connectivity, which reflects how well connected the graph is, a strongly and evenly connected graph helps EFPSN achieve the best privacy performance. In addition, the parameters \gamma,q,p need to be deliberately chosen to decrease \epsilon. Since \gamma is the noise magnitude, it is natural that a larger \gamma contributes to a smaller \epsilon and, consequently, better privacy.

Remark IV.3.

The term ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}} measures the privacy-preserving capacity of EFPSN from a different perspective: given a privacy budget \epsilon, the larger this norm can be, the more functions our mechanism can protect. Additionally, this term enters \epsilon only through the ratio ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}/\gamma, suggesting that protecting a larger set of functions requires proportionally more noise.

Remark IV.4.

Choosing arbitrarily large R and \gamma while keeping \frac{R}{\sqrt{\gamma}}\sim o(1) results in arbitrarily small \epsilon and \delta simultaneously. Namely, we can attain a nearly perfectly private mechanism without sacrificing algorithmic accuracy.
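As a numerical illustration of this remark (our own sketch; the ring topology, q, p, and the adjacency norm ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}=1 are all assumed), the snippet below evaluates \epsilon and \delta from Theorem IV.1 while growing \gamma with R=\gamma^{1/4}, so that R/\sqrt{\gamma}\rightarrow 0.

```python
import numpy as np
from scipy.special import zeta

# Ring-graph Laplacian for n = 5 agents
n = 5
L = 2 * np.eye(n)
for i in range(n):
    L[i, (i - 1) % n] = L[i, (i + 1) % n] = -1
mu = np.sort(np.linalg.eigvalsh(L))
mu_min, mu_max = mu[1], mu[-1]          # algebraic connectivity, largest eigenvalue

q, p = 2.0, 1.0                         # satisfies p in (1/2, q - 1/2)
norm2 = 1.0                             # ||f_I - f'_I||^2_{V_q}, assumed

def eps_delta(gamma, R):
    A = np.sqrt(zeta(2 * (q - p))) * norm2 / gamma
    eps = (A / 4 + R * np.sqrt(mu_max * A) / np.sqrt(2)) / mu_min
    return eps, np.exp(-R ** 2 / 2)

for gamma in [1e0, 1e2, 1e4, 1e6]:
    R = gamma ** 0.25                   # keeps R / sqrt(gamma) -> 0
    print(gamma, *eps_delta(gamma, R))  # both eps and delta shrink as gamma grows
```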

V Numerical Experiments

In this section, we evaluate the performance of the proposed EFPSN algorithm through numerical experiments. We consider both convex and non-convex objective functions, and compare EFPSN with the non-zero-sum functional perturbation mechanism proposed in [13]. Before presenting the results, we first demonstrate how the orthonormal system \{e_{k}\}_{k\in I}, based on which we generate the coefficient-function mapping \Phi, is constructed.

V-A Generating the Orthonormal System

The number of elements of an orthonormal basis grows factorially as the number of variables increases, and even the simplest real-world applications, such as logistic regression on images, require at least hundreds of parameters, making the number of elements in a full basis prohibitively large. Consequently, for a function with M variables, we pick m<M variables to perturb. Also, instead of generating an orthonormal basis, we only generate an orthonormal system in L_{2} with N elements.

Since the generated orthonormal system belongs to some orthonormal basis, perturbing along the orthonormal system is equivalent to perturbing along the orthonormal basis with the remaining noise coefficients being 0. Therefore, all of our previous discussion on accuracy and privacy holds under such a perturbing mechanism.

The orthonormal system is constructed from the Gram-Schmidt orthonormalization of Taylor monomials. Given the tuple (K,m,N), we randomly generate N Taylor monomials of m variables whose total order is smaller than or equal to K. Then we orthonormalize all terms using the Gram-Schmidt method.

We present an example of one perturbing function when (K,m,N)=(3,2,5) under noise level \gamma=1. The orthonormal system is \{0.5,\ 0.866x_{2},\ 3.307x_{2}^{3}-1.984x_{2},\ 0.866x_{1},\ 2.905x_{1}^{2}x_{2}-0.968x_{2}\} and the noise sequence is \{0.180,0.628,-0.374,0.817,2.015\}. The corresponding perturbing function is 5.853x_{1}^{2}x_{2}-2.137x_{2}^{3}+0.708x_{1}-1.310x_{2}+0.090. We visualize the perturbing function in Fig. 1.
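The construction can be reproduced numerically. The sketch below is our own one-dimensional illustration (m=1 on D=[-1,1]; the paper's example above uses m=2): it Gram-Schmidt-orthonormalizes the monomials 1, x, x^{2}, x^{3} under Gauss-Legendre quadrature and forms a perturbing function \Phi(\bar{\eta})=\sum_{k}\bar{\eta}_{k}e_{k}.

```python
import numpy as np

# Gram-Schmidt over D = [-1, 1] with Gauss-Legendre quadrature (m = 1, K = 3)
x, w = np.polynomial.legendre.leggauss(64)      # quadrature nodes and weights
inner = lambda f, g: np.sum(w * f * g)          # <f, g> = integral of f*g over D

basis = []
for k in range(4):                              # Taylor terms 1, x, x^2, x^3
    v = x ** k
    for e in basis:
        v = v - inner(v, e) * e                 # remove projections on earlier e_j
    basis.append(v / np.sqrt(inner(v, v)))      # normalize; these are the e_k

# A perturbing function Phi(eta_bar) = sum_k eta_bar_k e_k, on the grid
eta_bar = np.random.default_rng(2).normal(0.0, [1.0, 0.7, 0.5, 0.35])
perturb = sum(c * e for c, e in zip(eta_bar, basis))
```

Here the resulting e_{k} are the normalized Legendre polynomials, and inner(basis[i], basis[j]) evaluates to 1 for i = j and approximately 0 otherwise.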

Figure 1: Visualization of a 2d perturbing function.

V-B Accuracy Test

In this part, we validate the accuracy of the proposed EFPSN method.

V-B1 Convex Case

We consider a classification task on the MNIST dataset using logistic regression.

We implement the logistic regression model using PyTorch. Specifically, since each picture in MNIST is of size 28\times 28 with a single channel, there is only one linear layer in the model, whose input and output dimensions are 784 and 10, respectively. Together with the bias, the model has 7850 parameters. We set the perturbing parameters (K,m,N)=(1,10,10). The experiments are conducted under different noise levels \gamma\in\{10^{-2},10^{-1},1,10,10^{2},10^{3},10^{4}\}.

Consider 5 agents in the network, connected as shown in Fig. 2. Each agent holds the same number of randomly assigned training data points. We adopt decentralized stochastic gradient descent in Phase 2 of Alg. 1. The batch size is set to 64, and the initial learning rate is 0.2. Each agent conducts 10000 gradient updates. The learning rate remains fixed for the first 2000 steps and then decays to 4\times 10^{-5} at the last step.

Figure 2: Network structure between agents.

The results are shown in Fig. 3. In Fig. 3(a), the horizontal axis displays different noise magnitudes \gamma, and the vertical axis represents the deviation from the optimal solution, i.e., ||\overline{\mathbf{x}}-\mathbf{x}^{\ast}||, where \overline{\mathbf{x}}=\frac{1}{n}\sum_{i\in\mathcal{N}}\mathbf{x}_{i} and \mathbf{x}^{\ast} is the solution generated by centralized gradient descent.

When \gamma=10^{-2}, the results from EFPSN and the non-zero-sum algorithm are nearly identical, both close to the noise-free case. However, as \gamma increases, the solution obtained from the non-zero-sum method quickly deviates from the optimum, and the deviation grows at a roughly constant rate. The blue line (the EFPSN solution) does not rise until \gamma=10^{3}, at which point the orange line is 4 orders of magnitude higher.

The rise in the blue line stems from the slight disagreements between the local decision variables \mathbf{x}_{i}. Note that the perturbed functions are only guaranteed to be zero-sum when all agents hold the same decision variable, and our previous error analysis assumes such exact agreement. In practice, slight differences between the \mathbf{x}_{i} exist due to the finite step size. With EFPSN, one can always generate a more accurate solution using a finer learning rate.

Fig. 3(b) shows the classification accuracy of the logistic model under different algorithms and noise levels. The pattern matches that of Fig. 3(a): for the non-zero-sum algorithm, the test accuracy starts to drop dramatically when \gamma reaches 1. In contrast, the model trained by EFPSN remains as accurate as the noise-free case until \gamma=10^{4}. Namely, EFPSN tolerates a noise level at least 4 orders of magnitude larger without degrading accuracy.

Figure 3: Deviation (a) and accuracy (b) for logistic regression.

V-B2 Non-convex Case

To further justify our method in the non-convex setting, we consider an image classification task on MNIST using a convolutional neural network. We adopt the classic LeNet [19], which consists of 13426 parameters. We choose to perturb the bias of its last linear layer, and again set (K,m,N)=(1,10,10). Except for the model, all settings are identical to the convex case.

Figure 4: Squared norm of the average gradient (a) and accuracy (b) for LeNet.

In Fig. 4(a), the y-axis depicts the squared norm of the average gradient over all agents, ||\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}}\nabla f_{i}(x_{i}^{k})||^{2}. Trends similar to the convex case reappear: unlike the non-zero-sum method, EFPSN generates solutions close to a stationary point under all noise levels. Also, in Fig. 4(b), EFPSN remains as accurate as the noise-free case even when \gamma=10^{4}, whereas the non-zero-sum method can barely hold its accuracy at \gamma=1.

These experiments demonstrate the efficacy of our method on both convex and non-convex problems.

V-C Privacy Test

To further validate EFPSN's efficacy in privacy preservation, we conduct the DLG attack [7] on agents with zero-sum and non-zero-sum noise, respectively. The general idea of DLG is to construct dummy data and match its gradient with the ground truth; the procedure is shown in Alg. 2. For better results, we use iDLG [20], a more efficient and stable variant of DLG.

Algorithm 2 Deep Leakage from Gradients
Require: F(\mathbf{x},W): differentiable model; W: parameter weights; \nabla W: gradients calculated from the private training data
Ensure: private training data \mathbf{x},\mathbf{y}
1: \mathbf{x}^{\prime}_{1}\leftarrow N(0,1),\ \mathbf{y}^{\prime}_{1}\leftarrow N(0,1)
2: for i\leftarrow 1 to n do
3:     \nabla W_{i}^{\prime}\leftarrow\partial l(F(\mathbf{x}^{\prime}_{i},W_{t}),\mathbf{y}_{i}^{\prime})/\partial W_{t}
4:     \mathbb{D}_{i}\leftarrow||\nabla W_{i}^{\prime}-\nabla W||^{2}
5:     \mathbf{x}_{i+1}^{\prime}\leftarrow\mathbf{x}_{i}^{\prime}-\alpha\nabla_{\mathbf{x}_{i}^{\prime}}\mathbb{D}_{i}
6:     \mathbf{y}_{i+1}^{\prime}\leftarrow\mathbf{y}_{i}^{\prime}-\alpha\nabla_{\mathbf{y}_{i}^{\prime}}\mathbb{D}_{i}
7: end for
8: return \mathbf{x}_{n+1}^{\prime},\mathbf{y}_{n+1}^{\prime}
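For reference, a minimal PyTorch rendering of Alg. 2 might look as follows (a sketch with illustrative shapes and step size; in our privacy test, grad_true would be agent 1's perturbed gradient \nabla\hat{f}_{1}(x_{1}^{k}) intercepted by the attacker).

```python
import torch

def dlg_attack(model, grad_true, x_shape, n_classes, iters=240, alpha=1.0):
    # Dummy data and soft label, optimized so their gradient matches grad_true
    x_d = torch.randn(x_shape, requires_grad=True)
    y_d = torch.randn(1, n_classes, requires_grad=True)
    for _ in range(iters):
        logp = torch.log_softmax(model(x_d), dim=-1)
        loss = -(torch.softmax(y_d, dim=-1) * logp).sum()  # cross-entropy, soft label
        grad_d = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # D_i = ||grad W' - grad W||^2, as in Line 4 of Alg. 2
        D = sum(((g - t) ** 2).sum() for g, t in zip(grad_d, grad_true))
        gx, gy = torch.autograd.grad(D, [x_d, y_d])
        with torch.no_grad():                              # Lines 5-6 of Alg. 2
            x_d -= alpha * gx
            y_d -= alpha * gy
    return x_d.detach(), y_d.detach()

# Example usage against a logistic model (shapes assumed):
# model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
# x_rec, y_rec = dlg_attack(model, grad_true, (1, 1, 28, 28), 10)
```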

With either EFPSN or the non-zero-sum noise method, each agent receives a non-zero functional perturbation of roughly the same magnitude. Since iDLG is carried out at the agent level, it makes no difference between the two algorithms. Therefore, we conduct iDLG on agent 1 in Fig. 2, with the noise generated from EFPSN at different noise levels (\gamma\in\{10,10^{2},10^{3},10^{4}\}).

Specifically, we assume that the mixing matrix is known to the attacker, and that the attacker has access to at least one of the communication channels connected to agent 1 (either eavesdropping or corrupting one of agent 1's neighbors will do). Therefore, the attacker knows agent 1's perturbed gradient \nabla\hat{f}_{1}(x_{1}^{k}) and agent 1's decision parameters x_{1}^{k}. The attacker does not know the functional perturbation \Phi(\bar{\eta}_{1}), so the true gradient \nabla f_{1}(x_{1}^{k}) remains unrevealed. Namely, the attacker tries to recover the raw data using inexact gradient information, and the larger the noise level \gamma, the more inexact the gradient.

Fig. 5 and Fig. 6 present the iDLG attacker's typical inference results on the logistic model and LeNet, respectively. The top-left subfigure is the raw data, and the remaining subfigures are the adversary's estimates of the raw data at different iterations (from 0 to 240). As \gamma increases, the retrieved picture becomes more blurred. Interestingly, though we perturb the original problem functionally, the effect is equivalent to directly perturbing the dataset. Generally, once \gamma\geq 10^{3}, the recovered picture is unrecognizable to humans. When \gamma\geq 10^{3}, however, the accuracy of the model trained by the non-zero-sum method has dropped below 10% for both the convex and non-convex problems (as shown in Fig. 3 and Fig. 4); that is, the non-zero-sum solution becomes too inaccurate by the time it provides enough privacy. In contrast, the EFPSN solution retains accuracy comparable to the noise-free case. Thus, EFPSN is capable of preserving privacy without degrading accuracy.

Figure 5: iDLG attacker's inference results on logistic regression: (a) \gamma=10; (b) \gamma=10^{2}; (c) \gamma=10^{3}; (d) \gamma=10^{4}.
Figure 6: iDLG attacker's inference results on LeNet: (a) \gamma=10; (b) \gamma=10^{2}; (c) \gamma=10^{3}; (d) \gamma=10^{4}.

VI Conclusion

In this paper, we proposed the Encrypted Functional Perturbation with Structured Noise (EFPSN) algorithm, which solves the decentralized optimization problem (1) privately and accurately. Given exact consensus among agents, EFPSN eliminates the privacy-accuracy trade-off by constructing a zero-sum functional perturbation. Since the construction requires secure communication between agents, we adopt the Paillier encryption scheme to defend against eavesdropping attackers. We rigorously proved the privacy property of EFPSN under the differential privacy framework, and simulations confirmed its efficacy in protecting privacy while maintaining accuracy.

References

  • [1] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [2] S. Pu and A. Nedić, “Distributed stochastic gradient tracking methods,” Mathematical Programming, vol. 187, no. 1, pp. 409–457, 2021.
  • [3] S. Pu, A. Garcia, and Z. Lin, “Noise reduction by swarming in social foraging,” IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4007–4013, 2016.
  • [4] T. Yan, Y. Gu, T. He, and J. A. Stankovic, “Design and optimization of distributed sensing coverage in wireless sensor networks,” ACM Transactions on Embedded Computing Systems (TECS), vol. 7, no. 3, pp. 1–40, 2008.
  • [5] R. Murphey and P. M. Pardalos, Cooperative control and optimization, vol. 66. Springer Science & Business Media, 2002.
  • [6] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, and C. Wachinger, “Braintorrent: A peer-to-peer environment for decentralized federated learning,” arXiv preprint arXiv:1905.06731, 2019.
  • [7] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [8] J. Cortes, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas, “Differential privacy in control and network systems,” in 2016 IEEE 55th Conference on Decision and Control (CDC), (Las Vegas, NV), pp. 4252–4272, IEEE, Dec. 2016.
  • [9] C. Dwork and A. Roth, “The Algorithmic Foundations of Differential Privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2013.
  • [10] X. Chen, L. Huang, L. He, S. Dey, and L. Shi, “A Differential Private Method for Distributed Optimization in Directed Networks via State Decomposition,” p. 8, 2021.
  • [11] Y. Lou, L. Yu, and S. Wang, “Privacy Preservation in Distributed Subgradient Optimization Algorithms,” arXiv:1512.08822 [cs, math], Dec. 2015. arXiv: 1512.08822.
  • [12] Z. Huang, S. Mitra, and N. Vaidya, “Differentially Private Distributed Optimization,” in Proceedings of the 2015 International Conference on Distributed Computing and Networking, (Goa India), pp. 1–10, ACM, Jan. 2015.
  • [13] E. Nozari, P. Tallapragada, and J. Cortes, “Differentially Private Distributed Convex Optimization via Functional Perturbation,” IEEE Transactions on Control of Network Systems, vol. 5, pp. 395–408, Mar. 2016.
  • [14] Y. Lu and M. Zhu, “Privacy preserving distributed optimization using homomorphic encryption,” Automatica, vol. 96, pp. 314–325, Oct. 2018.
  • [15] C. Zhang and Y. Wang, “Enabling Privacy-Preservation in Decentralized Optimization,” IEEE Transactions on Control of Network Systems, vol. 6, pp. 679–689, June 2019.
  • [16] N. Gupta, S. Gade, N. Chopra, and N. H. Vaidya, “Preserving Statistical Privacy in Distributed Optimization,” IEEE Control Systems Letters, vol. 5, pp. 779–784, July 2021.
  • [17] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International conference on the theory and applications of cryptographic techniques, pp. 223–238, Springer, 1999.
  • [18] B. Fulton, “Review of introduction to modern cryptography by jonathan katz and yehuda lindell publisher: Chapman; hall-crc 2008 1-58488-551-3,” ACM SIGACT News, vol. 41, no. 4, p. 44–47, 2010.
  • [19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [20] B. Zhao, K. R. Mopuri, and H. Bilen, “idlg: Improved deep leakage from gradients,” CoRR, vol. abs/2001.02610, 2020.