
Private and Accurate Decentralized Optimization via
Encrypted and Structured Functional Perturbation

Yijie Zhou and Shi Pu *This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant No. 62003287), the Shenzhen Science and Technology Program (Grant No. RCYX20210609103229031 and No. GXWD20201231105722002-20200901175001001), and the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS) (Grant No. AC01202101108). Y. Zhou is with the School of Data Science, The Chinese University of Hong Kong, Shenzhen, China. S. Pu is with the School of Data Science, Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen, China ([email protected], [email protected]).
Abstract

We propose a decentralized optimization algorithm, termed EFPSN, that preserves the privacy of agents' cost functions without sacrificing accuracy. The algorithm adopts the Paillier cryptosystem to construct zero-sum functional perturbations. Then, based on the perturbed cost functions, any existing decentralized optimization algorithm can be utilized to obtain the accurate solution. We theoretically prove that EFPSN is (\epsilon,\delta)-differentially private and can achieve nearly perfect privacy under deliberate parameter settings. Numerical experiments further confirm the effectiveness of the algorithm.

I Introduction

The problem of optimizing a global objective function through the cooperation of multiple agents has gained increased attention in recent years [1, 2]. This is driven by the wide applicability of the problem to many engineering and scientific domains, ranging from cooperative control, distributed sensing, multi-agent systems, and sensor networks to large-scale machine learning; see, e.g., [3, 4, 5, 6].

In this paper, we consider a peer-to-peer network of n agents that solves the following optimization problem cooperatively:

\min_{x\in\mathcal{D}}F(x)=\frac{1}{n}\sum_{i\in\mathcal{N}}f_{i}(x), (1)

where x is the common parameter of all the agents, \mathcal{D} is the domain of x, and \mathcal{N} denotes the collection of all the agents.

Despite the enormous success of gradient-based distributed optimization algorithms, they all require agents to explicitly share optimization variables and/or gradient estimates in every iteration. This would become a problem in applications involving sensitive data. Zhu et al. [7] showed that obtaining private training data from publicly shared gradients is possible, and the recovery is pixelwise accurate for images and token-wise matching for texts. Consequently, we wish to solve (1) in a private way.

In the machine learning community, one common form of the individual function is f_{i}(x)\triangleq\mathbb{E}_{\xi_{i}\sim\mathcal{D}_{i}}[l_{i}(x,\xi_{i})], where \mathcal{D}_{i} is the local data distribution of agent i, and \xi_{i} is a data sample or a batch of data samples. In such a case, each function f_{i} contains information on the data distribution \mathcal{D}_{i}, which is usually sensitive. It is thus crucial to keep the objective functions private. More specifically, by privacy, we refer to keeping all f_{i} from being inferred by adversaries. In this work, we consider two types of adversaries:

  • The eavesdropper: an external adversary having access to all information transmitted through the communication channels within the network.

  • The honest-but-curious adversary: an external adversary that corrupts a subset of the agents. The adversary knows all the information of every corrupted agent i, including all the information within agent i, e.g., the individual function f_{i}, and all the information passed from its neighbors to agent i. However, each corrupted agent obeys the optimization protocol precisely.

Plenty of efforts have been reported to counteract potential privacy breaches in distributed optimization. Privacy-preserving algorithms either process the messages transmitted between agents or change the functions to be optimized; we refer to them as message-based and function-based methods, respectively. Differential Privacy (DP) [8, 9], a de facto standard for privacy preservation, has been introduced to the context of distributed optimization [10, 11, 12, 13]. The DP-based methods can be categorized into message-perturbing [10, 11, 12] and function-perturbing [13] ones. The former inject noise into the messages each agent sends, while the latter add functional noise to each agent's cost function f_{i}.

However, the direct combination of DP and distributed optimization suffers from an accuracy-privacy trade-off. Huang et al. [12] observed that, with the other parameters fixed, the accuracy of the obtained solution is on the order of O(\frac{1}{\epsilon^{2}}), where \epsilon is the privacy budget, which is inversely proportional to the privacy level. The papers [14, 15] adopted encryption techniques to preserve privacy in distributed optimization. Nevertheless, the heavy communication and computation overhead prevents these methods from being deployed in many real-world applications.

Adding structured noise [16] is a workaround for the accuracy-privacy trade-off and the heavy overhead mentioned above. The paper [16] constructed a set of zero-sum Gaussian noises to perturb the affine terms of the objective functions, which are assumed to be quadratic. Nonetheless, the method fails under an eavesdropping attack due to its plain, unencrypted communication. Besides, the privacy analysis in [16] is carried out under a self-defined privacy framework. We provide a categorization of the current privacy-preserving distributed optimization algorithms in Table I.

Paper index      | [10, 11, 12] | [13] | [14, 15] | [16] | Ours
Message-based    |      ✓       |      |    ✓     |      |
Function-based   |              |  ✓   |          |  ✓   |  ✓
Structured-noise |              |      |          |  ✓   |  ✓
Encryption       |              |      |    ✓     |      |  ✓
DP-based         |      ✓       |  ✓   |          |      |  ✓
Table I: Categorization of existing privacy-preserving distributed optimization algorithms.

In this paper, we deliberately integrate the encryption-based scheme and the structured noise method under the functional DP framework. The proposed new method, termed the Encrypted Functional Perturbation with Structured Noise algorithm (EFPSN), enjoys the benefits of several previous methods. In particular, EFPSN adopts the Paillier encryption scheme to construct zero-sum noises among the agents secretly. The noises are subsequently used to generate functional perturbations for each agent's cost function. Such a procedure differs from those in [14, 15] and bypasses the heavy communication and computation overhead caused by encryption at every iteration. Then, based on the perturbed cost functions, any existing decentralized optimization algorithm can be utilized to obtain the accurate solution to problem (1), thanks to the structured noises. We further theoretically prove that EFPSN is (\epsilon,\delta)-differentially private and can achieve nearly perfect privacy under deliberate parameter settings. In other words, EFPSN achieves nearly perfect privacy without sacrificing accuracy.

The rest of this paper is organized as follows: Section II specifies the notation and provides related background knowledge. We develop the Encrypted Functional Perturbation with Structured Noise algorithm in Section III. Privacy analysis under the DP framework is carried out in Section IV. Finally, simulation examples and conclusions are presented in Sections V and VI, respectively.

II Preliminaries

This section introduces the notation, graph-related concepts, and some background knowledge on Hilbert spaces and the Paillier cryptosystem, since generating functional perturbations relies on orthonormal systems in a Hilbert space, and the Paillier cryptosystem [17] is adopted to construct structured noises privately.

II-A Notation

We use \mathbb{R} to denote the set of real numbers and \mathbb{R}^{d} the Euclidean space of dimension d. The space of scalar-valued infinite sequences is denoted by \mathbb{R}^{\mathbb{N}}. Let \mathbb{Z},\mathbb{Z}_{>0} be the sets of integers and positive integers, respectively. Given w\in\mathbb{Z}_{>0} and \mathbb{Z}_{w}\triangleq\{0,1,\cdots,w\}, \mathbb{Z}_{w}^{\ast} denotes the set of positive integers that are smaller than w and coprime with w. Let l_{2}\subset\mathbb{R}^{\mathbb{N}} be the space of infinite square-summable sequences. For D\subseteq\mathbb{R}^{d}, L_{2}(D) denotes the set of square-integrable measurable functions over D. \mathbf{1} denotes a column vector with all entries equal to 1. A vector is viewed as a column vector unless otherwise stated. A^{T} denotes the transpose of the matrix A, and x^{T}y denotes the scalar product of two vectors x and y. We use \langle\cdot,\cdot\rangle to denote the inner product and ||\cdot|| to denote the Euclidean norm for a vector (induced Euclidean norm for a matrix). A square matrix A is column-stochastic when its elements in every column add up to one. A matrix A is said to be doubly stochastic when both A and A^{T} are column-stochastic.

We use \mathbb{P}\{\mathcal{A}\} to denote the probability of an event \mathcal{A}, \mathcal{P}_{X}(y) the probability density function of a random variable X evaluated at y, and \mathbb{E}[X|\mathcal{F}] the expectation of a random variable X conditioned on the sigma algebra \mathcal{F}, which will be omitted when clear from the context. For an encryption scheme, \text{En}(\cdot),\text{De}(\cdot) represent the encoder and decoder, respectively. Let \gcd,\text{lcm},\bmod be the greatest common divisor, the least common multiple, and the modulo operator, respectively. N(\mu,\sigma^{2}) is the (multivariate) Gaussian distribution with (vector) mean \mu and variance (covariance matrix) \sigma^{2}. N^{\dagger} represents the degenerate Gaussian distribution.

II-B Graph Related Concepts

We assume that the agents interact over an undirected graph, described by a matrix W\in\mathbb{R}^{n\times n}. More specifically, if agents i and j can communicate and interact with each other, then w_{ij}, the (i,j)-th entry of W, is positive; otherwise, w_{ij} equals zero. The neighbor set \mathcal{N}_{i} of agent i is defined as \{j\,|\,w_{ij}>0\}. Note that i\in\mathcal{N}_{i} always holds. Denote \mathcal{L} as the graph Laplacian induced by W. Let \mu_{1}\leq\mu_{2}\leq\cdots\leq\mu_{n} be the eigenvalues of \mathcal{L} and M be the unitary matrix satisfying \mathcal{L}=M\text{Diag}(\mu_{1},...,\mu_{n})M^{T}. Denote \underline{\mu}(\mathcal{L}) as the second smallest eigenvalue of \mathcal{L} and \bar{\mu}(\mathcal{L}) as the largest eigenvalue of \mathcal{L}.

II-C Hilbert Spaces

A Hilbert space \mathcal{H} is a complete inner-product space. A set \{e_{k}\}_{k\in\mathbb{N}}\subset\mathcal{H} is an orthonormal system if \langle e_{k},e_{j}\rangle=0 for k\neq j and \langle e_{k},e_{k}\rangle=||e_{k}||^{2}=1 for all k\in\mathbb{N}. If, in addition, the set of linear combinations of \{e_{k}\}_{k\in\mathbb{N}} is dense in \mathcal{H}, then \{e_{k}\}_{k\in\mathbb{N}} is an orthonormal basis. If \mathcal{H} is separable, then any orthonormal basis is countable, and we have

h=\sum_{k=0}^{\infty}\langle h,e_{k}\rangle e_{k}, (2)

for any h\in\mathcal{H}. Define the coefficient sequence \theta\in\mathbb{R}^{\mathbb{N}} by \theta_{k}=\langle h,e_{k}\rangle for k\in\mathbb{N}. Then \theta\in l_{2} and, by Parseval's identity, ||h||=||\theta||. Let \Phi:l_{2}\rightarrow\mathcal{H} be the linear bijection that maps the coefficient sequence \theta to h. For an arbitrary D\subseteq\mathbb{R}^{d}, L_{2}(D) is a Hilbert space, where the inner product is the integral of the product of functions. Moreover, L_{2}(D) is separable. In this paper, we denote by \{e_{k}\}_{k=0}^{\infty} an orthonormal basis for L_{2}(D) and by \Phi:l_{2}\rightarrow L_{2}(D) the corresponding linear bijection between coefficient sequences and functions.
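For intuition, Parseval's identity ||h||=||\theta|| can be checked numerically. The snippet below is our own illustration (not from the paper): it expands a test function in the Fourier cosine basis of L_{2}([0,1]) and compares the two norms.

```python
import numpy as np

# Expand h(x) = x(1 - x) in the orthonormal cosine basis of L2([0, 1]):
# e_0 = 1, e_k = sqrt(2) cos(k pi x), and check Parseval: ||h||^2 = ||theta||^2.
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(f * g) * dx        # Riemann-sum inner product

e = [np.ones_like(x)] + [np.sqrt(2) * np.cos(k * np.pi * x) for k in range(1, 200)]
h = x * (1 - x)
theta = np.array([inner(h, ek) for ek in e])   # theta_k = <h, e_k>

print(inner(h, h), (theta ** 2).sum())         # both approximately 1/30
```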

II-D Paillier Cryptosystem

The Paillier Cryptosystem is an algorithm for public key cryptography. The algorithm applies to the scenario of sending a private message over open and insecure communication links, and it consists of key generation, encryption, and decryption steps as follows:

  • Key Generation

    • The message receiver chooses two large prime numbers a and b randomly and independently of each other such that \gcd(ab,(a-1)(b-1))=1. This property is assured if both primes are of equal length [18].

    • Compute f=ab and \lambda=\text{lcm}(a-1,b-1).

    • Select a random integer g\in\mathbb{Z}_{f^{2}}^{\ast} such that the modular multiplicative inverse \mu=\left(\frac{(g^{\lambda}\bmod f^{2})-1}{f}\right)^{-1}\bmod f exists.

    • Let the public key be \bar{\mathcal{K}}=(f,g) and the private key be \tilde{\mathcal{K}}=(\lambda,\mu).

  • Encryption: To encrypt a plaintext \underline{p}\in\mathbb{Z}_{f}, the sender selects a random number r\in\mathbb{Z}_{f}^{\ast} and computes the ciphertext \underline{c}=\text{En}(\underline{p},\bar{\mathcal{K}},r)=g^{\underline{p}}\cdot r^{f}\bmod f^{2}.

  • Decryption: To decrypt the ciphertext \underline{c}\in\mathbb{Z}_{f^{2}}, the receiver computes the decrypted text \bar{\underline{p}}=\text{De}(\underline{c},\bar{\mathcal{K}},\tilde{\mathcal{K}})=\frac{(\underline{c}^{\lambda}\bmod f^{2})-1}{f}\cdot\mu\bmod f.

One notable homomorphic property of the Paillier encryption scheme is the following: given any \underline{p}_{1},\cdots,\underline{p}_{m}\in\mathbb{N}, if \sum_{l=1}^{m}\underline{p}_{l}\in\mathbb{Z}_{f}, then \text{De}(\prod_{l=1}^{m}\text{En}(\underline{p}_{l},\bar{\mathcal{K}},r_{l}),\bar{\mathcal{K}},\tilde{\mathcal{K}})=\sum_{l=1}^{m}\underline{p}_{l}.
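To make these steps concrete, here is a toy pure-Python sketch of the scheme (our own illustration: the key sizes are far too small for real use, and we fix g=f+1, a standard valid choice, rather than sampling g at random).

```python
import math
import random

def keygen(a=293, b=433):                       # small demo primes
    assert math.gcd(a * b, (a - 1) * (b - 1)) == 1
    f = a * b
    lam = (a - 1) * (b - 1) // math.gcd(a - 1, b - 1)   # lcm(a-1, b-1)
    g = f + 1                                   # fixed valid choice of g
    u = pow(g, lam, f * f)
    mu = pow((u - 1) // f, -1, f)               # inverse of L(g^lam mod f^2) mod f
    return (f, g), (lam, mu)

def encrypt(p, pub):
    f, g = pub
    r = random.randrange(1, f)
    while math.gcd(r, f) != 1:                  # r must lie in Z_f^*
        r = random.randrange(1, f)
    return pow(g, p, f * f) * pow(r, f, f * f) % (f * f)

def decrypt(c, pub, priv):
    f, _ = pub
    lam, mu = priv
    return (pow(c, lam, f * f) - 1) // f * mu % f

pub, priv = keygen()
c1, c2 = encrypt(20, pub), encrypt(22, pub)
# Homomorphic property: the product of ciphertexts decrypts to the sum.
assert decrypt(c1 * c2 % (pub[0] ** 2), pub, priv) == 42
```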

III Algorithm Design

In this section, we propose the Encrypted Functional Perturbation with Structured Noise algorithm (EFPSN for short) that solves problem (1) privately. Unlike the majority of privacy-preserving algorithms, EFPSN does not sacrifice accuracy for privacy.

To achieve privacy, EFPSN adds structured functional perturbations to the individual cost functions \{f_{i}\}_{i\in\mathcal{N}}. Specifically, the algorithm aims at making the functional perturbations zero-sum. EFPSN consists of two phases, as shown in Algorithm 1.

Algorithm 1 Encrypted Functional Perturbation with Structured Noise
Require: Cost functions \{f_{i}\}_{i\in\mathcal{N}}, noise precision order P, perturbation order K, and \{\sigma_{k}\}_{k\in\mathbb{Z}_{K}}
Ensure: x^{*}
Phase 1 – Masking cost functions
1: for i\in\mathcal{N} do
2:     Generate key pair (\bar{\mathcal{K}}_{i},\tilde{\mathcal{K}}_{i}) and r_{i} following the Paillier encryption scheme
3:     Share public key \bar{\mathcal{K}}_{i} with agents j\in\mathcal{N}_{i}
4: end for
5: for i\in\mathcal{N} do
6:     for (j,k)\in\mathcal{N}_{i}\times\mathbb{Z}_{K} do
7:         Generate random noise \eta_{ijk}\sim N(0,\sigma_{k}^{2})
8:         Calculate \underline{c}_{ijk}=\text{En}(\lfloor 10^{P}\eta_{ijk}\rfloor,\bar{\mathcal{K}}_{j},r_{i})
9:         Send \underline{c}_{ijk} to agent j
10:     end for
11:     for k\in\mathbb{Z}_{K} do
12:         Calculate \bar{\eta}_{ik}=\sum_{j\in\mathcal{N}_{i}}\eta_{ijk}-10^{-P}\,\text{De}(\prod_{j\in\mathcal{N}_{i}}\underline{c}_{jik},\bar{\mathcal{K}}_{i},\tilde{\mathcal{K}}_{i})
13:     end for
14:     \hat{f}_{i}=\Phi(\Phi^{-1}(f_{i})+\bar{\eta}_{i})=f_{i}+\Phi(\bar{\eta}_{i}), where \bar{\eta}_{i}=[\bar{\eta}_{i0},...,\bar{\eta}_{iK},0,...]\in\mathbb{R}^{\mathbb{N}}
15: end for
Phase 2 – Distributed optimization
16: Execute any distributed optimization algorithm on the masked functions \{\hat{f}_{i}\}_{i\in\mathcal{N}}

In Phase 1, the agents in the network cooperate to generate functional perturbations in a way that is immune to eavesdropping attacks and, to some extent, honest-but-curious attacks. First, they generate the keys and random numbers required by the Paillier encryption scheme. Then, each agent encrypts and sends its random noises to its neighbors, and the received ciphertexts are decrypted to construct the zero-sum perturbation. Due to encryption, the signals are transmitted privately and securely under eavesdropping. However, since Paillier encryption only works for integers, we set a precision order P and encrypt/decrypt the quantized noise \lfloor 10^{P}\eta_{ijk}\rfloor.

Subsequently, in Line 12, each agent i calculates \bar{\eta}_{ik} by subtracting the sum of the noises it receives from the sum of the noises it sends. Due to the homomorphic property of Paillier encryption, each agent only needs to decode once for each k\in\mathbb{Z}_{K}. This saves computation, especially when each agent has a large number of neighbors.

The Paillier encryption scheme guarantees privacy under eavesdropping attacks. In terms of honest-but-curious attacks, the noise coefficient sequence \bar{\eta}_{i} of each agent i\in\mathcal{N} remains unknown to the attacker as long as the attacker does not corrupt all of i's neighbors. Under such circumstances, the privacy of agent i is maintained.

It is worth noting that \sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{N}_{i}}(\eta_{ijk}-\eta_{jik})=0 for all k. Such a construction forces \lim_{P\rightarrow\infty}\sum_{i}\bar{\eta}_{ik}=0 to hold for all k. Therefore, we have generated a set of zero-sum signals \{\bar{\eta}_{ik}\}_{i\in\mathcal{N}} given large P.
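As a sanity check on this construction, the following sketch (our own illustration, not from the paper) simulates Phase 1 on a complete graph with plain floats standing in for the Paillier ciphertexts of the toy implementation above; only the received noises are quantized to 10^{-P} precision, mirroring Line 12 of Alg. 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, P = 5, 8, 12                            # agents, perturbation order, precision
sigma = 1.0 / np.sqrt(np.arange(1.0, K + 1))  # sigma_k^2 = gamma / k^p, gamma = p = 1
neighbors = {i: [j for j in range(n) if j != i] for i in range(n)}  # complete graph

# eta[i][j] holds the K noises agent i generates for neighbor j (one per order k)
eta = {i: {j: rng.normal(0.0, sigma) for j in neighbors[i]} for i in range(n)}

quant = lambda v: np.floor(10**P * v) / 10**P  # fixed-point encoding for encryption
eta_bar = {
    i: sum(eta[i][j] for j in neighbors[i])              # noises agent i sent
       - sum(quant(eta[j][i]) for j in neighbors[i])     # noises agent i received
    for i in range(n)
}

# Zero-sum up to the 10^{-P} quantization error (~1e-11 here for all orders k)
print(np.abs(sum(eta_bar.values())).max())
```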

Finally, in Line 14, the agents perturb their cost functions by adding \Phi(\bar{\eta}_{i}), where \Phi(\cdot) maps a sequence in l_{2} to a function in L_{2}(D). Such a construction depends on the chosen orthonormal system \{e_{k}\}_{k\in I}: given the orthonormal system \{e_{k}\}_{k\in I} and a sequence \bar{\eta}_{i}\triangleq\{\bar{\eta}_{ik}\}_{k\in I}, we have \Phi(\bar{\eta}_{i})=\sum_{k\in I}\bar{\eta}_{ik}e_{k}. The orthonormal system we use will be specified later.

Since \{\bar{\eta}_{i}\}_{i\in\mathcal{N}} is zero-sum when P\rightarrow\infty, we have

\lim_{P\rightarrow\infty}\sum_{i}\Phi(\bar{\eta}_{i}) =\lim_{P\rightarrow\infty}\sum_{i}\sum_{k}\bar{\eta}_{ik}e_{k} =\sum_{k}\lim_{P\rightarrow\infty}\left(\sum_{i}\bar{\eta}_{ik}\right)e_{k} =0. (3)

Therefore, the set of perturbing functions \{\Phi(\bar{\eta}_{i})\}_{i\in\mathcal{N}} is zero-sum when P\rightarrow\infty. Additionally, the decrypted text in Line 12 is of precision 10^{-P}. Consequently, the error introduced by finite P is dominated by the floating-point error once P is set to a moderately large value. Though \{\Phi(\bar{\eta}_{i})\}_{i\in\mathcal{N}} is zero-sum, each agent i gains privacy from its non-zero functional perturbation \Phi(\bar{\eta}_{i}).

In Phase 2, the agents may run any distributed optimization algorithm on \{\hat{f}_{i}\}_{i\in\mathcal{N}}. Since \sum_{i}\hat{f}_{i}(x)=\sum_{i}f_{i}(x), the obtained solution solves the original problem (1) when P\rightarrow\infty. Namely, EFPSN solves problem (1) without any accuracy degradation given a proper P.
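As one concrete instance of Phase 2, the sketch below (our own toy setup, not the paper's experiments) runs decentralized gradient descent on scalar quadratic costs f_{i}(x)=a_{i}(x-b_{i})^{2}/2 masked with a zero-sum linear perturbation; the ring topology, step size, and cost parameters are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, alpha = 5, 100_000, 0.002
W = 0.5 * np.eye(n)                 # doubly stochastic mixing matrix, ring graph
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25

a = rng.uniform(1.0, 2.0, n)        # local costs f_i(x) = a_i (x - b_i)^2 / 2
b = rng.uniform(-1.0, 1.0, n)
s = rng.normal(0.0, 1.0, n)
s -= s.mean()                       # zero-sum perturbation: fhat_i = f_i + s_i * x

x = np.zeros(n)                     # each agent's local decision variable
for _ in range(T):
    x = W @ x - alpha * (a * (x - b) + s)   # mix with neighbors, step on grad fhat_i

x_star = (a * b).sum() / a.sum()    # minimizer of the unperturbed average cost
print(x, x_star)   # iterates cluster near x_star; disagreement shrinks with alpha
```

The residual disagreement between the local iterates is the usual constant-step-size bias of decentralized gradient descent, which is exactly the effect discussed in Section V-B.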

Remark III.1.

EFPSN combines encryption, functional perturbation, and structured noise, and is superior to using any one of these techniques alone. In previous message-based methods, encryption at every iteration results in heavy communication and computation overhead, whereas the function-level encryption in EFPSN alleviates this burden: only insignificant communication and computation overhead is incurred in Phase 1. Regarding the existing function-based methods, the solution obtained after functional perturbation suffers from a privacy-related deviation from the solution to problem (1), which leads to a privacy-accuracy trade-off. For EFPSN, however, the optimization accuracy is independent of the privacy budget, as elaborated in the following sections. Moreover, using structured noise alone fails in the presence of eavesdropping, limiting its applicability.

IV Privacy Analysis

In this section, we analyze the privacy properties of EFPSN under the framework of differential privacy. In particular, we prove that the mechanism generating the masked functions is differentially private. Since DP is preserved under any post-processing, EFPSN as a whole remains differentially private.

First, we introduce the definition of 𝒱\mathcal{V}-adjacency, which was originally proposed in [13].

Definition 1 (\mathcal{V}-adjacency).

Given any normed vector space (\mathcal{V},||\cdot||_{\mathcal{V}}) with \mathcal{V}\subseteq L_{2}(D), two sets of functions F=\{f_{i}\}_{i\in\mathcal{N}},F^{\prime}=\{f^{\prime}_{i}\}_{i\in\mathcal{N}}\subset L_{2}(D) are \mathcal{V}-adjacent if there exists I\in\mathcal{N} such that

f_{i}=f_{i}^{\prime},\ i\neq I,\quad\text{and}\quad f_{I}-f_{I}^{\prime}\in\mathcal{V}. (4)

We adopt the standard (\epsilon,\delta)-DP definition under our functional setting.

Definition 2 ((\epsilon,\delta)-Differential Privacy).

Consider a random map \mathcal{M}:L_{2}(D)^{n}\rightarrow\mathcal{X} from the function space L_{2}(D)^{n} to an arbitrary set \mathcal{X}. Given \epsilon,\delta\geq 0, the map \mathcal{M} is (\epsilon,\delta)-differentially private if, for any two \mathcal{V}-adjacent sets of functions F=\{f_{i}\}_{i\in\mathcal{N}} and F^{\prime}=\{f^{\prime}_{i}\}_{i\in\mathcal{N}} and any measurable set \mathcal{O}\subseteq\mathcal{X}, one has

\mathbb{P}\{\mathcal{M}(F)\in\mathcal{O}\}\leq e^{\epsilon}\cdot\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\}+\delta. (5)

We choose our adjacency space \mathcal{V}_{q} as follows. Given q>1, consider the weight sequence \{k^{q}\}_{k=1}^{\infty} and define the adjacency vector space as the image of the resulting weighted l_{2} space under \Phi, i.e.,

\mathcal{V}_{q}=\Phi\left(\left\{\delta\in\mathbb{R}^{\mathbb{N}}\,\Big|\,\sum_{k=1}^{\infty}k^{2q}\delta_{k}^{4}<\infty\right\}\right), (6)

where \delta_{k} is the k-th element of \delta. The rationale for considering such a space will become clear from the analysis. Moreover,

||f||_{\mathcal{V}_{q}}\triangleq\left(\sum_{k=1}^{\infty}k^{2q}\delta_{k}^{4}\right)^{\frac{1}{4}},\quad\text{with }\delta=\Phi^{-1}(f), (7)

is a norm on \mathcal{V}_{q}.

Now we introduce our main theorem about the privacy-preserving property of EFPSN.

Theorem IV.1.

Given q>1, \gamma>0, p\in(1/2,\,q-1/2), the chosen \mathcal{V}_{q} space, and \sigma_{k}^{2}=\frac{\gamma}{k^{p}}, the mechanism in Alg. 1 is (\epsilon,\delta)-differentially private as the precision order P and the perturbation order K tend to infinity, with \epsilon=\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right) and \delta=e^{-\frac{R^{2}}{2}}, where A=\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}, \zeta(\cdot) is the Riemann zeta function, and R is an arbitrary positive real number.

Proof.

When the precision order P\rightarrow\infty, the encryption process is exact. For convenience, we therefore ignore the encryption process and the flooring operation in Alg. 1. Denote

\eta_{ik}\triangleq\sum_{j\in\mathcal{N}_{i}}\eta_{ijk}-\sum_{j\in\mathcal{N}_{i}}\eta_{jik},\qquad \eta_{i}\triangleq[\eta_{i0},...,\eta_{iK}]. (8)

In this case, the mechanism \mathcal{M} in Alg. 1 essentially adds functional perturbations constructed from a set of degenerate Gaussian noises. Specifically, given F=\{f_{i}\}_{i\in\mathcal{N}}, we have

\mathcal{M}(F)=\{f_{i}+\Phi(\eta_{i})\}_{i\in\mathcal{N}}, (9)

and

\boldsymbol{\eta}_{k}=[\eta_{1k},...,\eta_{nk}]^{T}\sim N^{\dagger}(0_{n},2\sigma_{k}^{2}\mathcal{L}),\ \forall k\in\mathbb{Z}_{K} (10)

are the zero-sum coefficients.

From [16], we have

\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y})=\begin{cases}\frac{1}{\sqrt{\det^{*}(4\pi\sigma_{k}^{2}\mathcal{L})}}\exp\left(-\frac{\boldsymbol{y}^{T}\mathcal{L}^{\dagger}\boldsymbol{y}}{4\sigma_{k}^{2}}\right), & \boldsymbol{y}^{T}\boldsymbol{1}=0,\\ 0, & \text{otherwise},\end{cases} (11)

where \mathcal{L}^{\dagger}=M\text{Diag}(0,1/\mu_{2},...,1/\mu_{n})M^{T} and \det^{*}(4\pi\sigma_{k}^{2}\mathcal{L})=(4\pi\sigma_{k}^{2})^{n-1}\prod_{i=2}^{n}\mu_{i}.

Let F^{\prime}=\{f_{i}^{\prime}\}_{i\in\mathcal{N}} be a set of functions \mathcal{V}-adjacent to F that differs only in the I-th element. Let \Psi^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n\times\mathbb{N}} be the map such that \Psi^{-1}(F)=\{\Phi^{-1}(f_{i})\}_{i\in\mathcal{N}}. Define \Phi_{0:k}^{-1}:L_{2}(D)\rightarrow\mathbb{R}^{k+1} and \Phi_{k}^{-1}:L_{2}(D)\rightarrow\mathbb{R} as the maps that return the first k+1 coefficients and the (k+1)-th coefficient of \Phi^{-1}(\cdot), respectively. Similarly, define \Psi_{0:k}^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n\times(k+1)} and \Psi_{k}^{-1}:L_{2}(D)^{n}\rightarrow\mathbb{R}^{n} as the maps that return the first k+1 columns and the (k+1)-th column of \Psi^{-1}(\cdot), respectively. For any \mathcal{O}\subseteq L_{2}(D)^{n}, \mathcal{O}_{i} is the i-th element of \mathcal{O}. Let \mathcal{O}_{i}-f_{i}\triangleq\{g_{i}\in L_{2}(D)\,|\,g_{i}+f_{i}\in\mathcal{O}_{i}\} and \mathcal{O}-F\triangleq\{\{g_{i}\}_{i\in\mathcal{N}}\subset L_{2}(D)^{n}\,|\,\{g_{i}+f_{i}\}_{i\in\mathcal{N}}\in\mathcal{O}\}.

We have

\mathbb{P}\{\mathcal{M}(F)\in\mathcal{O}\} =\mathbb{P}\{\{\eta_{i}\}_{i\in\mathcal{N}}\in\Psi^{-1}(\mathcal{O}-F)\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F)}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}, (12)

and

\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\} =\mathbb{P}\{\{\eta_{i}\}_{i\in\mathcal{N}}\in\Psi^{-1}(\mathcal{O}-F^{\prime})\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F^{\prime})}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}. (13)

By the linearity of \Phi, we have \Phi^{-1}(\mathcal{O}_{i}-f^{\prime}_{i})=\Phi^{-1}(\mathcal{O}_{i}-f_{i})+\xi_{i} for all \mathcal{O}\subseteq L_{2}(D)^{n} and i\in\mathcal{N}, where \xi_{i}\triangleq\Phi^{-1}(f_{i}-f^{\prime}_{i}). Denoting \boldsymbol{\xi}_{k}=[\xi_{1k},...,\xi_{nk}]^{T}, we have

\Psi_{k}^{-1}(\mathcal{O}-F^{\prime})=\Psi_{k}^{-1}(\mathcal{O}-F)+\boldsymbol{\xi}_{k}. (14)

Combining (13) and (14), we have

\mathbb{P}\{\mathcal{M}(F^{\prime})\in\mathcal{O}\} =\lim_{K\rightarrow\infty}\int_{\Psi_{0:K}^{-1}(\mathcal{O}-F)}\prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{y}_{k}+\boldsymbol{\xi}_{k})\,d\boldsymbol{y}_{0}\cdots d\boldsymbol{y}_{K}. (15)

To prove that \mathcal{M} is (\epsilon,\delta)-DP, it suffices to show that the ratio of \prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}) over \prod_{k=0}^{K}\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k}) is bounded by e^{\epsilon} with probability at least 1-\delta.

We know that

\prod_{k=0}^{K}\frac{\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k})}{\mathcal{P}_{\boldsymbol{\eta}_{k}}(\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k})} =\exp\left\{\sum_{k=0}^{K}\frac{2\boldsymbol{\xi}_{k}^{T}\mathcal{L}^{\dagger}\boldsymbol{\eta}_{k}+\boldsymbol{\xi}_{k}^{T}\mathcal{L}^{\dagger}\boldsymbol{\xi}_{k}}{4\sigma_{k}^{2}}\right\} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\sum_{k=0}^{K}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}+\sum_{k=0}^{K}\frac{||\boldsymbol{\xi}_{k}||^{2}}{4\sigma_{k}^{2}}\right)\right\} (16)

Denote

\textbf{Rat}\triangleq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}+\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{4\sigma_{k}^{2}}\right)\right\}.

We show that \textbf{Rat} is bounded with a certain probability.

Since \{f_{i}\}_{i\in\mathcal{N}} and \{f^{\prime}_{i}\}_{i\in\mathcal{N}} only differ in one element, \boldsymbol{\xi}_{k} has at most one non-zero element. Noting that \boldsymbol{\eta}_{k} is random and drawn from N^{\dagger}(0_{n},2\sigma_{k}^{2}\mathcal{L}), \frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}} is a univariate Gaussian random variable. From (10), each element of \boldsymbol{\eta}_{k} has variance at most 2\sigma_{k}^{2}\bar{\mu}(\mathcal{L}). Thus, \sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}} is a univariate Gaussian random variable with variance less than or equal to \frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}.

We further bound the summation term in this variance, following an argument similar to the proof of Theorem V.2 in [13]:

\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma_{k}^{2}} =\sum_{k=0}^{\infty}\frac{k^{q}||\boldsymbol{\xi}_{k}||^{2}}{k^{q}\sigma_{k}^{2}} \leq\left(\sum_{k=0}^{\infty}\frac{1}{(k^{q}\sigma_{k}^{2})^{2}}\right)^{\frac{1}{2}}\left(\sum_{k=0}^{\infty}(k^{q}||\boldsymbol{\xi}_{k}||^{2})^{2}\right)^{\frac{1}{2}} =\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}} \triangleq A. (17)

Let R be an arbitrary positive real number. When \sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}\leq R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}} holds, we have

\textbf{Rat} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}}+\frac{1}{4}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}\right)\right\} \leq\exp\left\{\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right)\right\}. (18)

By adopting the Chernoff bound for Gaussian random variables, we have

\mathbb{P}\left\{\sum_{k=0}^{\infty}\frac{\boldsymbol{\xi}_{k}^{T}\boldsymbol{\eta}_{k}}{2\sigma_{k}^{2}}\geq R\sqrt{\frac{\bar{\mu}(\mathcal{L})}{2}\sum_{k=0}^{\infty}\frac{||\boldsymbol{\xi}_{k}||^{2}}{\sigma^{2}_{k}}}\right\} \leq e^{-\frac{R^{2}}{2}}. (19)

Namely, \textbf{Rat} is bounded by e^{\epsilon} with probability 1-\delta, where \epsilon=\frac{1}{\underline{\mu}(\mathcal{L})}\left(\frac{A}{4}+\frac{R\sqrt{\bar{\mu}(\mathcal{L})A}}{\sqrt{2}}\right), \delta=e^{-\frac{R^{2}}{2}}, and A=\frac{1}{\gamma}\sqrt{\zeta(2(q-p))}\,||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}.

Therefore, under proper parameter settings, the mechanism \mathcal{M} is (\epsilon,\delta)-DP with \epsilon,\delta defined above. ∎

Remark IV.2.

Several factors contribute to the privacy parameter \epsilon. First, the communication graph topology affects \bar{\mu}(\mathcal{L}) and \underline{\mu}(\mathcal{L}). Since \bar{\mu}(\mathcal{L}) is bounded by 2\triangle(G), where \triangle(G) is the maximum degree, and \underline{\mu}(\mathcal{L}) is the algebraic connectivity, which reflects how well connected the graph is, a strongly and evenly connected graph helps EFPSN achieve the best privacy performance. In addition, the parameters \gamma,q,p need to be deliberately chosen to decrease \epsilon. Since \gamma is the noise magnitude, it is natural that a larger \gamma contributes to a smaller \epsilon and, consequently, better privacy.

Remark IV.3.

The term ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}} measures the privacy-preserving capacity of EFPSN from a different perspective: given a privacy budget \epsilon, the larger this norm can be, the more functions our mechanism can protect. Additionally, this term enters \epsilon only through the ratio ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}/\gamma, suggesting that protecting a larger set of functions requires proportionally more noise.

Remark IV.4.

Choosing arbitrarily large R and \gamma while keeping \frac{R}{\sqrt{\gamma}}\sim o(1) results in arbitrarily small \epsilon and \delta simultaneously. Namely, we can attain a nearly perfectly private mechanism without sacrificing algorithmic accuracy.
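As a numerical illustration of this remark (our own sketch; the ring topology, q, p, and the adjacency norm ||f_{I}-f_{I}^{\prime}||^{2}_{\mathcal{V}_{q}}=1 are all assumed), the snippet below evaluates \epsilon and \delta from Theorem IV.1 while growing \gamma with R=\gamma^{1/4}, so that R/\sqrt{\gamma}\rightarrow 0.

```python
import numpy as np
from scipy.special import zeta

# Ring-graph Laplacian for n = 5 agents
n = 5
L = 2 * np.eye(n)
for i in range(n):
    L[i, (i - 1) % n] = L[i, (i + 1) % n] = -1
mu = np.sort(np.linalg.eigvalsh(L))
mu_min, mu_max = mu[1], mu[-1]          # algebraic connectivity, largest eigenvalue

q, p = 2.0, 1.0                         # satisfies p in (1/2, q - 1/2)
norm2 = 1.0                             # ||f_I - f'_I||^2_{V_q}, assumed

def eps_delta(gamma, R):
    A = np.sqrt(zeta(2 * (q - p))) * norm2 / gamma
    eps = (A / 4 + R * np.sqrt(mu_max * A) / np.sqrt(2)) / mu_min
    return eps, np.exp(-R ** 2 / 2)

for gamma in [1e0, 1e2, 1e4, 1e6]:
    R = gamma ** 0.25                   # keeps R / sqrt(gamma) -> 0
    print(gamma, *eps_delta(gamma, R))  # both eps and delta shrink as gamma grows
```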

V Numerical Experiments

In this section, we evaluate the performance of the proposed EFPSN algorithm through numerical experiments. We consider both convex and non-convex objective functions, and compare EFPSN with the non-zero-sum functional perturbation mechanism proposed in [13]. Before presenting the results, we first demonstrate how the orthonormal system \{e_{k}\}_{k\in I}, based on which we generate the coefficient-function mapping \Phi, is constructed.

V-A Generating the Orthonormal System

The number of elements of an orthonormal basis grows factorially as the number of variables increases, and even the simplest real-world applications, such as logistic regression on images, require at least hundreds of parameters, making the number of elements in a full basis prohibitively large. Consequently, for a function with M variables, we pick m<M variables to perturb. Also, instead of generating an orthonormal basis, we only generate an orthonormal system in L_{2} with N elements.

Since the generated orthonormal system belongs to some orthonormal basis, perturbing along the orthonormal system is equivalent to perturbing along the orthonormal basis with the remaining noise coefficients being 0. Therefore, all of our previous discussion on accuracy and privacy holds under such a perturbing mechanism.

The orthonormal system is constructed from the Gram-Schmidt orthonormalization of Taylor monomials. Given the tuple (K,m,N), we randomly generate N Taylor monomials of m variables whose total order is smaller than or equal to K. Then we orthonormalize all terms using the Gram-Schmidt method.

We present an example of one perturbing function when (K,m,N)=(3,2,5) under noise level \gamma=1. The orthonormal system is \{0.5,\ 0.866x_{2},\ 3.307x_{2}^{3}-1.984x_{2},\ 0.866x_{1},\ 2.905x_{1}^{2}x_{2}-0.968x_{2}\} and the noise sequence is \{0.180,0.628,-0.374,0.817,2.015\}. The corresponding perturbing function is 5.853x_{1}^{2}x_{2}-2.137x_{2}^{3}+0.708x_{1}-1.310x_{2}+0.090. We visualize the perturbing function in Fig. 1.
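The construction can be reproduced numerically. The sketch below is our own one-dimensional illustration (m=1 on D=[-1,1]; the paper's example above uses m=2): it Gram-Schmidt-orthonormalizes the monomials 1, x, x^{2}, x^{3} under Gauss-Legendre quadrature and forms a perturbing function \Phi(\bar{\eta})=\sum_{k}\bar{\eta}_{k}e_{k}.

```python
import numpy as np

# Gram-Schmidt over D = [-1, 1] with Gauss-Legendre quadrature (m = 1, K = 3)
x, w = np.polynomial.legendre.leggauss(64)      # quadrature nodes and weights
inner = lambda f, g: np.sum(w * f * g)          # <f, g> = integral of f*g over D

basis = []
for k in range(4):                              # Taylor terms 1, x, x^2, x^3
    v = x ** k
    for e in basis:
        v = v - inner(v, e) * e                 # remove projections on earlier e_j
    basis.append(v / np.sqrt(inner(v, v)))      # normalize; these are the e_k

# A perturbing function Phi(eta_bar) = sum_k eta_bar_k e_k, on the grid
eta_bar = np.random.default_rng(2).normal(0.0, [1.0, 0.7, 0.5, 0.35])
perturb = sum(c * e for c, e in zip(eta_bar, basis))
```

Here the resulting e_{k} are the normalized Legendre polynomials, and inner(basis[i], basis[j]) evaluates to 1 for i = j and approximately 0 otherwise.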

Figure 1: Visualization of a 2d perturbing function.

V-B Accuracy Test

In this part, we validate the accuracy of the proposed EFPSN method.

V-B1 Convex Case

We consider a classification task on the MNIST dataset using logistic regression.

We implement the logistic regression model using PyTorch. Specifically, since each picture in MNIST is of size 28\times 28 with a single channel, there is only one linear layer in the model, whose input and output dimensions are 784 and 10, respectively. Together with the bias, the model has 7850 parameters. We set the perturbing parameters (K,m,N)=(1,10,10). The experiments are conducted under different noise levels \gamma\in\{10^{-2},10^{-1},1,10,10^{2},10^{3},10^{4}\}.

Consider 5 agents in the network, connected as shown in Fig. 2. Each agent holds the same number of randomly assigned training data points. We adopt decentralized stochastic gradient descent in Phase 2 of Alg. 1. The batch size is set to 64, and the initial learning rate is 0.2. Each agent conducts 10000 gradient updates. The learning rate remains fixed for the first 2000 steps and then decays to 4\times 10^{-5} at the last step.

Figure 2: Network structure between agents.

The results are shown in Fig. 3. In Fig. 3(a), the horizontal axis displays different noise magnitudes \gamma, and the vertical axis represents the deviation from the optimal solution, i.e., ||\overline{\mathbf{x}}-\mathbf{x}^{\ast}||, where \overline{\mathbf{x}}=\frac{1}{n}\sum_{i\in\mathcal{N}}\mathbf{x}_{i} and \mathbf{x}^{\ast} is the solution generated by centralized gradient descent.

When \gamma=10^{-2}, the results from EFPSN and the non-zero-sum algorithm are nearly identical, both close to the noise-free case. However, as \gamma increases, the solution obtained from the non-zero-sum method quickly deviates from the optimum, and the deviation grows at a roughly constant rate. The blue line (the EFPSN solution) does not rise until \gamma=10^{3}, at which point the orange line is 4 orders of magnitude higher.

The rise in the blue line stems from the slight disagreements between the local decision variables \mathbf{x}_{i}. Note that the perturbed functions are only guaranteed to be zero-sum when all agents hold the same decision variable, and our previous error analysis assumes such exact agreement. In practice, slight differences between the \mathbf{x}_{i} exist due to the finite step size. With EFPSN, one can always generate a more accurate solution using a finer learning rate.

Fig. 3(b) shows the classification accuracy of the logistic model under different algorithms and noise levels. The pattern matches that of Fig. 3(a): for the non-zero-sum algorithm, the test accuracy starts to drop dramatically when \gamma reaches 1. In contrast, the model trained by EFPSN remains as accurate as the noise-free case until \gamma=10^{4}. Namely, EFPSN tolerates a noise level at least 4 orders of magnitude larger without degrading accuracy.

Figure 3: Deviation (a) and accuracy (b) for logistic regression.

V-B2 Non-convex Case

To further justify our method in the non-convex setting, we consider an image classification task on MNIST using a convolutional neural network. We adopt the classic LeNet [19], which consists of 13426 parameters. We choose to perturb the bias of its last linear layer, and again set (K,m,N)=(1,10,10). Except for the model, all settings are identical to the convex case.

Figure 4: Squared norm of the average gradient (a) and accuracy (b) for LeNet.

In Fig. 4(a), the y-axis depicts the squared norm of the average gradient over all agents, ||\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}}\nabla f_{i}(x_{i}^{k})||^{2}. Trends similar to the convex case reappear: unlike the non-zero-sum method, EFPSN generates solutions close to a stationary point under all noise levels. Also, in Fig. 4(b), EFPSN remains as accurate as the noise-free case even when \gamma=10^{4}, whereas the non-zero-sum method can barely hold its accuracy at \gamma=1.

These experiments demonstrate the efficacy of our method on both convex and non-convex problems.

V-C Privacy Test

To further validate EFPSN's efficacy in privacy preservation, we conduct the DLG attack [7] on agents with zero-sum and non-zero-sum noise, respectively. The general idea of DLG is to construct dummy data and match its gradient with the ground truth; the procedure is shown in Alg. 2. For better results, we use iDLG [20], a more efficient and stable variant of DLG.

Algorithm 2 Deep Leakage from Gradients
Require: F(\mathbf{x},W): differentiable model; W: parameter weights; \nabla W: gradients calculated from the private training data
Ensure: private training data \mathbf{x},\mathbf{y}
1: \mathbf{x}^{\prime}_{1}\leftarrow N(0,1),\ \mathbf{y}^{\prime}_{1}\leftarrow N(0,1)
2: for i\leftarrow 1 to n do
3:     \nabla W_{i}^{\prime}\leftarrow\partial l(F(\mathbf{x}^{\prime}_{i},W_{t}),\mathbf{y}_{i}^{\prime})/\partial W_{t}
4:     \mathbb{D}_{i}\leftarrow||\nabla W_{i}^{\prime}-\nabla W||^{2}
5:     \mathbf{x}_{i+1}^{\prime}\leftarrow\mathbf{x}_{i}^{\prime}-\alpha\nabla_{\mathbf{x}_{i}^{\prime}}\mathbb{D}_{i}
6:     \mathbf{y}_{i+1}^{\prime}\leftarrow\mathbf{y}_{i}^{\prime}-\alpha\nabla_{\mathbf{y}_{i}^{\prime}}\mathbb{D}_{i}
7: end for
8: return \mathbf{x}_{n+1}^{\prime},\mathbf{y}_{n+1}^{\prime}
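For reference, a minimal PyTorch rendering of Alg. 2 might look as follows (a sketch with illustrative shapes and step size; in our privacy test, grad_true would be agent 1's perturbed gradient \nabla\hat{f}_{1}(x_{1}^{k}) intercepted by the attacker).

```python
import torch

def dlg_attack(model, grad_true, x_shape, n_classes, iters=240, alpha=1.0):
    # Dummy data and soft label, optimized so their gradient matches grad_true
    x_d = torch.randn(x_shape, requires_grad=True)
    y_d = torch.randn(1, n_classes, requires_grad=True)
    for _ in range(iters):
        logp = torch.log_softmax(model(x_d), dim=-1)
        loss = -(torch.softmax(y_d, dim=-1) * logp).sum()  # cross-entropy, soft label
        grad_d = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # D_i = ||grad W' - grad W||^2, as in Line 4 of Alg. 2
        D = sum(((g - t) ** 2).sum() for g, t in zip(grad_d, grad_true))
        gx, gy = torch.autograd.grad(D, [x_d, y_d])
        with torch.no_grad():                              # Lines 5-6 of Alg. 2
            x_d -= alpha * gx
            y_d -= alpha * gy
    return x_d.detach(), y_d.detach()

# Example usage against a logistic model (shapes assumed):
# model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
# x_rec, y_rec = dlg_attack(model, grad_true, (1, 1, 28, 28), 10)
```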

With either EFPSN or the non-zero-sum noise method, each agent receives a non-zero functional perturbation of roughly the same magnitude. Since iDLG is carried out at the agent level, it makes no difference between the two algorithms. Therefore, we conduct iDLG on agent 1 in Fig. 2, with the noise generated from EFPSN at different noise levels (\gamma\in\{10,10^{2},10^{3},10^{4}\}).

Specifically, we assume that the mixing matrix is known to the attacker, and that the attacker has access to at least one of the communication channels connected to agent 1 (either eavesdropping or corrupting one of agent 1's neighbors will do). Therefore, the attacker knows agent 1's perturbed gradient \nabla\hat{f}_{1}(x_{1}^{k}) and agent 1's decision parameters x_{1}^{k}. The attacker does not know the functional perturbation \Phi(\bar{\eta}_{1}), so the true gradient \nabla f_{1}(x_{1}^{k}) remains unrevealed. Namely, the attacker tries to recover the raw data using inexact gradient information, and the larger the noise level \gamma, the more inexact the gradient.

Fig. 5 and Fig. 6 present the iDLG attacker's typical inference results on the logistic model and LeNet, respectively. The top-left subfigure is the raw data, and the remaining subfigures are the adversary's estimates of the raw data at different iterations (from 0 to 240). As \gamma increases, the retrieved picture becomes more blurred. Interestingly, though we perturb the original problem functionally, the effect is equivalent to directly perturbing the dataset. Generally, once \gamma\geq 10^{3}, the recovered picture is unrecognizable to humans. When \gamma\geq 10^{3}, however, the accuracy of the model trained by the non-zero-sum method has dropped below 10% for both the convex and non-convex problems (as shown in Fig. 3 and Fig. 4); that is, the non-zero-sum solution becomes too inaccurate by the time it provides enough privacy. In contrast, the EFPSN solution retains accuracy comparable to the noise-free case. Thus, EFPSN is capable of preserving privacy without degrading accuracy.

Figure 5: iDLG attacker's inference results on logistic regression: (a) \gamma=10; (b) \gamma=10^{2}; (c) \gamma=10^{3}; (d) \gamma=10^{4}.
Figure 6: iDLG attacker's inference results on LeNet: (a) \gamma=10; (b) \gamma=10^{2}; (c) \gamma=10^{3}; (d) \gamma=10^{4}.

VI Conclusion

In this paper, we proposed the Encrypted Functional Perturbation with Structured Noise (EFPSN) algorithm, which solves the decentralized optimization problem (1) privately and accurately. Given exact consensus among agents, EFPSN eliminates the privacy-accuracy trade-off by constructing a zero-sum functional perturbation. Since the construction requires secure communication between agents, we adopt the Paillier encryption scheme to defend against eavesdropping attackers. We rigorously proved the privacy property of EFPSN under the differential privacy framework, and simulations confirmed its efficacy in protecting privacy while maintaining accuracy.

References

  • [1] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [2] S. Pu and A. Nedić, “Distributed stochastic gradient tracking methods,” Mathematical Programming, vol. 187, no. 1, pp. 409–457, 2021.
  • [3] S. Pu, A. Garcia, and Z. Lin, “Noise reduction by swarming in social foraging,” IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4007–4013, 2016.
  • [4] T. Yan, Y. Gu, T. He, and J. A. Stankovic, “Design and optimization of distributed sensing coverage in wireless sensor networks,” ACM Transactions on Embedded Computing Systems (TECS), vol. 7, no. 3, pp. 1–40, 2008.
  • [5] R. Murphey and P. M. Pardalos, Cooperative control and optimization, vol. 66. Springer Science & Business Media, 2002.
  • [6] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, and C. Wachinger, “Braintorrent: A peer-to-peer environment for decentralized federated learning,” arXiv preprint arXiv:1905.06731, 2019.
  • [7] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [8] J. Cortes, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas, “Differential privacy in control and network systems,” in 2016 IEEE 55th Conference on Decision and Control (CDC), (Las Vegas, NV), pp. 4252–4272, IEEE, Dec. 2016.
  • [9] C. Dwork and A. Roth, “The Algorithmic Foundations of Differential Privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2013.
  • [10] X. Chen, L. Huang, L. He, S. Dey, and L. Shi, “A Differential Private Method for Distributed Optimization in Directed Networks via State Decomposition,” p. 8, 2021.
  • [11] Y. Lou, L. Yu, and S. Wang, “Privacy Preservation in Distributed Subgradient Optimization Algorithms,” arXiv:1512.08822 [cs, math], Dec. 2015. arXiv: 1512.08822.
  • [12] Z. Huang, S. Mitra, and N. Vaidya, “Differentially Private Distributed Optimization,” in Proceedings of the 2015 International Conference on Distributed Computing and Networking, (Goa India), pp. 1–10, ACM, Jan. 2015.
  • [13] E. Nozari, P. Tallapragada, and J. Cortes, “Differentially Private Distributed Convex Optimization via Functional Perturbation,” IEEE Transactions on Control of Network Systems, vol. 5, pp. 395–408, Mar. 2016.
  • [14] Y. Lu and M. Zhu, “Privacy preserving distributed optimization using homomorphic encryption,” Automatica, vol. 96, pp. 314–325, Oct. 2018.
  • [15] C. Zhang and Y. Wang, “Enabling Privacy-Preservation in Decentralized Optimization,” IEEE Transactions on Control of Network Systems, vol. 6, pp. 679–689, June 2019.
  • [16] N. Gupta, S. Gade, N. Chopra, and N. H. Vaidya, “Preserving Statistical Privacy in Distributed Optimization,” IEEE Control Systems Letters, vol. 5, pp. 779–784, July 2021.
  • [17] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International conference on the theory and applications of cryptographic techniques, pp. 223–238, Springer, 1999.
  • [18] B. Fulton, “Review of introduction to modern cryptography by jonathan katz and yehuda lindell publisher: Chapman; hall-crc 2008 1-58488-551-3,” ACM SIGACT News, vol. 41, no. 4, p. 44–47, 2010.
  • [19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [20] B. Zhao, K. R. Mopuri, and H. Bilen, “idlg: Improved deep leakage from gradients,” CoRR, vol. abs/2001.02610, 2020.