
Bayesian Differential Privacy for Linear Dynamical Systems

Genki Sugiura, Kaito Ito, and Kenji Kashima. The authors are with the Graduate School of Informatics, Kyoto University, Kyoto, Japan (e-mail: [email protected]). © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

Differential privacy is a privacy measure based on the difficulty of discriminating between similar input data, where two data sets are usually regarded as similar if their distance does not exceed a predetermined threshold. Consequently, it does not account for the difficulty of distinguishing data sets that are far apart, which often contain highly private information. This problem has been pointed out in the research on differential privacy for static data, and Bayesian differential privacy has been proposed, which provides a privacy protection level even for outlier data by utilizing the prior distribution of the data. In this study, we introduce Bayesian differential privacy to dynamical systems, provide privacy guarantees for distant input data pairs, and reveal its fundamental properties. For example, we design a mechanism that satisfies a desired level of privacy protection, which characterizes the trade-off between privacy and information utility.

Index Terms:
Control system security, differential privacy, stochastic system.

I Introduction

As the Internet-of-Things (IoT) and cloud computing attract more and more attention for their convenience, privacy protection and security have become key technologies in control systems. To cope with privacy threats, many privacy protection methods have been studied [1, 2, 3]. Among them, differential privacy [4] has been used to solve many privacy-related problems in areas such as smart grids [5], health management [6], and blockchain [7], because it can mathematically quantify privacy guarantees. Differential privacy was originally applied to static data, but as the power system example above shows, there is an urgent need to establish privacy protection techniques for dynamical systems. In recent years, the concept of differential privacy has been introduced to dynamical systems [8]. From the viewpoint of control systems theory, the relationship between privacy protection and the observability of systems has been clarified, and methods of controller design with privacy protection in mind have been studied [9, 10, 11].

Conventional differential privacy is a privacy measure based on the difficulty of distinguishing similar data, where $x$ and $x'$ are regarded as being similar if $|x-x'|\leq c$ for a prescribed $c>0$. Conversely, there is no indistinguishability guarantee for $x$ and $x'$ if $|x-x'|>c$. This implies a risk of information leakage when there are outliers from normal data, as pointed out in [12]. For example, unusual electricity consumption patterns may contain highly private information about a person's lifestyle. In [13], a new concept called Bayesian differential privacy is developed for static data to solve this problem. Bayesian differential privacy considers the underlying probability distribution of the data and attempts to guarantee privacy even for data sets that are far apart.

In this study, we consider a prior distribution for the signal that we want to keep secret and introduce Bayesian differential privacy for linear dynamical systems. As in the conventional differential privacy setting [9], we consider a mechanism where stochastic noise is added to the output data. Note that applying larger noise increases the privacy protection level but decreases the information usefulness [14]. In Theorem 7 below, a lower bound on the noise scale that guarantees a prescribed Bayesian differential privacy level is derived. Other properties, including the relation to the conventional case, are investigated based on this result. The rest of this paper is organized as follows. In Section II, we introduce differential privacy for dynamical systems. In Section III, we propose Bayesian differential privacy for dynamical systems and derive a sufficient condition on the added noise to achieve its privacy guarantee. In Section IV, considering the trade-off between privacy and information utility, we derive the Gaussian noise with the minimum energy that guarantees Bayesian differential privacy. In Section V, the usefulness of Bayesian differential privacy is illustrated via a numerical example. Some concluding remarks are given in Section VI.

Notations

The sets of real numbers and nonnegative integers are denoted by $\mathbb{R}$ and $\mathbb{Z}_{+}$, respectively. The imaginary unit is denoted by $\mathrm{j}$. For vectors $x_{1},\dots,x_{m}\in\mathbb{R}^{n}$, the collective vector $[x_{1}^{\top}\cdots x_{m}^{\top}]^{\top}\in\mathbb{R}^{nm}$ is written as $[x_{1};\cdots;x_{m}]$ for simplicity of description. For a sequence $u(t)\in\mathbb{R}^{n},\ t=0,1,\ldots,T$, the collective vector is denoted by the capital letter $U_{T}\coloneqq[u(0);\cdots;u(T)]\in\mathbb{R}^{(T+1)n}$. For a square matrix $A\in\mathbb{R}^{n\times n}$, its determinant is denoted by $\mathrm{det}(A)$, and when its eigenvalues are real, its maximum and minimum eigenvalues are denoted by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively. We write $A\succ 0$ (resp. $A\succeq 0$) if $A$ is positive definite (resp. semidefinite). For $A\succeq 0$, the principal square root of $A$ is denoted by $A^{1/2}$. The identity matrix of size $n$ is denoted by $I_{n}$; the subscript $n$ is omitted when it is clear from the context. The Euclidean norm of a vector $x\in\mathbb{R}^{n}$ is denoted by $|x|$, and its weighted norm with $A\succ 0$ by $|x|_{A}\coloneqq(x^{\top}Ax)^{1/2}$. The indicator function of a set $S\subset\mathbb{R}^{n}$ is denoted by $1_{S}$, i.e., $1_{S}(x)=1$ if $x\in S$, and $0$ otherwise. For a topological space $X$, the Borel algebra on $X$ is denoted by $\mathcal{B}(X)$. Fix some complete probability space $(\Omega,\mathcal{F},\mathbb{P})$, and let $\mathbb{E}$ be the expectation with respect to $\mathbb{P}$. For an $\mathbb{R}^{n}$-valued random vector $w$, $w\sim\mathcal{N}_{n}(\mu,\Sigma)$ means that $w$ has a nondegenerate multivariate Gaussian distribution with mean $\mu\in\mathbb{R}^{n}$ and covariance matrix $\Sigma\succ 0$. The so-called $\mathsf{Q}$-function is defined by $\mathsf{Q}(c)\coloneqq\frac{1}{\sqrt{2\pi}}\int_{c}^{\infty}\mathrm{e}^{-\frac{v^{2}}{2}}dv$, which satisfies $\mathsf{Q}(c)<1/2$ for $c>0$, and $\mathsf{R}(\varepsilon,\delta)\coloneqq(\mathsf{Q}^{-1}(\delta)+\sqrt{(\mathsf{Q}^{-1}(\delta))^{2}+2\varepsilon})/(2\varepsilon)$. The gamma function is denoted by $\Gamma(\cdot)$. A random variable $z$ is said to have a $\chi^{2}$ distribution with $k$ degrees of freedom, denoted by $z\sim\chi^{2}_{k}$, if its distribution has the following probability density:

p(z;k)=zk/21ez/2 2k/2Γ(k/2),z0,k{1,2,}.\displaystyle p(z;k)=\frac{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}{z}}^{k/2-1}\mathrm{e}^{-{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}{z}}/2}}{\,2^{k/2}\Gamma(k/2)},\quad z\geq 0,\ k\in\{1,2,\ldots\}.

II Conventional differential privacy for dynamical systems

In this section, we briefly overview fundamental results on differential privacy for dynamical systems. Consider the following discrete-time linear system:

\displaystyle\left\{\begin{array}[]{l}x(t+1)=Ax(t)+Bu(t),\\ y(t)=Cx(t)+Du(t),\end{array}\right. (3)

for $t\in\mathbb{Z}_{+}$, where $x(t)\in\mathbb{R}^{n}$, $u(t)\in\mathbb{R}^{m}$, and $y(t)\in\mathbb{R}^{q}$ denote the state, input, and output, respectively, and $A\in\mathbb{R}^{n\times n}$, $B\in\mathbb{R}^{n\times m}$, $C\in\mathbb{R}^{q\times n}$, and $D\in\mathbb{R}^{q\times m}$. For simplicity, we assume $x(0)=0$ and that the information to be kept secret is the input sequence $U_{T}$ up to a finite time $T\in\mathbb{Z}_{+}$. For (3), the output sequence $Y_{T}\in\mathbb{R}^{(T+1)q}$ is described by

\displaystyle Y_{T}=N_{T}U_{T}, (4)

where $N_{T}\in\mathbb{R}^{(T+1)q\times(T+1)m}$ is

\displaystyle N_{T}={\cal T}_{T}(A,B,C,D) (5)
\displaystyle\coloneqq\left[\begin{array}[]{ccccc}D&0&\cdots&\cdots&0\\ CB&D&\ddots&&\vdots\\ CAB&CB&D&\ddots&\vdots\\ \vdots&\vdots&\ddots&\ddots&0\\ CA^{T-1}B&CA^{T-2}B&\cdots&CB&D\end{array}\right].

To proceed with differential privacy analysis, we consider the output $y_{w}(t)\coloneqq y(t)+w(t)$ obtained after adding the noise $w(t)\in\mathbb{R}^{q}$; see Fig. 1. From (4), $Y_{w,T}\in\mathbb{R}^{(T+1)q}$ can be described by

\displaystyle Y_{w,T}=N_{T}U_{T}+W_{T}, (12)

which defines a mapping ${\mathcal{M}}\colon\mathbb{R}^{(T+1)m}\times\Omega\ni(U_{T},\omega)\mapsto Y_{w,T}\in\mathbb{R}^{(T+1)q}$.

Figure 1: Mechanism with output noise.

In differential privacy analysis, this mapping is called a mechanism.
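
To make the mechanism concrete, the block-Toeplitz matrix $N_{T}={\cal T}_{T}(A,B,C,D)$ in (5) and the output perturbation (12) can be assembled as follows. This is a small illustrative sketch of ours (the helper names lift and gaussian_mechanism are not from the paper), assuming the dimensions in (3).

    import numpy as np

    def lift(A, B, C, D, T):
        """Block-Toeplitz matrix T_T(A, B, C, D) mapping U_T to Y_T, cf. (5)."""
        q, m = D.shape
        # Markov parameters: D, CB, CAB, CA^2 B, ...
        markov = [D] + [C @ np.linalg.matrix_power(A, k) @ B for k in range(T)]
        N = np.zeros(((T + 1) * q, (T + 1) * m))
        for i in range(T + 1):            # block row
            for j in range(i + 1):        # block column (lower block triangular)
                N[i * q:(i + 1) * q, j * m:(j + 1) * m] = markov[i - j]
        return N

    def gaussian_mechanism(N_T, U_T, Sigma_w, rng=np.random.default_rng()):
        """Release Y_{w,T} = N_T U_T + W_T with W_T ~ N(0, Sigma_w), cf. (12)."""
        W_T = rng.multivariate_normal(np.zeros(Sigma_w.shape[0]), Sigma_w)
        return N_T @ U_T + W_T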

Next, the definition of differential privacy is given. We begin with the definition of data similarity:

Definition 1.

Given a positive definite matrix $K\in\mathbb{R}^{(T+1)m\times(T+1)m}$, a pair of input data $(U_{T},U^{\prime}_{T})\in\mathbb{R}^{(T+1)m}\times\mathbb{R}^{(T+1)m}$ is said to belong to the binary relation of $K$-adjacency if

\displaystyle|U_{T}-U^{\prime}_{T}|_{K}\leq 1. (13)

The set of all pairs of input data that are $K$-adjacent is denoted by $\mathrm{Adj}_{K}$.

This $K$-adjacency is an extension of the $c$-adjacency for the 2-norm in previous work [9], which corresponds to $K=I/c^{2}$. Next, we describe the definition of $(K,\varepsilon,\delta)$-differential privacy for dynamical systems in the same way as for static data [9, Definition 2.4].

Definition 2 ($(K,\varepsilon,\delta)$-differential privacy).

Given $\varepsilon>0$ and $\delta\geq 0$, the mechanism (12) is said to be $(K,\varepsilon,\delta)$-differentially private ($(K,\varepsilon,\delta)$-DP) at a finite time instant $T\in\mathbb{Z}_{+}$, if

\displaystyle{\mathbb{P}}[N_{T}U_{T}+W_{T}\in{\mathcal{S}}]
eε[NTUT+WT𝒮]+δ,𝒮((T+1)q)\displaystyle\leq{\rm e}^{\varepsilon}{\mathbb{P}}[N_{T}U^{\prime}_{T}+W_{T}\in{\mathcal{S}}]+\delta,\quad\forall{\mathcal{S}}\in{\mathcal{B}}\left({\mathbb{R}}^{(T+1)q}\right) (14)

for any $(U_{T},U^{\prime}_{T})\in{\rm Adj}_{K}$.

Suppose that the output sequence $Y_{w,T}$ and the state equation (3) are available to an attacker trying to estimate the value of the input sequence $U_{T}$. Differential privacy requires that the output sequence statistics be sufficiently close at least for adjacent data pairs. A sufficient condition for the mechanism induced by Gaussian noise to be $(\varepsilon,\delta)$-differentially private under $c$-adjacency is derived in [9, Theorem 2.6]. This result can be straightforwardly extended as follows:

Theorem 3.

The Gaussian mechanism (12) induced by $W_{T}\sim{\mathcal{N}}_{(T+1)q}(\mu_{w},\Sigma_{w})$ is $(K,\varepsilon,\delta)$-differentially private at a finite time $T\in\mathbb{Z}_{+}$ with $\varepsilon>0$ and $1/2>\delta>0$, if the covariance matrix $\Sigma_{w}\succ 0$ is chosen such that

\displaystyle\lambda_{\max}^{-1/2}\left({\mathcal{O}}_{K,\Sigma_{w},T}\right)\geq{\mathsf{R}}(\varepsilon,\delta), (15)

where

\displaystyle{\mathcal{O}}_{K,\Sigma_{w},T}\coloneqq K^{-1/2}N_{T}^{\top}\Sigma_{w}^{-1}N_{T}K^{-1/2}. (16)

Larger noise $W_{T}$ (in the sense of $\Sigma_{w}$) and a lower adjacency threshold (in the sense of $K$) make the left-hand side of (15) larger. This implies that differential privacy is ensured for smaller $\varepsilon$ and $\delta$, because $\mathsf{R}$ is a decreasing function. Further insight into $K$ will be given in Corollary 8 below.
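
For a given adjacency weight $K$ and noise covariance $\Sigma_{w}$, condition (15) can be checked numerically. A hedged sketch (ours), reusing R() and lift() from the earlier snippets:

    import numpy as np
    from scipy.linalg import sqrtm

    def satisfies_dp_condition(N_T, K, Sigma_w, eps, delta):
        """Sufficient condition (15) for (K, eps, delta)-DP of the Gaussian mechanism."""
        K_inv_half = np.real(sqrtm(np.linalg.inv(K)))          # K^{-1/2}
        O = K_inv_half @ N_T.T @ np.linalg.inv(Sigma_w) @ N_T @ K_inv_half
        lam_max = np.max(np.linalg.eigvalsh((O + O.T) / 2))    # symmetrize for numerics
        return lam_max ** (-0.5) >= R(eps, delta)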

III Bayesian differential privacy for dynamical systems

III-A Formulation

In Definition 2, the difficulty of distinguishing data pairs whose $K$-weighted distance is larger than the threshold $1$ is not taken into account. Note also that there is no design guideline for $K$. In this section, we introduce Bayesian differential privacy for dynamical systems. To this end, we assume that the prior distribution of the data to be protected is available, and we provide a privacy guarantee that takes into account the discrimination difficulty of data pairs based on this prior.

Assumption 4.

The input data to the mechanism, $U_{T}$, is an $\mathbb{R}^{(T+1)m}$-valued random variable with distribution $\mathbb{P}_{U_{T}}$. In addition, one can use the prior $\mathbb{P}_{U_{T}}$ to design a mechanism.

The following is a typical example where a private input signal is a realization of a random variable.

Example 5.

Suppose that the input data $u(t)$ to be protected is the reference for tracking control; see [14]. In many applications, tracking of the reference signal over specified frequency ranges is required. Such a control objective can be represented by filtering white noise $\xi(t)$. To be more precise, we assume $u(t)=r(t)$ is generated by

\displaystyle x_{r}(t+1)=A_{r}x_{r}(t)+B_{r}\xi(t),\ x_{r}(0)=0, (17)
\displaystyle r(t)=C_{r}x_{r}(t)+D_{r}\xi(t), (18)
\displaystyle\xi(t)\sim{\mathcal{N}}(0,I),\quad t\in\{0,1,\ldots,T\}, (19)

where $A_{r}\in\mathbb{R}^{l\times l},\ B_{r}\in\mathbb{R}^{l\times m},\ C_{r}\in\mathbb{R}^{m\times l},\ D_{r}\in\mathbb{R}^{m\times m}$. The power spectrum of $u(t)$ is characterized by the frequency transfer function

\displaystyle G_{r}({\rm e}^{{\rm j}\lambda}):=C_{r}({\rm e}^{{\rm j}\lambda}I-A_{r})^{-1}B_{r}+D_{r},\ \lambda\in[0,\pi). (20)

In this case,

\displaystyle r(t)=\begin{cases}D_{r}\xi(0)&(t=0),\\ D_{r}\xi(t)+\sum_{j=1}^{t}C_{r}A_{r}^{j-1}B_{r}\xi(t-j)&(t\geq 1),\end{cases} (21)

and the prior distribution is given by $U_{T}\sim{\cal N}(0,\Sigma_{U})$ with

\displaystyle\Sigma_{U}:=\Xi_{T}\Xi_{T}^{\top}, (22)
\displaystyle\Xi_{T}:={\cal T}_{T}(A_{r},B_{r},C_{r},D_{r}). (23)

Note that a step reference signal whose constant value obeys $r(t)\equiv\bar{r}\sim\mathcal{N}(0,\Sigma_{\rm s})$ can be modeled by setting $A_{r}=C_{r}=I,\ B_{r}=D_{r}=0$ with $x_{r}(0)\sim\mathcal{N}(0,\Sigma_{\rm s})$ in (17), (18). This corresponds to the case where the private information is the initial state rather than the input sequence $U_{T}$, as discussed in [9].
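
Under Assumption 4, the Gaussian prior of Example 5 is fully determined by the reference model matrices via (22)-(23). A sketch (ours), reusing lift() from Section II; the first-order reference model below is purely illustrative, not taken from the paper.

    import numpy as np

    def reference_prior_covariance(Ar, Br, Cr, Dr, T):
        """Prior covariance Sigma_U = Xi_T Xi_T^T with Xi_T = T_T(Ar, Br, Cr, Dr), cf. (22)-(23)."""
        Xi_T = lift(Ar, Br, Cr, Dr, T)
        return Xi_T @ Xi_T.T

    # Illustrative scalar lowpass reference model (values are ours, not the paper's).
    Ar, Br = np.array([[0.97]]), np.array([[1.0]])
    Cr, Dr = np.array([[0.03]]), np.array([[0.0]])
    Sigma_U = reference_prior_covariance(Ar, Br, Cr, Dr, T=20)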

Then, based on Bayesian differential privacy for static data [13], we define $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differential privacy, which is an extension of differential privacy for dynamical systems.

Definition 6 ($({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differential privacy).

Assume that the random variables $U_{T},U^{\prime}_{T}$ are independent and both follow the distribution $\mathbb{P}_{U_{T}}$. Given $1\geq\gamma\geq 0$, $\varepsilon>0$, and $\delta\geq 0$, the mechanism (12) is said to be $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differentially private ($({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-BDP) at a finite time instant $T\in\mathbb{Z}_{+}$, if

\displaystyle{\mathbb{P}}\Bigl{[}{\mathbb{P}}[N_{T}U_{T}+W_{T}\in{\mathcal{S}}\mid U_{T}]
eε[NTUT+WT𝒮UT]+δ]γ,𝒮((T+1)q).\displaystyle\leq\mathrm{e}^{\varepsilon}{\mathbb{P}}[N_{T}U^{\prime}_{T}+W_{T}\in{\mathcal{S}}\mid U^{\prime}_{T}]+\delta\Bigr{]}\geq\gamma,\quad\forall{\mathcal{S}}\in{\mathcal{B}}({\mathbb{R}}^{(T+1)q}). (24)

In (24), the outer (resp. inner) $\mathbb{P}$ is taken with respect to $(U_{T},U^{\prime}_{T})$ (resp. $W_{T}$). Roughly speaking, BDP requires that the DP-type inequality hold with probability at least $\gamma$ over input data pairs drawn from the prior. Note that this definition places no direct restriction on the distance between a pair of input data $U_{T},U^{\prime}_{T}$.

III-B Sufficient condition for noise scale

It is desirable that the added noise $w$ be small in order to retain the data usefulness; see, e.g., Section V. The following theorem gives a sufficient condition on the noise scale to guarantee $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differential privacy.

Theorem 7.

Suppose that the prior distribution $\mathbb{P}_{U_{T}}$ of $U_{T}$ is ${\mathcal{N}}_{(T+1)m}(0,\Sigma)$. The Gaussian mechanism (12) induced by $W_{T}\sim{\mathcal{N}}_{(T+1)q}(\mu_{w},\Sigma_{w})$ is $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differentially private at a finite time $T\in\mathbb{Z}_{+}$ with $1\geq\gamma\geq 0$, $\varepsilon>0$, and $1/2>\delta>0$, if the covariance matrix $\Sigma_{w}\succ 0$ is chosen such that

\displaystyle\lambda_{\max}^{-1/2}({\mathcal{O}}_{\Sigma,\Sigma_{w},T})\geq c(\gamma,T){\mathsf{R}}(\varepsilon,\delta) (25)

where ${\mathcal{O}}_{\Sigma,\Sigma_{w},T}$ is defined by

\displaystyle{\mathcal{O}}_{\Sigma,\Sigma_{w},T}\coloneqq\Sigma^{1/2}N_{T}^{\top}\Sigma_{w}^{-1}N_{T}\Sigma^{1/2} (26)

and $c(\gamma,T)$ is the unique $c>0$ that satisfies

\displaystyle\gamma=\int_{0}^{c^{2}/2}\dfrac{1}{2^{(T+1)m/2}\Gamma((T+1)m/2)}x^{(T+1)m/2-1}\mathrm{e}^{-x/2}dx. (27)
Proof.

Using a similar argument as in the proof of [9, Theorem 2.6], for any fixed $U,U^{\prime}\in\mathbb{R}^{(T+1)m}$, one has

\displaystyle{\mathbb{P}}[N_{T}U_{T}+W_{T}\in{\mathcal{S}}\mid U_{T}=U]
\displaystyle\leq{\rm e}^{\varepsilon}{\mathbb{P}}[N_{T}U^{\prime}_{T}+W_{T}\in{\mathcal{S}}\mid U^{\prime}_{T}=U^{\prime}]
\displaystyle\quad+{\mathbb{P}}\left[Z\geq\varepsilon h-\frac{1}{2h}\mid U_{T}=U,\ U^{\prime}_{T}=U^{\prime}\right]

where

\displaystyle h\coloneqq|Y-Y^{\prime}|_{\Sigma^{-1}_{w}}^{-1},\ Y\coloneqq N_{T}U_{T},\ Y^{\prime}\coloneqq N_{T}U^{\prime}_{T},

and $Z\sim{\mathcal{N}}(0,1)$. Then, the mechanism is $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differentially private if ${\mathsf{Q}}(\varepsilon h-\frac{1}{2h})\leq\delta$ holds with probability at least $\gamma$, i.e.,

\displaystyle{\mathbb{P}}[h\geq{\mathsf{R}}(\varepsilon,\delta)]\geq\gamma. (28)

The inequality (28) holds if (25) is satisfied. This is because

\displaystyle h^{-2}=|N_{T}(U_{T}-U^{\prime}_{T})|^{2}_{\Sigma^{-1}_{w}}
\displaystyle=(U_{T}-U^{\prime}_{T})^{\top}\Sigma^{-1/2}{\mathcal{O}}_{\Sigma,\Sigma_{w},T}\Sigma^{-1/2}(U_{T}-U^{\prime}_{T})
\displaystyle\leq|\Sigma^{-1/2}(U_{T}-U^{\prime}_{T})|^{2}\lambda_{\max}({\mathcal{O}}_{\Sigma,\Sigma_{w},T}),

and then, from the fact that $|\Sigma^{-1/2}(U_{T}-U^{\prime}_{T})|^{2}/2$ follows the $\chi^{2}$ distribution with $(T+1)m$ degrees of freedom and the definition of $c(\gamma,T)$, we have $|\Sigma^{-1/2}(U_{T}-U^{\prime}_{T})|^{2}\leq c(\gamma,T)^{2}$ with probability $\gamma$. ∎
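
By (27), $c(\gamma,T)$ is simply a quantile of the $\chi^{2}_{(T+1)m}$ distribution: $\gamma=F_{\chi^{2}_{(T+1)m}}(c^{2}/2)$, so $c(\gamma,T)=\sqrt{2F^{-1}_{\chi^{2}_{(T+1)m}}(\gamma)}$. The sketch below (ours) computes it and checks the sufficient condition (25) for a candidate $\Sigma_{w}$, reusing R() from the earlier snippet.

    import numpy as np
    from scipy.stats import chi2
    from scipy.linalg import sqrtm

    def c_gamma(gamma, T, m):
        """c(gamma, T) from (27): quantile of chi^2 with (T+1)m degrees of freedom."""
        return np.sqrt(2.0 * chi2.ppf(gamma, df=(T + 1) * m))

    def satisfies_bdp_condition(N_T, Sigma, Sigma_w, gamma, eps, delta, T, m):
        """Sufficient condition (25) for (P_{U_T}, gamma, eps, delta)-BDP (Theorem 7)."""
        Sigma_half = np.real(sqrtm(Sigma))                      # Sigma^{1/2}
        O = Sigma_half @ N_T.T @ np.linalg.inv(Sigma_w) @ N_T @ Sigma_half
        lam_max = np.max(np.linalg.eigvalsh((O + O.T) / 2))
        return lam_max ** (-0.5) >= c_gamma(gamma, T, m) * R(eps, delta)

    # Example: the value used in Section V, c(0.5, 100) with m = 1, is about 14.17, cf. (50).
    print(c_gamma(0.5, 100, 1))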

In order to clarify the connection between conventional and Bayesian DP, it is worthwhile to compare Theorems 3 and 7. Bayesian differential privacy with the Gaussian prior $\mathbb{P}_{U_{T}}={\mathcal{N}}_{(T+1)m}(0,\Sigma)$ corresponds to differential privacy with a suitably chosen adjacency weight $K$:

Corollary 8.

Suppose that the prior distribution $\mathbb{P}_{U_{T}}$ of $U_{T}$ is ${\mathcal{N}}_{(T+1)m}(0,\Sigma)$. Let a finite time $T\in\mathbb{Z}_{+}$ with $1\geq\gamma\geq 0$, $\varepsilon>0$, and $1/2>\delta>0$ be given. Then, the Gaussian mechanism (12) induced by $W_{T}\sim{\mathcal{N}}_{(T+1)q}(\mu_{w},\Sigma_{w})$ is $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differentially private at the time $T$, if the mechanism is $(K,\varepsilon,\delta)$-differentially private at the time $T$ with

\displaystyle K:=\Sigma^{-1}/c(\gamma,T)^{2} (29)

with $c(\gamma,T)$ defined in Theorem 7.

Proof.

Let us define $Y_{w,T}:=N_{T}U_{T}+W_{T},\ Y^{\prime}_{w,T}:=N_{T}U^{\prime}_{T}+W_{T}$. If the mechanism is $(K,\varepsilon,\delta)$-differentially private with $K$ defined in (29),

\displaystyle{\mathbb{P}}\Bigl{[}{\mathbb{P}}[Y_{w,T}\in{\mathcal{S}}\mid U_{T}]\leq\mathrm{e}^{\varepsilon}{\mathbb{P}}[Y^{\prime}_{w,T}\in{\mathcal{S}}\mid U^{\prime}_{T}]+\delta\Bigr{]}
\displaystyle\geq{\mathbb{P}}\Bigl{[}{\mathbb{P}}[Y_{w,T}\in{\mathcal{S}}\mid U_{T}]\leq\mathrm{e}^{\varepsilon}{\mathbb{P}}[Y^{\prime}_{w,T}\in{\mathcal{S}}\mid U^{\prime}_{T}]+\delta\ \big{|}\ |U_{T}-U^{\prime}_{T}|_{\Sigma^{-1}}\leq c(\gamma,T)\Bigr{]}
\displaystyle\quad\times{\mathbb{P}}\bigl{[}|U_{T}-U^{\prime}_{T}|_{\Sigma^{-1}}\leq c(\gamma,T)\bigr{]}.

Note that

\displaystyle\mathbb{P}\Bigl{[}\mathbb{P}[Y_{w,T}\in\mathcal{S}\mid U_{T}]\leq\mathrm{e}^{\varepsilon}\mathbb{P}[Y^{\prime}_{w,T}\in\mathcal{S}\mid U^{\prime}_{T}]+\delta\ \big{|}\ |U_{T}-U^{\prime}_{T}|_{\Sigma^{-1}}\leq c(\gamma,T)\Bigr{]}=1,

since the mechanism satisfies

\displaystyle\mathbb{P}[Y_{w,T}\in\mathcal{S}\mid U_{T}]\leq\mathrm{e}^{\varepsilon}\mathbb{P}[Y^{\prime}_{w,T}\in\mathcal{S}\mid U^{\prime}_{T}]+\delta

whenever $|U_{T}-U^{\prime}_{T}|_{\Sigma^{-1}}\leq c(\gamma,T)$ (i.e., $(U_{T},U^{\prime}_{T})\in\mathrm{Adj}_{K}$). Next, by the definition of $c(\gamma,T)$, we have

\displaystyle\mathbb{P}\bigl{[}|U_{T}-U^{\prime}_{T}|_{\Sigma^{-1}}\leq c(\gamma,T)\bigr{]}=\gamma.

Consequently, we obtain the desired result. ∎

It should be emphasized that such a simple relation is obtained since the prior is Gaussian and the system is linear.

III-C Asymptotic analysis

For conventional DP, it is known that when the system (3) is asymptotically stable, one can design Gaussian noise which makes the induced mechanism differentially private for any time horizon $T$ [9, Corollary 2.9]. This is because, for an asymptotically stable system, the incremental gain from $|U_{T}-U^{\prime}_{T}|$ to $|Y_{T}-Y^{\prime}_{T}|$ is bounded by its $H^{\infty}$-norm for any $T$, while, by the definition of DP, the distance $|U_{T}-U^{\prime}_{T}|$ is bounded by a predetermined threshold. That is, even when the horizon of the data to be protected becomes longer, the distance between data sets whose indistinguishability is guaranteed remains fixed.

On the other hand, for the proposed BDP, as $T$ becomes larger, $|U_{T}-U^{\prime}_{T}|$ tends to take larger values according to the prior $\mathbb{P}_{U_{T}}$. Consequently, to achieve BDP over a large time horizon $T$, large noise is required. To see this from Theorem 7, $c(\gamma,T)>0$ is plotted in Fig. 2 as a function of $T$. As can be seen, $c(\gamma,T)$ grows as $T$ increases, and therefore, from (25), the scale parameter $\Sigma_{w}$ of the noise must be large to guarantee BDP. This fact suggests that the privacy requirement of BDP (with fixed $\varepsilon,\gamma,\delta$) for the long (possibly infinite) horizon case is too strong. This issue could be resolved by an appropriate scaling with $T$ that quantifies the long-time average privacy.

Figure 2: Graph of $c(\gamma,T)$.
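
The growth of $c(\gamma,T)$ shown in Fig. 2 can be reproduced directly from the quantile formula above; a minimal plotting sketch (ours), where the choices $m=1$ and $\gamma=0.5$ are assumptions made only for illustration:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import chi2

    m, gamma = 1, 0.5                                   # illustrative values
    Ts = np.arange(0, 201)
    cs = np.sqrt(2.0 * chi2.ppf(gamma, df=(Ts + 1) * m))

    plt.plot(Ts, cs)
    plt.xlabel("T")
    plt.ylabel("c(gamma, T)")
    plt.title("Growth of c(gamma, T) with the horizon (cf. Fig. 2)")
    plt.show()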

IV Design of mechanism

To motivate the additional analysis in this section, let us consider a feedback interconnection of the plant $\cal P$ and the controller $\cal C$:

\displaystyle{\cal P}:\left\{\begin{array}[]{l}x_{p}(t+1)=A_{p}x_{p}(t)+B_{p}u_{p}(t),\\ y_{p}(t)=C_{p}x_{p}(t),\end{array}\right.
\displaystyle{\cal C}:\left\{\begin{array}[]{l}x_{c}(t+1)=A_{c}x_{c}(t)+B_{c}e(t),\\ u_{p}(t)=C_{c}x_{c}(t),\end{array}\right.

where the control objective is to make the tracking error $e(t):=r(t)-y_{p}(t)$ small, and the private information is the reference signal $r(t)$. An attacker, who can access the output $y_{p}(t)$, attempts to estimate $r(t)$. To prevent this inference, we add noise $v$ to $r$, which leads to the following closed-loop dynamics:

\displaystyle\left\{\begin{array}[]{l}\bar{x}(t+1)=\bar{A}\bar{x}(t)+\bar{B}(r(t)+v(t)),\\ y_{p}(t)=\bar{C}\bar{x}(t),\\ e(t)=r(t)-\bar{C}\bar{x}(t),\end{array}\right. (33)

where $\bar{x}(t):=[x_{p}(t);x_{c}(t)]$ and

\displaystyle\bar{A}:=\begin{bmatrix}A_{p}&B_{p}C_{c}\\ -B_{c}C_{p}&A_{c}\end{bmatrix},\ \bar{B}:=\begin{bmatrix}0\\ B_{c}\end{bmatrix},\ \bar{C}:=\begin{bmatrix}C_{p}&0\end{bmatrix}.

Suppose that the distribution of $V_{T}$ is given by ${\cal N}(0,\Sigma_{v})$. Larger noise $v$ causes larger fluctuations of $e$; indeed, the covariance of $E_{T}:=[e(0);\cdots;e(T)]$ is given by

\displaystyle\Theta_{T}\Sigma_{v}\Theta_{T}^{\top} (34)

with $\Theta_{T}:={\cal T}_{T}(\bar{A},\bar{B},-\bar{C},0)$.

IV-A Minimization of the noise

The expression (34) motivates us to seek the Gaussian noise with the minimum energy among those satisfying the sufficient condition (25) for Bayesian differential privacy derived in Theorem 7. More specifically, we consider the following optimization problem with the covariance matrix $\Sigma_{w}\succ 0$ of the Gaussian noise as the decision variable.

Problem 9.
\displaystyle\underset{\Sigma_{w}\succ 0}{\text{minimize}}\quad{\rm Tr}(\Sigma_{w}) (35)
\displaystyle\text{subject to}\quad\lambda_{\max}(\Sigma^{1/2}N_{T}^{\top}\Sigma_{w}^{-1}N_{T}\Sigma^{1/2})\leq\frac{1}{c(\gamma,T)^{2}{\mathsf{R}}(\varepsilon,\delta)^{2}}. (36)

The constraint (36) is equivalent to the inequality (25). Under a mild assumption, the optimal solution can be obtained in closed form as follows.

Theorem 10.

Assume that $N_{T}$ is full row rank. The optimal solution to Problem 9 is $\Sigma_{w}^{*}\coloneqq c(\gamma,T)^{2}{\mathsf{R}}(\varepsilon,\delta)^{2}N_{T}\Sigma N_{T}^{\top}$.

Proof.

Denote ${\sf N}:=c(\gamma,T){\mathsf{R}}(\varepsilon,\delta)N_{T}\Sigma^{1/2}$ so that $\Sigma_{w}^{*}={\sf N}{\sf N}^{\top}$. Then, (36) is equivalent to $I-{\sf N}^{\top}\Sigma_{w}^{-1}{\sf N}\succeq 0$. By the Schur complement,

[Σw𝖭𝖭I]0.\displaystyle\begin{bmatrix}\Sigma_{w}&{\sf N}\\ {\sf N}^{\top}&I\end{bmatrix}\succeq 0. (37)

This implies $\Sigma_{w}\succeq{\sf N}{\sf N}^{\top}=\Sigma_{w}^{*}$ for any feasible $\Sigma_{w}$, and consequently ${\rm Tr}(\Sigma_{w})\geq{\rm Tr}(\Sigma_{w}^{*})$. Since $N_{T}$ is full row rank, $\Sigma_{w}^{*}\succ 0$ is itself feasible, which completes the proof. ∎

The obtained optimal solution $\Sigma_{w}^{*}$ is a constant multiple of the covariance matrix of the output data $Y_{T}=N_{T}U_{T}$ when the covariance matrix of the input data $U_{T}$ is $\Sigma$. This means that the input data can be concealed efficiently by applying noise having the same statistics (up to scaling) as the observed output data.
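
A sketch (ours) of the minimum-energy output noise of Theorem 10, reusing c_gamma() and R() from the earlier snippets; when $N_{T}$ has full row rank, the returned covariance attains constraint (36) with equality.

    import numpy as np

    def optimal_output_noise(N_T, Sigma, gamma, eps, delta, T, m):
        """Sigma_w^* = c(gamma,T)^2 R(eps,delta)^2 N_T Sigma N_T^T (Theorem 10)."""
        scale = (c_gamma(gamma, T, m) * R(eps, delta)) ** 2
        return scale * (N_T @ Sigma @ N_T.T)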

IV-B Input noise mechanism

Figure 3: Mechanism with input noise.

In this subsection, we study the case where noise is added to the input channel; see Fig. 3. Consider the following system with input noise:

\displaystyle\left\{\begin{array}[]{l}x(t+1)=Ax(t)+B(u(t)+v(t)),\\ y_{v}(t)=Cx(t)+D(u(t)+v(t)).\end{array}\right. (40)

As in the previous section, we assume $x(0)=0$. The output sequence $Y_{v,T}=[y_{v}(0);\cdots;y_{v}(T)]$ can be described as

\displaystyle Y_{v,T}=N_{T}U_{T}+N_{T}V_{T}. (41)

For the system (3), adding noise $V_{T}$ to the input channel is equivalent to adding noise $W_{T}=N_{T}V_{T}$ to the output channel. For simplicity, we assume that $N_{T}$ is square and nonsingular; this can be relaxed as in [9, Corollary 2.16]. From Theorem 7, we obtain the following corollary.

Corollary 11.

Suppose that the prior distribution $\mathbb{P}_{U_{T}}$ of $U_{T}$ is ${\mathcal{N}}_{(T+1)m}(0,\Sigma)$. The Gaussian mechanism (41) induced by $V_{T}\sim{\mathcal{N}}_{(T+1)q}(\mu_{v},\Sigma_{v})$ is $({\mathbb{P}}_{U_{T}},\gamma,\varepsilon,\delta)$-Bayesian differentially private at a finite time $T\in\mathbb{Z}_{+}$ with $1\geq\gamma\geq 0$, $\varepsilon>0$, and $1/2>\delta>0$, if the covariance matrix $\Sigma_{v}\succ 0$ is chosen such that

\displaystyle\lambda_{\min}^{1/2}(\Sigma^{-1/2}\Sigma_{v}\Sigma^{-1/2})\geq c(\gamma,T){\mathsf{R}}(\varepsilon,\delta) (42)

with $c(\gamma,T)>0$ defined in Theorem 7.

Proof.

The desired result is a straightforward consequence of Theorem 7. ∎

In [9, Corollary 2.16], a sufficient condition for $(\varepsilon,\delta)$-differential privacy in the sense of Definition 2 is given, which shows that the differential privacy level of the input noise mechanism does not depend on the system itself. Similarly, (42) does not depend on the system matrices in (3) either. The difference in the Bayesian case is that (42) depends on the covariance $\Sigma$ of the prior distribution of the signals to be protected.

Remark 12.

It is clear from Corollary 11 and Theorem 10 that the minimum-energy Gaussian noise satisfying the sufficient condition (42) for the privacy guarantee of the input noise mechanism is given by

\displaystyle\Sigma^{*}_{v}=c(\gamma,T)^{2}{\mathsf{R}}(\varepsilon,\delta)^{2}\Sigma. (43)

This characterization allows the natural interpretation that large noise is needed to protect large inputs; see also the next section.
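
A corresponding sketch (ours) for the input noise mechanism, reusing c_gamma() and R() from the earlier snippets: the minimum-energy covariance (43) and the release (41).

    import numpy as np

    def optimal_input_noise(Sigma, gamma, eps, delta, T, m):
        """Sigma_v^* = c(gamma,T)^2 R(eps,delta)^2 Sigma, cf. (43)."""
        return (c_gamma(gamma, T, m) * R(eps, delta)) ** 2 * Sigma

    def input_noise_mechanism(N_T, U_T, Sigma_v, rng=np.random.default_rng()):
        """Release Y_{v,T} = N_T (U_T + V_T) with V_T ~ N(0, Sigma_v), cf. (41)."""
        V_T = rng.multivariate_normal(np.zeros(Sigma_v.shape[0]), Sigma_v)
        return N_T @ (U_T + V_T)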

V Numerical example

Consider the feedback system in Fig. 4, where the plant and controller in (33) are given by

\displaystyle A_{p}=\left[\begin{array}[]{cc}1.2&-0.5\\ 1&0\end{array}\right],\ B_{p}=[-0.3,0]^{\top},\ C_{p}=[0.2,0], (46)
\displaystyle A_{c}=\left[\begin{array}[]{cc}1&1\\ 0&0.1\end{array}\right],\ B_{c}=[0,-1]^{\top},\ C_{c}=[1.5,0]. (49)

The integral property of the controller enhances the low-frequency tracking performance. The Bode gain diagram is shown in Fig. 5. Suppose that the reference $r$ is the signal to be protected, and that it is public information that its spectrum is concentrated over the frequency range below $3\times 10^{-2}\ {\rm rad/s}$. To represent this prior information, we take the frequency model for $r$ as in Example 5, set to be a lowpass filter (generated by lowpass(xi, 3e-2) in MATLAB). Recall that $U_{T}\sim{\mathcal{N}}(0,\Sigma_{U})$ with (22) and (23).
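
A sketch (ours) of the closed-loop setup used in this section, reusing lift() and reference_prior_covariance() from the earlier snippets. The paper generates the reference model with MATLAB's lowpass(xi, 3e-2); below, a discrete Butterworth lowpass filter with a comparable normalized cutoff stands in for it, so the resulting $\Sigma_{U}$ only approximates the one behind Figs. 5 and 6.

    import numpy as np
    from scipy.signal import butter, tf2ss

    # Plant and controller matrices from (46) and (49).
    Ap = np.array([[1.2, -0.5], [1.0, 0.0]]); Bp = np.array([[-0.3], [0.0]]); Cp = np.array([[0.2, 0.0]])
    Ac = np.array([[1.0, 1.0], [0.0, 0.1]]);  Bc = np.array([[0.0], [-1.0]]); Cc = np.array([[1.5, 0.0]])

    # Closed-loop matrices of (33).
    Abar = np.block([[Ap, Bp @ Cc], [-Bc @ Cp, Ac]])
    Bbar = np.vstack([np.zeros((2, 1)), Bc])
    Cbar = np.hstack([Cp, np.zeros((1, 2))])

    # Approximate reference model (assumption: 2nd-order Butterworth, normalized cutoff 3e-2).
    Ar, Br, Cr, Dr = tf2ss(*butter(2, 3e-2))

    T = 100
    Sigma_U = reference_prior_covariance(Ar, Br, Cr, Dr, T)   # prior covariance, cf. (22)-(23)
    Theta_T = lift(Abar, Bbar, -Cbar, np.zeros((1, 1)), T)    # error map Theta_T, cf. (34)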

Figure 4: Feedback system with input noise mechanism
Figure 5: Bode gain diagram for the frequency-domain reference model $G_{r}$ and the transfer functions of the feedback system (33) with (46), (49).
Figure 6: Reference signal $r(t)$ and plant output $y_{p}(t)$ for the three mechanisms.

We design input noise to make the system Bayesian differentially private for $\gamma=0.5,\ T=100,\ \varepsilon=100,\ \delta=0.1$. This leads to

c(\gamma,T)=14.1657,\ {\mathsf{R}}(\varepsilon,\delta)=0.0774. (50)

In what follows we compare the following three cases:

  • noise-free,

  • i.i.d. noise: $V_{T}\sim{\mathcal{N}}_{(T+1)m}(0,\sigma_{\rm iid}^{2}I)$ with

    \displaystyle{\rm Tr}(\sigma_{\rm iid}^{2}I)=c(\gamma,T)^{2}{\mathsf{R}}(\varepsilon,\delta)^{2}\lambda_{\max}(\Sigma_{U})(T+1)m=121.604, (51)

    where $\sigma_{\rm iid}$ is the minimum $\sigma$ satisfying (42), and

  • the minimum noise obtained in Theorem 10: $V_{T}\sim{\mathcal{N}}_{(T+1)m}(0,\Sigma_{v}^{*})$, $\Sigma_{v}^{*}:=c(\gamma,T)^{2}{\mathsf{R}}(\varepsilon,\delta)^{2}\Sigma_{U}$ with

    \displaystyle{\rm Tr}(\Sigma_{v}^{*})=8.2574. (52)

Fig. 6 shows the reference signal $r(t)$ and the plant output $y_{p}(t)$ for these three cases. The output error for the noise-free case is the smallest, because the (realized) trajectory of $r$ is fully utilized, at the cost of the possibility that information about $r$ may leak from $y_{p}$. The other two cases guarantee the same level of Bayesian differential privacy. Note that the error fluctuation is suppressed in the minimum noise case. Statistically, the output fluctuation caused by the added noise $v$ can be evaluated by (34). The value ${\rm Tr}(\Theta_{T}\Sigma_{v}\Theta_{T}^{\top})/(c(\gamma,T){\mathsf{R}}(\varepsilon,\delta))^{2}$ is given by ${\rm Tr}(\Theta_{T}\Sigma_{U}\Theta_{T}^{\top})=8.1998$ for the minimum noise case, which is smaller than $\lambda_{\max}(\Sigma_{U}){\rm Tr}(\Theta_{T}\Theta_{T}^{\top})=55.4202$ for the i.i.d. case.

The interpretation is as follows. The i.i.d. noise has a flat frequency distribution, which implies that it adds more out-of-band noise than the minimum noise. However, this out-of-band component does not contribute to the protection of $r$, since it is easily distinguished from $r$ thanks to the prior information in Fig. 5. Nevertheless, this out-of-band noise largely degrades the tracking performance.

Remark 13.

Out-of-band noise, as in the i.i.d. case, is effective when the prior distribution of the signals to be protected is not public information. That is, such noise can prevent the attacker from inferring the prior distribution, e.g., via empirical Bayes methods.

Remark 14.

Lastly, we note that for the Gaussian prior $\mathcal{N}(0,\Sigma_{U})$ in the numerical example, a reference signal $U^{\prime}_{T}$ that deviates significantly from the mean $0$ in the sense of $|0-U^{\prime}_{T}|_{\Sigma_{U}^{-1}}$ can be seen as an outlier. Therefore, out-of-band signals having large values of $|U^{\prime}_{T}|_{\Sigma_{U}^{-1}}$ can be regarded as outliers. The Bayesian differentially private mechanism provides privacy guarantees not only for in-band signals but also for out-of-band ones. In particular, the parameter $\gamma$ of Bayesian differential privacy determines the extent to which privacy guarantees are provided for out-of-band signals.

VI Conclusion

In this study, we introduced Bayesian differential privacy for linear dynamical systems, which uses prior distributions of the input data to provide privacy guarantees even for input data pairs with large differences, and we gave sufficient conditions to achieve it. Furthermore, we derived the minimum-energy Gaussian noise that satisfies the condition. As noted in Subsection III-C, no finite noise can guarantee Bayesian differential privacy in the infinite horizon case. This issue will be addressed in future work.

Acknowledgment

This work was supported in part by JSPS KAKENHI under Grant Number JP21H04875.

References

  • [1] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. Unc. Fuzz. Knowl. Based Syst., vol. 10, no. 5, pp. 557–570, Oct. 2002.
  • [2] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, “l-diversity: Privacy beyond k-anonymity,” ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, pp. 3–es, 2007.
  • [3] N. Li, T. Li, and S. Venkatasubramanian, “t-closeness: Privacy beyond k-anonymity and l-diversity,” in Proc. 23rd IEEE Int. Conf. Data Eng., Apr. 2007, pp. 106–115.
  • [4] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Proc. 3rd Theory Cryptography Conf., 2006, pp. 265–284.
  • [5] H. Sandberg, G. Dán, and R. Thobaben, “Differentially private state estimation in distribution networks with smart meters,” in Proc. 54th IEEE Conf. Decis. Control, Dec. 2015, pp. 4492–4498.
  • [6] F. K. Dankar and K. El Emam, “The application of differential privacy to health data,” in Proc. Int. Conf. Extending Database Technol., Mar. 2012, pp. 158–166.
  • [7] M. Yang, A. Margheri, R. Hu, and V. Sassone, “Differentially private data sharing in a cloud federation with blockchain,” IEEE Cloud Comput., vol. 5, no. 6, pp. 69–79, Nov. 2018.
  • [8] J. Le Ny and G. J. Pappas, “Differentially private filtering,” IEEE Trans. Automat. Control, vol. 59, no. 2, pp. 341–354, Feb. 2014.
  • [9] Y. Kawano and M. Cao, “Design of privacy-preserving dynamic controllers,” IEEE Trans. Automat. Control, vol. 65, no. 9, pp. 3863–3878, Sep. 2020.
  • [10] J. Cortés, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas, “Differential privacy in control and network systems,” in Proc. 55th IEEE Conf. Decis. Control, Dec. 2016, pp. 4252–4272.
  • [11] V. Katewa, F. Pasqualetti, and V. Gupta, “On privacy vs cooperation in multi-agent systems,” Int. J. of Control, vol. 91, no. 7, pp. 1693–1707, 2018.
  • [12] K. Ito, Y. Kawano, and K. Kashima, “Privacy protection with heavy-tailed noise for linear dynamical systems,” Automatica, vol. 131, p. 109732, 2021.
  • [13] A. Triastcyn and B. Faltings, “Bayesian differential privacy for machine learning,” in Proc. Int. Conf. Mach. Learn., Nov. 2020.
  • [14] Y. Kawano, K. Kashima, and M. Cao, “Modular control under privacy protection: Fundamental trade-offs,” Automatica, vol. 127, p. 109518, 2021.