
Enhancing Utility in the Watchdog Privacy Mechanism

Abstract

This paper is concerned with enhancing data utility in the privacy watchdog method for attaining information-theoretic privacy. For a given privacy constraint, the watchdog method filters out the high-risk data symbols by applying a uniform data regulation scheme, e.g., merging all high-risk symbols together. While this method entirely trades off symbol resolution for privacy, we show that the data utility can be greatly improved by partitioning the high-risk symbol set and privatizing each subset individually. We further propose an agglomerative merging algorithm that finds a suitable partition of high-risk symbols: it starts with a singleton high-risk symbol, which is iteratively fused with others until the resulting subsets are private. Numerical simulations demonstrate the efficacy of this algorithm in privately achieving higher utilities in the watchdog scheme.

Index Terms—  Information-theoretic privacy; Watchdog privacy mechanism; Privacy-utility trade-off.

1 Introduction

Industries and governments are increasingly sharing data to unlock economic and societal benefits through advances in data analytics and machine learning. However, such data also contains sensitive information about individuals or businesses, which makes privacy regulators, users, and data providers concerned about the leakage of confidential information, either explicitly or implicitly. In signal processing and information theory, data privacy is underpinned by a measure called the information lift [1, 2].

To evaluate how informative the public data $X$ is about the confidential data $S$, the lift measures the change in the posterior belief $p(s|x)$ from the prior belief $p(s)$ for each instance of $s$ and $x$ by

$l(s,x)=\frac{p(s|x)}{p(s)}.$ (1)

It is clear that a small lift indicates limited private information gain by the adversary, and hence a more private $X$. The lift is the elementary measure underlying almost all information leakage measures, e.g., mutual information [1, 3], Sibson mutual information [4, 5, 6], $\alpha$-leakage [7] and local differential privacy [8, 9]: as proved in [2], if the lift is bounded, all these leakage measures are also bounded.
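As a concrete illustration of (1), the following minimal Python sketch computes the lift and log-lift from a joint distribution; the joint pmf used here is a hypothetical toy example, not taken from the paper.

```python
import numpy as np

# Hypothetical toy joint distribution p(s, x): rows index s, columns index x.
p_sx = np.array([[0.10, 0.05, 0.15],
                 [0.05, 0.30, 0.05],
                 [0.10, 0.05, 0.15]])

p_s = p_sx.sum(axis=1, keepdims=True)   # marginal p(s)
p_x = p_sx.sum(axis=0, keepdims=True)   # marginal p(x)

# Lift l(s, x) = p(s|x) / p(s) = p(s, x) / (p(s) p(x)), eq. (1).
lift = p_sx / (p_s * p_x)
log_lift = np.log(lift)                 # i(s, x), used by the watchdog
```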

                     utility   privacy leakage
complete merging     0.5913    0.8037
two-subset merging   0.7335    0.8488

Table 1: An example of how subset merging can enhance utility in the watchdog mechanism ($\varepsilon=1$).

The existing approach to attaining lift-based privacy is the watchdog method proposed in [2]. For a given privacy constraint, i.e., a threshold $\varepsilon$ on the lift, the watchdog method filters out and privatizes the high-risk symbols of $X$. The authors in [2] adopted a uniform approach: merging all high-risk symbols into a ‘super’ symbol, which is proved in [10, 6] to be the optimal privatization scheme for attaining data privacy. However, this uniform approach neglects an important issue in data privacy: to achieve the benefits of data sharing, the privatized data should provide a satisfactory level of usefulness in $X$. (The data utility is usually quantified by average performance measures such as mutual information [3], $f$-divergence [11] and Hamming distortion [12]; we use mutual information in this paper.) Despite the relaxation attempts in [10, 6], the complete merging method minimizes the resolution of high-risk symbols and significantly deteriorates data utility, which is at odds with the purpose of data sharing.

In fact, even a small alteration can greatly enhance the data utility. In Table 1, we arbitrarily cut the high-risk symbol set (of size 7) into two subsets, each of which is then privatized individually. The utility (measured by mutual information) is increased significantly without sacrificing too much data privacy, which remains below the design constraint $\varepsilon$. This not only shows that the complete merging approach is an ‘overkill’ in terms of data utility, but also suggests a partitioned privatization approach.

In this paper, we propose the subset merging method for the watchdog mechanism to enhance data utility. Finding the best partition of the high-risk symbol set to achieve optimal utility is in general a combinatorial problem. Accordingly, we propose a heuristic greedy algorithm that finds good subsets to merge while guaranteeing data privacy. This greedy algorithm searches for the finest partition of the original high-risk symbol set that ensures the resulting lift of the whole dataset does not exceed $\varepsilon$. It starts with the singleton partition of high-risk symbols (highest resolution) and iteratively merges symbols until the lift values of the resulting subsets are all below $\varepsilon$. Numerical simulations show that our proposed algorithm enhances utility significantly while maintaining the privacy leakage constraint.

2 System Model

Let the random variables $S$ and $X$ denote the sensitive and public data, respectively. The joint distribution $p(s,x)$ describes the statistical correlation between $S$ and $X$. To protect the privacy of $S$, we sanitize $X$ into $Y$ via the transition probability $p(y|x)$. Here, for each $x,y$, $p(y|x)=p(y|x,s)$ for all $s$, and therefore the Markov chain $S\rightarrow X\rightarrow Y$ is formed.

The watchdog method is based on the logarithm of the lift measure

$i(s,x)=\log l(s,x).$

For each $x\in\mathcal{X}$, denote the maximum symbol-wise information leakage by $\omega(x)$, defined as

$\omega(x)=\max_{s\in\mathcal{S}}|i(s,x)|.$ (2)

Applying an upper bound $\varepsilon$ to $\omega(x)$ for all symbols $x\in\mathcal{X}$ divides the whole alphabet $\mathcal{X}$ into two subsets: the low-risk subset $\mathcal{X}_{\varepsilon}\triangleq\{x\in\mathcal{X}:\omega(x)\leq\varepsilon\}$, which is safe to publish, and the high-risk symbol set

$\mathcal{X}_{\varepsilon}^{c}=\mathcal{X}\setminus\mathcal{X}_{\varepsilon}\triangleq\{x\in\mathcal{X}:\omega(x)>\varepsilon\},$

which requires treatment before the data is published.
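To make this split concrete, here is a minimal sketch that computes $\omega(x)$ and partitions the alphabet; the joint pmf and the threshold are hypothetical toy values.

```python
import numpy as np

# Hypothetical joint distribution p(s, x) and privacy threshold.
p_sx = np.array([[0.10, 0.05, 0.15],
                 [0.05, 0.30, 0.05],
                 [0.10, 0.05, 0.15]])
eps = 0.5

p_s = p_sx.sum(axis=1, keepdims=True)
p_x = p_sx.sum(axis=0, keepdims=True)
log_lift = np.log(p_sx / (p_s * p_x))    # i(s, x)

omega = np.abs(log_lift).max(axis=0)     # omega(x), eq. (2)
low_risk = np.where(omega <= eps)[0]     # X_eps: safe to publish
high_risk = np.where(omega > eps)[0]     # X_eps^c: needs treatment
```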

The authors in [10, 6] adopt a uniform randomization scheme

$p(y|x)=\begin{cases}1_{\{x=y\}} & x,y\in\mathcal{X}_{\varepsilon},\\ R(y) & x,y\in\mathcal{X}_{\varepsilon}^{c},\\ 0 & \text{otherwise},\end{cases}$ (3)

where $R(y)$ is the complete merging solution: there is a single super symbol $y^{*}\in\mathcal{X}_{\varepsilon}^{c}$ with $R(y^{*})=1$ and $R(y)=0$ otherwise, so that every $x\in\mathcal{X}_{\varepsilon}^{c}$ is mapped to $y^{*}$.
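A minimal sketch of the complete-merging channel in (3), assuming symbols are indexed $0,\dots,n-1$; the alphabet size and high-risk indices are hypothetical:

```python
import numpy as np

n = 5                                # |X|, hypothetical
high_risk = np.array([0, 1, 3])      # indices of X_eps^c, hypothetical
y_star = high_risk[0]                # the 'super' symbol

# Channel matrix P[x, y] = p(y|x), eq. (3): identity on low-risk
# symbols; every high-risk x maps deterministically to y_star.
P = np.eye(n)
P[high_risk, :] = 0.0
P[high_risk, y_star] = 1.0           # R(y_star) = 1, R(y) = 0 otherwise
```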

1 Input: $\mathcal{X},\varepsilon,p(s,x)$
2 Output: $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}_{1},\mathcal{X}_{2},\cdots,\mathcal{X}_{p}\}$
3 Initialize: obtain $\{\mathcal{X}_{\varepsilon},\mathcal{X}_{\varepsilon}^{c}\}$, $\mathcal{X}_{Q}\leftarrow\mathcal{X}_{\varepsilon}^{c}$, and $p=1$; while $|\mathcal{X}_{Q}|>0$ do
4   $\mathcal{X}_{p}=\{\operatorname{arg\,max}_{x\in\mathcal{X}_{Q}}\omega(x)\}$, and $\mathcal{X}_{Q}\leftarrow\mathcal{X}_{Q}\setminus\mathcal{X}_{p}$;
5   while $\omega(\mathcal{X}_{p})>\varepsilon$ and $|\mathcal{X}_{Q}|>0$ do
6     $x^{*}=\operatorname{arg\,min}_{x\in\mathcal{X}_{Q}}\omega(\mathcal{X}_{p}\cup\{x\})$;
7     $\mathcal{X}_{p}\leftarrow\mathcal{X}_{p}\cup\{x^{*}\}$ and $\mathcal{X}_{Q}\leftarrow\mathcal{X}_{Q}\setminus\{x^{*}\}$;
8   end while
9   $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}_{1},\mathcal{X}_{2},\cdots,\mathcal{X}_{p}\}$, and $p\leftarrow p+1$
10 end while
11
12 while $\omega(\mathcal{X}_{p})>\varepsilon$ and $|\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}|>1$ do
13   $\mathcal{X}_{k}=\operatorname{arg\,min}_{1\leq i<p}\omega(\mathcal{X}_{p}\cup\mathcal{X}_{i})$;
14   $\mathcal{X}_{p}\leftarrow\mathcal{X}_{p}\cup\mathcal{X}_{k}$; for $k+1\leq i\leq p$, relabel $\mathcal{X}_{i}$ as $\mathcal{X}_{i-1}$;
15   $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}_{1},\mathcal{X}_{2},\cdots,\mathcal{X}_{p-1}\}$, and $p\leftarrow p-1$
16 end while

Algorithm 1: Construct a refinement of $\mathcal{X}_{\varepsilon}^{c}$

After randomization, the log-lift is given by $i(s,y)=\log\frac{p(y|s)}{p(y)}$, where $p(y|s)=\sum_{x\in\mathcal{X}}p(y|x)p(x|s)$ due to the Markov property and $p(y)=\sum_{x\in\mathcal{X}}p(y|x)p(x)$. We can extend the notions of the log-lift and $\omega(x)$ from a single $x\in\mathcal{X}$ to a subset $\mathcal{X}_{Q}\subseteq\mathcal{X}$ [10]:

$i(s,\mathcal{X}_{Q})=\log\frac{p(\mathcal{X}_{Q}|s)}{p(\mathcal{X}_{Q})},\quad\omega(\mathcal{X}_{Q})=\max_{s\in\mathcal{S}}|i(s,\mathcal{X}_{Q})|,$ (4)

where $p(\mathcal{X}_{Q}|s)=\sum_{x\in\mathcal{X}_{Q}}p(x|s)$ and $p(\mathcal{X}_{Q})=\sum_{x\in\mathcal{X}_{Q}}p(x)$.
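A minimal helper implementing (4), assuming a strictly positive joint pmf so the logarithms are well defined:

```python
import numpy as np

def omega_subset(p_sx, subset):
    """omega(X_Q) of eq. (4); `subset` is a list of x-indices.
    Assumes a strictly positive joint pmf p_sx[s, x]."""
    p_s = p_sx.sum(axis=1)                       # p(s)
    p_xq_s = p_sx[:, subset].sum(axis=1) / p_s   # p(X_Q | s)
    p_xq = p_sx[:, subset].sum()                 # p(X_Q)
    return np.abs(np.log(p_xq_s / p_xq)).max()   # max_s |i(s, X_Q)|
```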

Applying $p(y|x)$, the upper bound on the privacy leakage after randomization, $\max_{y\in\mathcal{Y}}\omega(y)$, is given by [10]

$\max_{y\in\mathcal{Y}}\omega(y)=\max\{\max_{y\in\mathcal{X}_{\varepsilon}}\omega(y),\,\omega(\mathcal{X}_{\varepsilon}^{c})\}.$ (5)

It is obvious that $\max_{y\in\mathcal{X}_{\varepsilon}}\omega(y)\leq\varepsilon$, so the privacy constraint is attained on the low-risk subset. However, the value of $\omega(\mathcal{X}_{\varepsilon}^{c})$ is variable and depends on the joint probability distribution.
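The leakage in (5) can be checked numerically by pushing the joint pmf through the channel; a sketch under the same positivity assumption, with `P[x, y]` denoting $p(y|x)$:

```python
import numpy as np

def max_leakage(p_sx, P):
    """max_y omega(y) after applying the channel P[x, y] = p(y|x)."""
    p_s = p_sx.sum(axis=1, keepdims=True)
    p_y_s = (p_sx / p_s) @ P            # p(y|s), via the Markov chain
    p_y = p_sx.sum(axis=0) @ P          # p(y)
    keep = p_y > 0                      # skip merged-away output symbols
    log_lift = np.log(p_y_s[:, keep] / p_y[keep])
    return np.abs(log_lift).max()
```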

Example 1

Let $\mathcal{X}=\{x_{1},x_{2},\cdots,x_{5}\}$, $\mathcal{S}=\{s_{1},s_{2},s_{3}\}$, and $\varepsilon=0.8$. We randomly generate a joint distribution $p(s,x)$; the resulting maximum symbol-wise leakage and utility are $\omega(x)=[1.3515, 1.6458, 0.9295, 0.8161, 0.2608]$ and $H(X)=1.6034$, respectively. For the given $\varepsilon$, the low-risk and high-risk subsets are $\mathcal{X}_{\varepsilon}=\{x_{5}\}$ and $\mathcal{X}_{\varepsilon}^{c}=\{x_{1},x_{2},x_{3},x_{4}\}$, respectively. After randomization, assume the high-risk symbols are mapped to $y^{*}$, where $y^{*}=y_{1}=x_{1}$ and $y_{2}=x_{5}$; then the maximum symbol-wise leakage and utility are $\omega(y)=[0.0627, 0.2608]$ and $I(X;Y)=0.5269$, respectively.

In Example 1, the leakage of the high-risk subset after randomization is $\omega(\mathcal{X}_{\varepsilon}^{c})=0.0627$, which is an order of magnitude smaller than the original privacy constraint $\varepsilon=0.8$, based on which $\mathcal{X}_{\varepsilon}^{c}$ was obtained. Although this small leakage guarantees a very high level of privacy, it damages utility drastically: the utility decreases from $H(X)=1.6034$ to $I(X;Y)=0.5269$. On the other hand, when a threshold $\varepsilon$ is given as the privacy constraint, it is acceptable to just keep the privacy leakage below $\varepsilon$, even if it becomes very close to this threshold. Thus, in the next section, we propose a subset merging method to enhance utility, where the privacy leakage is allowed to increase towards $\varepsilon$ but never exceeds it.

3 Enhancing Utility

Fig. 1: Privacy leakage for different values of $\varepsilon$: the mean values of $\max_{y\in\mathcal{X}_{\varepsilon}^{c}}\omega(y)$ are shown with standard deviations. Algorithm 1 increases the privacy leakage in $\mathcal{X}_{\varepsilon}^{c}$ in comparison with complete merging; however, in all cases it remains below the constraint $\varepsilon$.

In this section, we introduce an approach to enhance utility while maintaining a given privacy constraint. We measure utility by mutual information, which for the bi-partition $\{\mathcal{X}_{\varepsilon},\mathcal{X}_{\varepsilon}^{c}\}$ and the complete merging randomization of $\mathcal{X}_{\varepsilon}^{c}$ is given by [10]

$I(X;Y)=H(X)+\sum_{x\in\mathcal{X}_{\varepsilon}^{c}}p(x)\log\frac{p(x)}{p(\mathcal{X}_{\varepsilon}^{c})}.$ (6)

Clearly, the utility depends on $p(x)$ for $x\in\mathcal{X}_{\varepsilon}^{c}$ and on the overall $p(\mathcal{X}_{\varepsilon}^{c})$. Our proposed approach to enhancing data utility is to increase the data resolution: we randomize subsets of $\mathcal{X}_{\varepsilon}^{c}$ separately rather than completely merging the whole set $\mathcal{X}_{\varepsilon}^{c}$.

Let $[p]=\{1,2,\cdots,p\}$ and consider a bi-partition $\{\mathcal{X}_{\varepsilon},\mathcal{X}_{\varepsilon}^{c}\}$, a further partitioning of $\mathcal{X}_{\varepsilon}^{c}$ denoted by $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}_{1},\cdots,\mathcal{X}_{p}\}$, and complete merging randomizations $R_{i}(y)$, $i\in[p]$, where $\sum_{y\in\mathcal{X}_{i}}R_{i}(y)=1$. In other words, we partition $\mathcal{X}_{\varepsilon}^{c}$ into subsets $\mathcal{X}_{i}$, so that $\mathcal{X}_{\varepsilon}^{c}=\cup_{i=1}^{p}\mathcal{X}_{i}$, and each subset $\mathcal{X}_{i}$ is randomized by the corresponding randomization $R_{i}(y)$. Note that for $y\in\mathcal{X}_{i}$ we have $p(y)=\sum_{x\in\mathcal{X}_{i}}p(y|x)p(x)=p(\mathcal{X}_{i})R_{i}(y)$. The resulting mutual information $I(X;Y)$ is

$I(X;Y)=H(X)+\sum_{i=1}^{p}\sum_{x\in\mathcal{X}_{i}}p(x)\log\frac{p(x)}{p(\mathcal{X}_{i})}.$ (7)
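For concreteness, a per-subset merging channel can be built as below; the alphabet size and partition are hypothetical, and each $R_i$ is taken to be deterministic on a representative symbol (a valid special case of $R_i$):

```python
import numpy as np

n = 6                                  # |X|, hypothetical
partition = [[0, 1], [2, 3]]           # hypothetical partition of X_eps^c

# P[x, y] = p(y|x): identity on the low-risk symbols {4, 5}; within
# each high-risk subset X_i, all symbols map to one representative
# symbol, i.e., R_i puts probability 1 on that symbol.
P = np.eye(n)
for subset in partition:
    rep = subset[0]                    # representative 'super' symbol
    P[subset, :] = 0.0
    P[subset, rep] = 1.0
```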

Then the normalized mutual information loss (NMIL) for the partition $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$ of $\mathcal{X}_{\varepsilon}^{c}$ is defined as

$\text{NMIL}(\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}})=\frac{H(X)-I(X;Y)}{H(X)}.$ (8)
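A small helper evaluating (7) and (8) from the marginal $p(x)$ and a partition of the high-risk indices (a sketch, assuming a strictly positive marginal):

```python
import numpy as np

def utility_and_nmil(p_x, partition):
    """Return I(X;Y) of eq. (7) and NMIL of eq. (8).
    `partition` is a list of index lists covering X_eps^c."""
    H_X = -np.sum(p_x * np.log(p_x))            # H(X)
    loss = 0.0                                  # H(X) - I(X;Y)
    for subset in partition:
        p_sub = p_x[subset].sum()               # p(X_i)
        loss += np.sum(p_x[subset] * np.log(p_sub / p_x[subset]))
    return H_X - loss, loss / H_X               # I(X;Y), NMIL
```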

Since $p(\mathcal{X}_{i})\leq p(\mathcal{X}_{\varepsilon}^{c})$ for $i\in[p]$, the data resolution increases for each $x\in\mathcal{X}_{i}$, which can result in a larger mutual information and hence a lower utility loss. The following definition and derivation formalize this observation.

Definition 1

Assume two partitions $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}_{1},\cdots,\mathcal{X}_{p}\}$ and $\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}}=\{\mathcal{X}'_{1},\cdots,\mathcal{X}'_{p'}\}$. We say $\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}}$ is a refinement of $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$, and $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$ is an aggregation of $\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}}$ [13], if for every $i\in[p]$, $\mathcal{X}_{i}=\cup_{j\in J_{i}}\mathcal{X}'_{j}$ where $J_{i}\subseteq[p']$; consequently, $p(\mathcal{X}_{i})=\sum_{j\in J_{i}}p(\mathcal{X}'_{j})$.

If $\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}}$ is a refinement of $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$, then $\text{NMIL}(\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}})\leq\text{NMIL}(\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}})$. This is because

$H(X)\times\text{NMIL}(\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}})=\sum_{i=1}^{p}\sum_{x\in\mathcal{X}_{i}}p(x)\log\frac{p(\mathcal{X}_{i})}{p(x)}\geq\sum_{j=1}^{p'}\sum_{x\in\mathcal{X}'_{j}}p(x)\log\frac{p(\mathcal{X}'_{j})}{p(x)}=H(X)\times\text{NMIL}(\mathcal{P}'_{\mathcal{X}_{\varepsilon}^{c}}),$ (9)

where the inequality holds because every $x\in\mathcal{X}'_{j}\subseteq\mathcal{X}_{i}$ satisfies $p(\mathcal{X}'_{j})\leq p(\mathcal{X}_{i})$.

Generally, finding an optimal partition $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$ that maximizes utility while maintaining the privacy constraint is a combinatorial problem, since it depends on the high-risk symbol probabilities $p(x)$, $x\in\mathcal{X}_{\varepsilon}^{c}$, and on the joint probability distribution. However, we can use $\varepsilon$ as a stopping criterion in a heuristic agglomerative algorithm that obtains a refinement of $\mathcal{X}_{\varepsilon}^{c}$ and enhances utility as much as possible.

3.1 A greedy algorithm to refine the high-risk subset $\mathcal{X}_{\varepsilon}^{c}$

Knowing that aggregation of a partition of $\mathcal{X}_{\varepsilon}^{c}$ reduces the symbol resolution and data utility, we propose a heuristic greedy algorithm that determines the most refined partition of $\mathcal{X}_{\varepsilon}^{c}$ satisfying the data privacy constraint specified by $\varepsilon$. This is a bottom-up algorithm that starts from the most refined (singleton-element) partition of $\mathcal{X}_{\varepsilon}^{c}$. This starting point provides the highest resolution/utility but the lowest data privacy level. We let the subsets in this partition merge with each other to reduce the log-lift measure $\omega(\mathcal{X}_{i})$, $\mathcal{X}_{i}\in\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$, until the log-lift of every subset is below $\varepsilon$. To achieve a low-complexity but effective procedure that results in a finer partition of $\mathcal{X}_{\varepsilon}^{c}$, we perform the subset merging in order; a good candidate to begin with is the most challenging symbol, i.e., the one with the highest log-lift leakage.

The pseudo-code of our method is shown in Algorithm 1. To find each subset $\mathcal{X}_{i}\subseteq\mathcal{X}_{\varepsilon}^{c}$, we start with $\operatorname{arg\,max}_{x\in\mathcal{X}_{\varepsilon}^{c}}\omega(x)$, the symbol with the highest risk in $\mathcal{X}_{\varepsilon}^{c}$ (line 4), and then, to bring the leakage of the subset $\omega(\mathcal{X}_{i})$ below $\varepsilon$, we merge into it the symbol that minimizes $\omega(\mathcal{X}_{i})$ in each iteration (lines 5-8). Each symbol added to $\mathcal{X}_{i}$ is removed from $\mathcal{X}_{\varepsilon}^{c}$ (line 7). Once a subset $\mathcal{X}_{i}$ is formed, we add it to the partition $\mathcal{P}_{\mathcal{X}_{\varepsilon}^{c}}$ and repeat the same process for the remaining $x\in\mathcal{X}_{\varepsilon}^{c}$. After all subsets are formed, the leakage of the last subset $\mathcal{X}_{p}$ may still exceed $\varepsilon$, i.e., $\omega(\mathcal{X}_{p})>\varepsilon$. If this happens, while $\omega(\mathcal{X}_{p})>\varepsilon$ we agglomerate $\mathcal{X}_{p}$ by merging into it the subset that minimizes $\omega(\mathcal{X}_{p})$ (lines 11-15). Complete merging is a special output of our algorithm when no finer partition can be found that maintains the privacy. A minimal Python sketch of this procedure is given below.
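The following sketch implements the greedy refinement under the assumption of a strictly positive joint pmf; it follows the structure of Algorithm 1, but it is an illustrative reimplementation rather than the authors' code.

```python
import numpy as np

def refine_high_risk(p_sx, eps):
    """A sketch of Algorithm 1: greedily partition the high-risk set.
    Returns a list of index lists, one per subset X_i of X_eps^c."""
    p_s = p_sx.sum(axis=1)

    def omega(subset):
        # omega(X_Q) of eq. (4) for a list of x-indices.
        p_q_s = p_sx[:, subset].sum(axis=1) / p_s   # p(X_Q | s)
        p_q = p_sx[:, subset].sum()                 # p(X_Q)
        return np.abs(np.log(p_q_s / p_q)).max()

    w = {x: omega([x]) for x in range(p_sx.shape[1])}
    queue = [x for x in w if w[x] > eps]            # X_eps^c
    parts = []
    while queue:
        # Seed a new subset with the riskiest remaining symbol (line 4).
        seed = max(queue, key=w.get)
        queue.remove(seed)
        part = [seed]
        # Grow it by the symbol minimizing omega (lines 5-8).
        while omega(part) > eps and queue:
            best = min(queue, key=lambda x: omega(part + [x]))
            part.append(best)
            queue.remove(best)
        parts.append(part)
    # If the last subset still leaks, merge whole subsets into it
    # until it is private or one subset remains (lines 11-15).
    while len(parts) > 1 and omega(parts[-1]) > eps:
        last = parts.pop()
        k = min(range(len(parts)), key=lambda i: omega(parts[i] + last))
        parts.append(parts.pop(k) + last)
    return parts
```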

Fig. 2: Normalized mutual information loss (NMIL) for different values of $\varepsilon$: the mean values of NMIL are shown with standard deviations. Algorithm 1 reduces the NMIL in comparison with complete merging.

4 Experiments

To compare Algorithm 1 with complete merging, we randomly generated 1000 joint distributions $p(s,x)$ with $|\mathcal{X}|=20$ and $|\mathcal{S}|=13$. For each distribution, after randomization we obtained the maximum privacy leakage of the high-risk symbols $\max_{y\in\mathcal{X}_{\varepsilon}^{c}}\omega(y)$, the maximum overall privacy leakage $\max_{y\in\mathcal{Y}}\omega(y)$, and the utility loss NMIL under Algorithm 1 and complete merging for $\varepsilon\in\{0.25,0.5,0.75,\cdots,2.25,2.5\}$. Then, for each $\varepsilon$, we derived the mean value and standard deviation of $\max_{y\in\mathcal{X}_{\varepsilon}^{c}}\omega(y)$, $\max_{y}\omega(y)$, and NMIL across these 1000 observations. A condensed sketch of this simulation loop is given below.
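This sketch reuses `omega_subset` and `refine_high_risk` from the earlier snippets; the seed and the Dirichlet sampler for the random joint pmfs are our assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

def random_joint(n_s, n_x, rng):
    """A joint pmf drawn uniformly from the probability simplex."""
    return rng.dirichlet(np.ones(n_s * n_x)).reshape(n_s, n_x)

# |X| = 20 and |S| = 13 as in the paper's experiments.
for eps in np.arange(0.25, 2.51, 0.25):
    leaks = []
    for _ in range(1000):
        p_sx = random_joint(13, 20, rng)
        parts = refine_high_risk(p_sx, eps)
        if parts:  # leakage of the merged high-risk subsets, cf. eq. (5)
            leaks.append(max(omega_subset(p_sx, part) for part in parts))
    if leaks:
        print(f"eps={eps:.2f}: mean={np.mean(leaks):.3f}, "
              f"std={np.std(leaks):.3f}")
```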

In Fig. 1, the mean value of $\max_{y\in\mathcal{X}_{\varepsilon}^{c}}\omega(y)$ is depicted for each $\varepsilon$, together with its standard deviation (shown as error bars). As expected, complete merging provides a strong guarantee on the privacy leakage and keeps both the mean and the standard deviation much smaller than $\varepsilon$ in all cases. In contrast, Algorithm 1 increases the privacy leakage and lets it move closer to $\varepsilon$ than complete merging does, but crucially it still keeps the mean value and the corresponding deviation below $\varepsilon$ in all cases. As $\varepsilon$ increases, the standard deviation also increases: a larger $\varepsilon$ makes the privacy constraint less strict, so the size of $\mathcal{X}_{\varepsilon}^{c}$ decreases; as a result, the sample size used to calculate the mean privacy leakage shrinks, which causes a larger deviation.

Fig. 2 shows the normalized mutual information loss under Algorithm 1 and complete merging. It demonstrates that Algorithm 1 enhances utility substantially for every value of $\varepsilon$.

Fig. 3: Privacy-utility trade-off: for each value of $\varepsilon$, Algorithm 1 reduces the NMIL substantially while maintaining the privacy leakage below $\varepsilon$.

Finally, to provide a clear privacy-utility trade-off (PUT) comparison, we present PUT curves for both Algorithm 1 and complete merging in Fig. 3. For each $\varepsilon$, the mean value of the overall privacy leakage $\max_{y\in\mathcal{Y}}\omega(y)$ is shown versus the corresponding mean value of NMIL. Clearly, Algorithm 1 enhances utility significantly while satisfying the privacy leakage constraint. For a very strict privacy constraint ($\varepsilon\leq 0.5$), complete merging results in perfect privacy with total utility loss, i.e., $\text{NMIL}=1$. Algorithm 1, however, keeps the average utility loss below $1$.

5 Conclusion

In this paper, we introduced a method to enhance utility in the watchdog privacy mechanism. We showed that it is possible to maintain the privacy constraint and still improve utility. In our approach, instead of randomizing the whole high-risk set, we randomized subsets of high-risk symbols separately. We then proposed a heuristic greedy algorithm that finds subsets of high-risk elements while keeping the leakage of each subset below the privacy constraint $\varepsilon$. The simulation results showed substantial utility enhancement and preservation of the privacy constraint.

In future research, it would be interesting to develop more efficient subset merging algorithms. Investigating effective parameters, such as the subset size, and different privacy and utility measures could be helpful. It would also be beneficial to apply the proposed algorithm to real-world datasets and measure the obtained practical utility.

References

  • [1] F. d. P. Calmon and N. Fawaz, “Privacy against statistical inference,” in Proc. Annu. Allerton Conf. Commun., Control, and Comput., Monticello, IL, 2012, pp. 1401–1408.
  • [2] H. Hsu, S. Asoodeh, and F. d. P. Calmon, “Information-theoretic privacy watchdogs,” in Proc. IEEE Int. Symp. Inf. Theory, Paris, France, 2019, pp. 552–556.
  • [3] A. Makhdoumi, S. Salamatian, N. Fawaz, and M. Médard, “From the information bottleneck to the privacy funnel,” in Proc. IEEE Inf. Theory Workshop, Hobart, TAS, 2014, pp. 501–505.
  • [4] R. Sibson, “Information radius,” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 14, no. 2, pp. 149–160, 1969.
  • [5] S. Verdú, “$\alpha$-mutual information,” in Proc. Information Theory and Applications Workshop (ITA), San Diego, CA, 2015, pp. 1–6.
  • [6] N. Ding, M. A. Zarrabian, and P. Sadeghi, “$\alpha$-information-theoretic privacy watchdog and optimal privatization scheme,” in Proc. IEEE Int. Symp. Inf. Theory, 2021, pp. 2584–2589.
  • [7] J. Liao, O. Kosut, L. Sankar, and F. d. P. Calmon, “Tunable measures for information leakage and applications to privacy-utility tradeoffs,” IEEE Trans. Inf. Theory, vol. 65, no. 12, pp. 8043–8066, 2019.
  • [8] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and statistical minimax rates,” in Proc. IEEE 54th Annu. Symp. Found. Comput. Sci., 2013, pp. 429–438.
  • [9] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?,” SIAM Journal on Computing, vol. 40, no. 3, pp. 793–826, 2011.
  • [10] P. Sadeghi, N. Ding, and T. Rakotoarivelo, “On properties and optimization of information-theoretic privacy watchdog,” in Proc. IEEE Inf. Theory Workshop, 2020.
  • [11] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in Adv. Neural Inf. Process. Syst., 2014, vol. 4, pp. 2879–2887.
  • [12] A. D. Sarwate and L. Sankar, “A rate-distortion perspective on local differential privacy,” in Proc. Annu. Allerton Conf. Commun., Control, and Comput., 2014, pp. 903–908.
  • [13] Y. Liu, P. Sadeghi, F. Arbabjolfaei, and Y. H. Kim, “Capacity theorems for distributed index coding,” IEEE Trans. Inf. Theory, vol. 66, no. 8, pp. 4653–4680, 2020.