
No-Go Theorems for Data Privacy

Thomas Studer
Institute of Computer Science, University of Bern, Bern, Switzerland
[email protected]

Supported by the Swiss National Science Foundation grant 200020_184625.
Abstract

Controlled query evaluation (CQE) is an approach to guarantee data privacy for database and knowledge base systems. CQE-systems feature a censor function that may distort the answer to a query in order to hide sensitive information. We introduce a high-level formalization of controlled query evaluation and define several desirable properties of CQE-systems. Finally we establish two no-go theorems, which show that certain combinations of these properties cannot be obtained.

1 Introduction

Controlled query evaluation (CQE) refers to a data privacy mechanism where the database (or knowledge base) is equipped with a censor function. This censor checks for each query whether the answer to the query would reveal sensitive information to a user. If this is the case, then the censor will distort the answer. Essentially, there are two ways in which an answer may be distorted:

  1. the CQE-system may refuse to answer the query [18], or

  2. the CQE-system may give an incorrect answer, i.e. it lies [10].

This censor-based approach has the advantage that the task of maintaining privacy is separated from the task of keeping the data. This gives more flexibility than an integrated approach (like hiding rows in a database) and guarantees that no information is leaked through otherwise unidentified inference channels. Controlled query evaluation has been applied to a variety of data models and control mechanisms, see, e.g., [5, 6, 7, 8, 9, 21].

No-go theorems are well known in theoretical physics, where they describe particular situations that are not physically possible. Often the term is used for results in quantum mechanics like Bell’s theorem [4], the Kochen–Specker theorem [15], or, for a more recent example, the Frauchiger–Renner paradox [12]. Nurgalieva and del Rio [16] provide a modal logic analysis of the latter paradox. Arrow’s theorem [2] in social choice theory is also a no-go theorem, stating that no voting system can be designed that meets certain given fairness conditions. Pacuit and Yang [17] present a version of independence logic in which Arrow’s theorem is derivable.

In the present paper we develop a highly abstract model for dynamic query evaluation systems like CQE. We formulate several desirable properties of CQE-systems in our framework and establish two no-go theorems saying that certain combinations of those properties are impossible. The main contribution of this paper is the presentation of the abstract logical framework as well as the high-level formulation of the no-go theorems. Note that some particular instances of our results have already been known [5, 21].

There are many different notions of privacy available in the literature. For our results, we rely on provable privacy [19, 20], which is a rather weak notion of data privacy. Note that using a weak definition of privacy makes our impossibility theorems actually stronger since they state that under certain conditions not even this weak form of privacy can be achieved.

Clearly our work is also connected to the issues of lying and deception. Logics dealing with these notions are introduced and studied, e.g., in [1, 22, 13].

2 Logical Preliminaries

Let $X$ be a set. We use $\mathcal{P}(X)$ to denote the power set of $X$. For sets $\Gamma$ and $\Delta$ we use $\Gamma,\Delta$ for $\Gamma\cup\Delta$. Moreover, in such a context we write $A$ for the singleton set $\{A\}$. Hence $\Gamma,A$ stands for $\Gamma\cup\{A\}$.

Definition 1.

A logic $\mathsf{L}$ is given by

  1. a set of formulas $\mathsf{Fml}_{\mathsf{L}}$ and

  2. a consequence relation $\vdash_{\mathsf{L}}$ for $\mathsf{L}$ that is a relation between sets of formulas and formulas, i.e. $\vdash_{\mathsf{L}}\ \subseteq\ \mathcal{P}(\mathsf{Fml}_{\mathsf{L}})\times\mathsf{Fml}_{\mathsf{L}}$, satisfying for all $A,C\in\mathsf{Fml}_{\mathsf{L}}$ and $\Gamma,\Delta\in\mathcal{P}(\mathsf{Fml}_{\mathsf{L}})$:

     (a) reflexivity: $\{A\}\vdash_{\mathsf{L}}A$;

     (b) weakening: $\Gamma\vdash_{\mathsf{L}}A\ \Longrightarrow\ \Gamma,\Delta\vdash_{\mathsf{L}}A$;

     (c) transitivity: $\Gamma\vdash_{\mathsf{L}}C$ and $\Delta,C\vdash_{\mathsf{L}}A\ \Longrightarrow\ \Gamma,\Delta\vdash_{\mathsf{L}}A$.

Transitivity is sometimes called cut. The previous definition gives us single-conclusion consequence relations, which is sufficient for the purpose of this paper. For other notions of consequence relations see, e.g., [3] and [14].

As usual, we write $\vdash_{\mathsf{L}}A$ for $\emptyset\vdash_{\mathsf{L}}A$. A formula $A$ is called a theorem of $\mathsf{L}$ if $\vdash_{\mathsf{L}}A$.

We do not specify the logic $\mathsf{L}$ any further. The only thing we need is a consequence relation as given above. For instance, $\mathsf{L}$ may be classical propositional logic with $\vdash_{\mathsf{L}}$ being the usual derivation relation (see Section 4), or $\mathsf{L}$ may be a description logic with $\vdash_{\mathsf{L}}$ being its semantic consequence relation [21].

Definition 2.
  1. A logic $\mathsf{L}$ is called consistent if there exists a formula $A\in\mathsf{Fml}_{\mathsf{L}}$ such that $\not\vdash_{\mathsf{L}}A$.

  2. A set $\Gamma$ of $\mathsf{Fml}_{\mathsf{L}}$-formulas is called $\mathsf{L}$-consistent if there exists a formula $A\in\mathsf{Fml}_{\mathsf{L}}$ such that $\Gamma\not\vdash_{\mathsf{L}}A$.

We need a simple modal logic $\mathsf{M}$ over $\mathsf{L}$.

Definition 3.

The set of formulas $\mathsf{Fml}_{\mathsf{M}}$ is given inductively by:

  1. if $A$ is a formula of $\mathsf{Fml}_{\mathsf{L}}$, then $\Box A$ is a formula of $\mathsf{Fml}_{\mathsf{M}}$;

  2. $\bot$ is a formula of $\mathsf{Fml}_{\mathsf{M}}$;

  3. if $A$ and $B$ are formulas of $\mathsf{Fml}_{\mathsf{M}}$, then so is $A\to B$.

We define the remaining classical connectives $\top$, $\land$, $\lor$, and $\lnot$ as usual. Note that $\mathsf{M}$ is not a fully-fledged modal logic. For instance, it does not include nested modalities.

We give semantics to $\mathsf{Fml}_{\mathsf{M}}$-formulas as follows.

Definition 4.

An $\mathsf{M}$-model $\mathcal{M}$ is a set of sets of $\mathsf{Fml}_{\mathsf{L}}$-formulas, that is
$$\mathcal{M}\subseteq\mathcal{P}(\mathsf{Fml}_{\mathsf{L}}).$$
Definition 5.

Let $\mathcal{M}$ be an $\mathsf{M}$-model. Truth of an $\mathsf{Fml}_{\mathsf{M}}$-formula in $\mathcal{M}$ is inductively defined by:

  1. $\mathcal{M}\Vdash\Box A$ iff $w\vdash_{\mathsf{L}}A$ for all $w\in\mathcal{M}$;

  2. $\mathcal{M}\not\Vdash\bot$;

  3. $\mathcal{M}\Vdash A\to B$ iff $\mathcal{M}\not\Vdash A$ or $\mathcal{M}\Vdash B$.

We use the following standard definition.

Definition 6.

Let $\Gamma$ be a set of $\mathsf{Fml}_{\mathsf{M}}$-formulas.

  1. We write $\mathcal{M}\Vdash\Gamma$ iff $\mathcal{M}\Vdash A$ for each $A\in\Gamma$.

  2. $\Gamma$ is called satisfiable iff there exists an $\mathsf{M}$-model $\mathcal{M}$ with $\mathcal{M}\Vdash\Gamma$.

  3. $\Gamma$ entails a formula $A$, in symbols $\Gamma\models A$, iff for each model $\mathcal{M}$ we have that
  $$\mathcal{M}\Vdash\Gamma\quad\text{implies}\quad\mathcal{M}\Vdash A.$$

3 Privacy

Definition 7.

A privacy configuration is a triple $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ that consists of:

  1. the knowledge base $\mathsf{KB}\subseteq\mathsf{Fml}_{\mathsf{L}}$, which is only accessible via the censor;

  2. the set of a priori knowledge $\mathsf{AK}\subseteq\mathsf{Fml}_{\mathsf{M}}$, which formalizes general background knowledge known to the attacker and the censor;

  3. the set of secrets $\mathsf{Sec}\subseteq\mathsf{Fml}_{\mathsf{L}}$, which should be protected by the censor.

A privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ satisfies the following conditions:

  1. $\mathsf{KB}$ is $\mathsf{L}$-consistent (consistency);

  2. $\{\mathsf{KB}\}\Vdash\mathsf{AK}$ (truthful start);

  3. $\mathsf{AK}\not\models\Box s$ for each $s\in\mathsf{Sec}$ (hidden secrets).

Note that in the above definition, $\mathsf{KB}$ and $\mathsf{Sec}$ are sets of $\mathsf{Fml}_{\mathsf{L}}$-formulas while $\mathsf{AK}$ is a set of $\mathsf{Fml}_{\mathsf{M}}$-formulas. Thus $\mathsf{AK}$ may not only contain domain knowledge but also knowledge about the structure of $\mathsf{KB}$. This is further explained in Section 4.

A query to a knowledge base $\mathsf{KB}$ is simply a formula of $\mathsf{Fml}_{\mathsf{L}}$.

Given a logic $\mathsf{L}$, we can evaluate a query $q$ over a knowledge base $\mathsf{KB}$. There are two possible answers: $t$ (true) and $u$ (unknown).

Definition 8.

The evaluation function $\mathsf{eval}$ is defined by:
$$\mathsf{eval}(\mathsf{KB},q):=\begin{cases}t&\text{if }\mathsf{KB}\vdash_{\mathsf{L}}q\\ u&\text{otherwise.}\end{cases}$$

If the language of the logic $\mathsf{L}$ includes negation, then one may also consider an evaluation function that can return the value $f$ (false), i.e. one defines $\mathsf{eval}(\mathsf{KB},q):=f$ if $\mathsf{KB}\vdash_{\mathsf{L}}\lnot q$. However, in the general setting of this paper, we cannot include this case.
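To illustrate Definition 8, here is a minimal Python sketch. The membership-based consequence relation `derives` is a toy stand-in for $\mathsf{KB}\vdash_{\mathsf{L}}q$ and not part of the paper's framework:

```python
# Toy instance of Definition 8. 'derives' is a hypothetical consequence
# relation |-_L: a knowledge base of atomic facts derives exactly its members.
def derives(kb, q):
    return q in kb

def eval_query(kb, q):
    """eval(KB, q): 't' if KB |-_L q, and 'u' (unknown) otherwise."""
    return 't' if derives(kb, q) else 'u'

kb = {'A', 'C'}
print(eval_query(kb, 'A'))  # t
print(eval_query(kb, 'B'))  # u
```

Any consequence relation satisfying reflexivity, weakening, and transitivity could be plugged in for `derives` without changing `eval_query`.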

A censor has to hide the secrets. In order to achieve this, it can not only answer $t$ and $u$ to a query but also $r$ (refuse to answer). We denote the set of possible answers of a censor by
$$\mathbb{A}:=\{t,u,r\}.$$

Let $X$ be a set. Then $X^{\omega}$ denotes the set of infinite sequences of elements of $X$.

Definition 9.

A censor is a mapping that assigns an answering function
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}:\mathsf{Fml}_{\mathsf{L}}^{\omega}\longrightarrow\mathbb{A}^{\omega}$$
to each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$. By abuse of notation, we also call the answering function $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}$ a censor. A sequence $q\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$ is called a query sequence.

Usually, the privacy configuration will be clear from the context. In that case we simply use $\mathsf{Cens}$ instead of $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}$.

Given a sequence $s$, we use $s_{i}$ to denote its $i$-th element. That is, for a query sequence $q\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$, we use $q_{i}$ to denote the $i$-th query and $\mathsf{Cens}(q)_{i}$ to denote the $i$-th answer of the censor.

Example 10.

Let $A,B,C\in\mathsf{Fml}_{\mathsf{L}}$. We define a privacy configuration with $\mathsf{KB}=\{A,C\}$, $\mathsf{AK}=\emptyset$, and $\mathsf{Sec}=\{C\}$. A censor $\mathsf{Cens}$ yields an answering function $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}$, which applied to a query sequence $q=(A,B,C,\ldots)$ yields a sequence of answers, e.g.,
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)=(t,u,r,\ldots).$$
In this case, $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}$ gives true answers since $\mathsf{eval}(\mathsf{KB},A)=t$ and $\mathsf{eval}(\mathsf{KB},B)=u$, and it protects the secret by refusing to answer the query $C$.

Another option for the answering function would be to answer the third query with $u$, i.e. it would lie (instead of refusing to answer) in order to protect the secret.

A further option would be to always refuse to answer, i.e.
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)=(r,r,r,\ldots).$$
This, of course, would be a trivial (and useless) answering function that would, however, preserve all secrets.
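The refusing censor of Example 10 can be sketched in Python as follows; the membership-based consequence relation is again a toy assumption, and finite query sequences stand in for infinite ones:

```python
# Sketch of the answering function from Example 10: answer truthfully,
# but refuse (r) whenever the query itself is a secret.
def censor(kb, sec, queries):
    answers = []
    for q in queries:
        if q in sec:
            answers.append('r')                      # protect the secret
        else:
            answers.append('t' if q in kb else 'u')  # truthful answer
    return answers

# KB = {A, C}, Sec = {C}, query sequence (A, B, C)
print(censor({'A', 'C'}, {'C'}, ['A', 'B', 'C']))  # ['t', 'u', 'r']
```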

In this paper, we will consider continuous censors only, which are given as follows.

Definition 11.

A censor $\mathsf{Cens}$ is continuous iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$, for all query sequences $q,q^{\prime}\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$, and all $n\in\omega$ we have that
$$q|_{n}=q^{\prime}|_{n}\quad\Longrightarrow\quad\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{n}=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q^{\prime})|_{n},$$
where for an infinite sequence $s=(s_{1},s_{2},\ldots)$, we use $s|_{n}$ to denote the initial segment of $s$ of length $n$, i.e. $s|_{n}=(s_{1},\ldots,s_{n})$.

Continuity means that the answer of a censor to a query does not depend on future queries, see also Lemma 14.
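Concretely, a censor that computes its $i$-th answer from the first $i$ queries alone satisfies Definition 11 by construction. A small Python sketch of the prefix condition, with a toy membership-based censor (finite sequences stand in for infinite ones):

```python
# Definition 11 on finite prefixes: if two query sequences agree up to n,
# a continuous censor's answer sequences agree up to n.
def censor(kb, sec, queries):
    # pointwise censor: refuse on secrets, otherwise answer truthfully
    return ['r' if q in sec else ('t' if q in kb else 'u') for q in queries]

kb, sec = {'A', 'C'}, {'C'}
q1 = ['A', 'B', 'C', 'A']
q2 = ['A', 'B', 'C', 'B']   # agrees with q1 up to n = 3

n = 3
assert censor(kb, sec, q1)[:n] == censor(kb, sec, q2)[:n]
print('answers agree on the common prefix')
```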

A censor is called truthful if it does not lie.

Definition 12.

A censor $\mathsf{Cens}$ is called truthful iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$, all query sequences $q=(q_{1},q_{2},\ldots)$, and all sequences
$$(a_{1},a_{2},\ldots)=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)$$
we have that for all $i\in\omega$
$$a_{i}=\mathsf{eval}(\mathsf{KB},q_{i})\quad\text{or}\quad a_{i}=r.$$

Hence a truthful censor may refuse to answer a query in order to protect a secret but it will not give an incorrect answer.

In the modal logic 𝖬\mathsf{M} over 𝖫\mathsf{L}, we can express what knowledge one can gain from the answers of a censor to a query. This is called the content of the answer.

Definition 13.

Given an answer $a\in\mathbb{A}$ to a query $q\in\mathsf{Fml}_{\mathsf{L}}$, we define its content as follows:
$$\mathsf{cont}(q,t):=\Box q\qquad\mathsf{cont}(q,u):=\lnot\Box q\qquad\mathsf{cont}(q,r):=\top$$
Assume that we are given a privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ and a censor $\mathsf{Cens}$. We define the content of the answers of the censor to a query sequence $q\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$ up to $n\in\omega$ by
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q),n):=\bigcup_{1\leq i\leq n}\{\mathsf{cont}(q_{i},a_{i})\}\cup\mathsf{AK},$$
where $a=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)$. Note that here we have also included the a priori knowledge.
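The content function of Definition 13 can be sketched in Python, with modal formulas represented as plain strings (our own encoding, purely for illustration):

```python
# Definition 13: the content of a single answer as a (string-encoded)
# modal formula; refusals carry no information (the formula 'True').
def cont(q, a):
    if a == 't':
        return f'Box {q}'
    if a == 'u':
        return f'not Box {q}'
    return 'True'

def cont_up_to(queries, answers, n, ak):
    """Content of the first n answers, together with the a priori knowledge."""
    return {cont(queries[i], answers[i]) for i in range(n)} | set(ak)

print(sorted(cont_up_to(['A', 'B', 'C'], ['t', 'u', 'r'], 3, [])))
# ['Box A', 'True', 'not Box B']
```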

The following is a trivial observation showing the role of continuity.

Lemma 14.

Let $\mathsf{Cens}$ be a continuous censor. The content function is monotone in the second argument: for $m\leq n$ we have
$$\mathsf{cont}(\mathsf{Cens}(q),m)\subseteq\mathsf{cont}(\mathsf{Cens}(q),n).$$

We call a censor credible if it does not return contradictory answers.

Definition 15.

A censor $\mathsf{Cens}$ is called credible iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$, every query sequence $q$, and every $n\in\omega$, the set $\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q),n)$ is satisfiable.

Definition 16.

The full content of a knowledge base $\mathsf{KB}$ is given by
$$\mathsf{full}(\mathsf{KB}):=\bigcup_{A\in\mathsf{Fml}_{\mathsf{L}}}\mathsf{cont}(A,\mathsf{eval}(\mathsf{KB},A)).$$
Lemma 17.

For any knowledge base $\mathsf{KB}$, we have that
$$\{\mathsf{KB}\}\Vdash\mathsf{full}(\mathsf{KB}).$$
Proof.

Let $A$ be an $\mathsf{Fml}_{\mathsf{L}}$-formula. We distinguish two cases:

  1. $\mathsf{KB}\vdash_{\mathsf{L}}A$. Then $\Box A\in\mathsf{full}(\mathsf{KB})$ and further $\{\mathsf{KB}\}\Vdash\Box A$.

  2. $\mathsf{KB}\not\vdash_{\mathsf{L}}A$. Then $\lnot\Box A\in\mathsf{full}(\mathsf{KB})$ and further $\{\mathsf{KB}\}\Vdash\lnot\Box A$. ∎

Lemma 18.

We let $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ be a privacy configuration. Further we let $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}$ be a truthful censor. For every query sequence $q$ and $n\in\omega$, we have that
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q),n)\subseteq\mathsf{full}(\mathsf{KB})\cup\{\top\}\cup\mathsf{AK}.$$
Proof.

By induction on $n$. The base case $n=0$ is trivial since
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q),0)=\mathsf{AK}.$$
Induction step. Since $\mathsf{Cens}$ is truthful, we have
$$a_{n+1}\in\{r,\mathsf{eval}(\mathsf{KB},q_{n+1})\}.$$
We distinguish two cases:

  1. $a_{n+1}=r$. Then $\mathsf{cont}(\mathsf{Cens}(q),n+1)=\mathsf{cont}(\mathsf{Cens}(q),n)\cup\{\top\}$ and the claim follows immediately from the induction hypothesis.

  2. $a_{n+1}=\mathsf{eval}(\mathsf{KB},q_{n+1})$. Then
  $$\mathsf{cont}(\mathsf{Cens}(q),n+1)=\mathsf{cont}(\mathsf{Cens}(q),n)\cup\mathsf{cont}(q_{n+1},\mathsf{eval}(\mathsf{KB},q_{n+1})).$$
  The claim follows from the induction hypothesis and
  $$\mathsf{cont}(q_{n+1},\mathsf{eval}(\mathsf{KB},q_{n+1}))\in\mathsf{full}(\mathsf{KB}),$$
  which holds by Definition 16. ∎

The following corollary is a generalization of Cor. 30 in [21].

Corollary 19.

Every truthful censor is credible.

Proof.

Let $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ be a privacy configuration and $\mathsf{Cens}$ a truthful censor for it. By Definition 7, we have $\{\mathsf{KB}\}\Vdash\mathsf{AK}$. Thus by the two previous lemmas, we find that for each $n\in\omega$,
$$\{\mathsf{KB}\}\Vdash\mathsf{full}(\mathsf{KB})\cup\{\top\}\cup\mathsf{AK}\quad\text{and}\quad\mathsf{full}(\mathsf{KB})\cup\{\top\}\cup\mathsf{AK}\supseteq\mathsf{cont}(\mathsf{Cens}(q),n).$$
That means $\mathsf{cont}(\mathsf{Cens}(q),n)$ is satisfiable for each $n\in\omega$ and thus $\mathsf{Cens}$ is credible. ∎

There are several properties that a ‘good’ censor should fulfil. We call a censor effective if it protects all secrets.

Definition 20.

A censor $\mathsf{Cens}$ is called effective iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$, every query sequence $q\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$, and every $n\in\omega$, we have
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q),n)\not\models\Box s\quad\text{for each }s\in\mathsf{Sec}.$$

A ‘good’ censor should only distort an answer to a query when it is absolutely necessary, i.e. when giving the correct answer would leak a secret. We call such a censor minimally invasive.

Definition 21.

Let $\mathsf{Cens}$ be an effective and credible censor. This censor is called minimally invasive iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ and each query sequence $q\in\mathsf{Fml}_{\mathsf{L}}^{\omega}$, we have that whenever
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)_{i}\neq\mathsf{eval}(\mathsf{KB},q_{i}),$$
replacing $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)_{i}$ with $\mathsf{eval}(\mathsf{KB},q_{i})$ would lead to a violation of effectiveness or credibility. That is, for any censor $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}$ such that
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q)|_{i-1}=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{i-1}\quad\text{and}\quad\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q)_{i}=\mathsf{eval}(\mathsf{KB},q_{i}),$$
we have that for some $n$
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),n)\models\Box s\quad\text{for some }s\in\mathsf{Sec}$$
or
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),n)\text{ is not satisfiable}.$$

It is a trivial observation that a truthful, effective, and minimally invasive censor always has to answer the same query in the same way.

Lemma 22.

Let $\mathsf{Cens}$ be a truthful, effective, and minimally invasive censor. Further let $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ be a privacy configuration and $q$ a query sequence with $q_{i}=q_{j}$ for some $i,j$. Then
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)_{i}=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)_{j}.$$

Consider a truthful, effective, continuous, and minimally invasive censor and a given query sequence. If the censor distorts the answer to some query, then giving the correct answer would immediately reveal a secret.

Lemma 23.

Let $\mathsf{Cens}$ be a truthful, effective, continuous, and minimally invasive censor. Further let $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ be a privacy configuration and $q$ a query sequence. Let $i$ be the least natural number such that
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)_{i}\neq\mathsf{eval}(\mathsf{KB},q_{i}).$$
Let $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}$ be such that
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q)|_{i-1}=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{i-1}\quad\text{and}\quad\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q)_{i}=\mathsf{eval}(\mathsf{KB},q_{i}).$$
Then it holds that
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),i)\models\Box s\quad\text{for some }s\in\mathsf{Sec}.$$
Proof.

Consider the query sequence $q^{\prime}$ given by $q^{\prime}_{j}:=q_{j}$ for $j<i$ and $q^{\prime}_{j}:=q_{i}$ for $j\geq i$, i.e. $q^{\prime}$ has the form $(q_{1},q_{2},\ldots,q_{i-1},q_{i},q_{i},q_{i},\ldots)$. In particular, we have $q|_{i}=q^{\prime}|_{i}$. Thus by continuity of the censor we find
$$\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{i}=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q^{\prime})|_{i}.$$
Thus $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q^{\prime})_{i}\neq\mathsf{eval}(\mathsf{KB},q_{i})$. By the definition of minimally invasive we find that for some $n$
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q^{\prime}),n)\models\Box s\quad\text{for some }s\in\mathsf{Sec}\tag{1}$$
or
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q^{\prime}),n)\text{ is not satisfiable}.\tag{2}$$
Since the censor is truthful, Corollary 19 shows that (2) is not possible. Thus (1) holds for some $n$.

By the definition of $q^{\prime}$ and the previous lemma we find that
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q^{\prime}),n)=\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q^{\prime}),i)$$
if $i\leq n$. Thus, in case $i\leq n$, (1) implies
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),i)\models\Box s\quad\text{for some }s\in\mathsf{Sec}.\tag{3}$$
In case $i>n$, we find by Lemma 14 that
$$\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),n)\subseteq\mathsf{cont}(\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}^{\prime}(q),i).$$
Thus again (1) implies (3), which finishes the proof. ∎

Next we define the notion of a repudiating censor, which guarantees that there is always a knowledge base in which no secret holds and which, given as input to the answering function, produces the same answers as the actual knowledge base. Hence this definition provides a version of plausible deniability for all secrets.

Definition 24.

A censor $\mathsf{Cens}$ is called repudiating iff for each privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ and each query sequence $q$, there are knowledge bases $\mathsf{KB}_{i}$ ($i\in\omega$) such that

  1. $(\mathsf{KB}_{i},\mathsf{AK},\mathsf{Sec})$ is a privacy configuration for each $i\in\omega$;

  2. $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{n}=\mathsf{Cens}_{(\mathsf{KB}_{n},\mathsf{AK},\mathsf{Sec})}(q)|_{n}$ for each $n\in\omega$;

  3. $\mathsf{KB}_{i}\not\vdash_{\mathsf{L}}s$ for each $s\in\mathsf{Sec}$ and each $i\in\omega$.

Now we can establish our first no-go theorem, which is a generalization of Th. 50 in [21].

Theorem 25 (First No-Go Theorem).

A continuous and truthful censor satisfies at most two of the properties effectiveness, minimal invasion, and repudiation.

Proof.

Let the censor $\mathsf{Cens}$ be continuous, truthful, effective, and minimally invasive. We show that $\mathsf{Cens}$ cannot be repudiating. We let $S$ be an $\mathsf{Fml}_{\mathsf{L}}$-formula and consider the privacy configuration $(\mathsf{KB},\mathsf{AK},\mathsf{Sec})$ given by
$$\mathsf{KB}:=\{S\}\qquad\mathsf{AK}:=\emptyset\qquad\mathsf{Sec}:=\{S\}$$
and the query sequence $q:=(S,S,\ldots)$. We set
$$a:=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q).$$
Obviously, we have $a=(r,r,\ldots)$ since otherwise $\mathsf{Cens}$ would either be lying (i.e. not be truthful) or revealing a secret (i.e. not be effective).

Now assume that $\mathsf{Cens}$ is repudiating. Then there exists a knowledge base $\mathsf{KB}_{1}$ such that

  1. $(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})$ is a privacy configuration;

  2. $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{1}=\mathsf{Cens}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q)|_{1}$;

  3. $\mathsf{KB}_{1}\not\vdash_{\mathsf{L}}S$.

Let $(a^{\prime}_{1}):=\mathsf{Cens}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q)|_{1}$. Because of $\mathsf{KB}_{1}\not\vdash_{\mathsf{L}}S$ and $\mathsf{Cens}$ being truthful, we find that $a^{\prime}_{1}=u$ or $a^{\prime}_{1}=r$.

Suppose towards a contradiction that
$$a^{\prime}_{1}=r.\tag{4}$$
Now let $\mathsf{Cens}^{\prime}$ be a censor as in Lemma 23, i.e. such that
$$\mathsf{Cens}^{\prime}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q)_{1}=u=\mathsf{eval}(\mathsf{KB}_{1},S).\tag{5}$$
By Lemma 23 we get
$$\mathsf{cont}(\mathsf{Cens}^{\prime}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q),1)\models\Box S.\tag{6}$$
However, by (5) we also have $\mathsf{cont}(\mathsf{Cens}^{\prime}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q),1)=\{\lnot\Box S\}$, which contradicts (6).

Hence (4) is not possible and thus we have $a^{\prime}_{1}=u$. This, however, contradicts $\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)|_{1}=\mathsf{Cens}_{(\mathsf{KB}_{1},\mathsf{AK},\mathsf{Sec})}(q)|_{1}$. We conclude that $\mathsf{Cens}$ cannot be repudiating. ∎

4 Non-refusing censors

In this section we study censors that do not refuse to answer a query.

Definition 26.

A censor is non-refusing if it never assigns the answer rr to a query.

Of course, a non-refusing censor has to lie in order to keep the secrets. That means that if a censor of this kind is to be effective, then it cannot be truthful.

Even if we consider lying censors, we work with the assumption that

an attacker believes every answer of the censor. (7)

Otherwise, we are in a situation where an attacker cannot believe any answer because the attacker does not know which answers are correct and which are wrong, meaning that any answer could be a lie. In that case, querying a knowledge base would not make any sense at all.¹

¹ This is, of course, not completely true. It is possible to distort knowledge bases in such a way that privacy is preserved but statistical inferences are still informative, see, e.g. [11].

Because of the assumption (7), we can use our notions of effectiveness (Definition 20) and credibility (Definition 15) also in the context of lying censors: an attacker should not believe any secret and the beliefs should be satisfiable.

Theorem 25 about truthful censors did not make any assumptions on the underlying logic $\mathsf{L}$. The next theorem about non-refusing censors is less general as it is based on classical logic. We will use $a,b,c,\ldots$ for atomic propositions and $A,B,C,\ldots$ for arbitrary formulas.

Moreover, we assume that the knowledge base $\mathsf{KB}$ only contains atomic facts (we say $\mathsf{KB}$ is atomic). That is, if $F\in\mathsf{KB}$, then $F$ is either of the form $p$ or of the form $\lnot p$ where $p$ is an atomic proposition. Hence we find that if $\mathsf{KB}\vdash_{\mathsf{L}}a\to b$ for two distinct atomic propositions $a$ and $b$, then $\mathsf{KB}\vdash_{\mathsf{L}}\lnot a$ or $\mathsf{KB}\vdash_{\mathsf{L}}b$. We can formalize this using the set of a priori knowledge by letting
$$\Box(a\to b)\to(\Box\lnot a\lor\Box b)\in\mathsf{AK}.$$

Now we can establish our second no-go theorem, which is a generalization of the results of [5].

Theorem 27 (Second No-Go Theorem).

Let 𝖫\mathsf{L} be based on classical logic. A continuous and non-refusing censor cannot be at the same time effective and minimally invasive.

Proof.

Let the censor $\mathsf{Cens}$ be continuous, non-refusing, and minimally invasive. We show that $\mathsf{Cens}$ cannot be effective. Let $\mathsf{L}$ be classical propositional logic. We consider the knowledge base
$$\mathsf{KB}:=\{a,b\}$$
where both $a$ and $b$ shall be kept secret, i.e.
$$\mathsf{Sec}:=\{a,b\}.$$
Further we assume that it is a priori knowledge that $\mathsf{KB}$ is atomic. Thus, in particular,
$$\Box(c\to a)\to(\Box\lnot c\lor\Box a)\in\mathsf{AK}\quad\text{and}\quad\Box(\lnot c\to b)\to(\Box c\lor\Box b)\in\mathsf{AK}.$$
We consider the query sequence $q:=(c\to a,\lnot c\to b,c,\ldots)$ and set $a:=\mathsf{Cens}_{(\mathsf{KB},\mathsf{AK},\mathsf{Sec})}(q)$.

We find $\mathsf{Cens}(c\to a)=t$ since $\mathsf{Cens}$ is minimally invasive and $\mathsf{KB}$ might contain $\lnot c$. Further, we find $\mathsf{Cens}(\lnot c\to b)=t$ since $\mathsf{Cens}$ is minimally invasive and $\mathsf{KB}$ might contain $c$.

Note that after issuing the first two queries of the sequence qq, an attacker knows that aa or bb must be entailed by 𝖪𝖡\mathsf{KB}. But since the attacker does not know which one is the case, no secret is leaked. Formally we have

𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □(c → a)  (8)
and
𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □(¬c → b).  (9)

By basic modal logic, (8) and (9) yield

𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □c → □a  (10)
and
𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □¬c → □b,  (11)

respectively. Using the a priori knowledge 𝖠𝖪, we obtain from (8) and (9)

𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □¬c ∨ □a  (12)
and
𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □c ∨ □b.  (13)

Because of □c ∨ ¬□c, we get by (10) and (13) that

𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 2) ⊢_𝖬 □a ∨ □b.

Thus, at this stage, it is known that a secret holds, but an attacker does not know which one and hence privacy is still preserved.

Now comes the third query, which is c. There are two possibilities for a non-refusing censor to choose from:

  1. (a)₃ = u (which is true). We find 𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 3) ⊢_𝖬 ¬□c. By (13) we get 𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 3) ⊢_𝖬 □b and a secret is leaked.

  2. (a)₃ = t (which is a lie). We find 𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 3) ⊢_𝖬 □c. By (10) we get 𝖼𝗈𝗇𝗍(𝖢𝖾𝗇𝗌(q), 3) ⊢_𝖬 □a and a secret is leaked.

In both cases, a secret is leaked. Thus the censor cannot be effective. ∎
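The case analysis in this proof can also be replayed extensionally (an illustrative sketch under the paper's assumptions, not its formal apparatus): enumerate all consistent atomic knowledge bases over a, b, c, keep those compatible with the censor's answers taken at face value, and observe that after either answer to the third query every remaining candidate entails a secret:

```python
from itertools import product

ATOMS = ["a", "b", "c"]

def models(kb):
    """Truth assignments over ATOMS satisfying every literal (p, polarity) in kb."""
    out = []
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(v[p] == pol for (p, pol) in kb):
            out.append(v)
    return out

def entails(kb, phi):
    return all(phi(v) for v in models(kb))

# All consistent atomic knowledge bases over a, b, c.
literals = [(p, pol) for p in ATOMS for pol in (True, False)]
candidates = [
    [lit for lit, bit in zip(literals, bits) if bit]
    for bits in product([0, 1], repeat=len(literals))
]
candidates = [kb for kb in candidates if models(kb)]

# After answer t to (c -> a) and answer t to (not c -> b), the attacker
# keeps exactly the candidates entailing both implications.
stage2 = [
    kb for kb in candidates
    if entails(kb, lambda v: (not v["c"]) or v["a"])   # c -> a
    and entails(kb, lambda v: v["c"] or v["b"])        # not c -> b
]

# Every remaining candidate entails a or entails b (the disjunction is known)...
assert all(entails(kb, lambda v: v["a"]) or entails(kb, lambda v: v["b"])
           for kb in stage2)
# ...but neither single secret is revealed yet.
assert not all(entails(kb, lambda v: v["a"]) for kb in stage2)
assert not all(entails(kb, lambda v: v["b"]) for kb in stage2)

# Third query c, answer u: the candidates not entailing c all entail b.
after_u = [kb for kb in stage2 if not entails(kb, lambda v: v["c"])]
assert all(entails(kb, lambda v: v["b"]) for kb in after_u)

# Third query c, answer t: the candidates entailing c all entail a.
after_t = [kb for kb in stage2 if entails(kb, lambda v: v["c"])]
assert all(entails(kb, lambda v: v["a"]) for kb in after_t)

print("either answer to the third query reveals a secret")
```

The enumeration mirrors the modal derivation: after the first two answers the candidate set already witnesses □a ∨ □b, and the third answer, whether u or t, collapses it onto candidates that all entail the same secret.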

To avoid this problem, a censor must not only protect the single elements of 𝖲𝖾𝖼 but also their disjunction [5]. For the privacy configuration of the previous proof, this means that 𝖢𝖾𝗇𝗌 must also protect a ∨ b. Then already the second query, ¬c → b, would be answered with u because the answer t, as shown above, reveals a ∨ b.

Note that protecting the disjunction of all secrets is not as simple as it sounds. Consider, for instance, a hospital information system that should protect the disease a patient is diagnosed with. In this case, protecting the disjunction of all secrets means protecting the information that the patient has some disease. This, however, is not feasible as it is general background knowledge that everybody who is a patient in a hospital has some disease. Worse than that, sometimes the disjunction of all secrets may even be a logical tautology, which cannot be protected.

5 Conclusion

In this paper, we have established two no-go theorems for data privacy using tools from modal logic. We are confident that logical methods will play an important role in finding new impossibility theorems and in better understanding already known ones; see, e.g., the logical analyses carried out in [16] and [17].

Another line of future research relates to the fact that refusing to answer a query can give away the information that there exists a secret that could be inferred from some other answer. Similar phenomena may occur in multi-agent systems when one of the agents refuses to communicate. For example, imagine the situation of an oral exam where the examiner asks a question and the student keeps silent. In this case, the examiner learns that the student does not know the answer to the question, for otherwise the student would have answered.

It is also possible that refusing an answer can lead to knowing that someone else knows a certain fact. Consider the following scenario. A father enters a room where his daughter is playing and notices that one of the toys is in pieces. He asks who broke the toy. The daughter does not want to betray her brother (who actually broke it), and she also does not want to lie. Therefore, she refuses to answer her father's question. Of course, the father then knows that his daughter knows who broke the toy, for otherwise she could have said that she does not know.

We believe that it is worthwhile to study the above situations using general communication protocols that include the possibility of refusing an answer and to investigate the implications of refusing in terms of higher-order knowledge.

References

  • [1] T. Ågotnes, H. van Ditmarsch, and Y. Wang. True lies. Synthese, 195(10):4581–4615, 2018.
  • [2] K. J. Arrow. A difficulty in the concept of social welfare. Journal of Political Economy, 58(4):328–346, 1950.
  • [3] A. Avron. Simple consequence relations. Information and Computation, 92(1):105–139, 1991.
  • [4] J. S. Bell. On the Einstein Podolsky Rosen paradox. Physics Physique Fizika, 1:195–200, 1964.
  • [5] J. Biskup. For unknown secrecies refusal is better than lying. Data and Knowledge Engineering, 33(1):1–23, 2000.
  • [6] J. Biskup and P. A. Bonatti. Lying versus refusal for known potential secrets. Data and Knowledge Engineering, 38(2):199–222, 2001.
  • [7] J. Biskup and P. A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. International Journal of Information Security, 3(1):14–27, 2004.
  • [8] J. Biskup and P. A. Bonatti. Controlled query evaluation for known policies by combining lying and refusal. Annals of Mathematics and Artificial Intelligence, 40(1):37–62, 2004.
  • [9] J. Biskup and T. Weibert. Keeping secrets in incomplete databases. International Journal of Information Security, 7(3):199–217, 2008.
  • [10] P. A. Bonatti, S. Kraus, and V. S. Subrahmanian. Foundations of secure deductive databases. Transactions on Knowledge and Data Engineering, 7(3):406–422, 1995.
  • [11] F. du Pin Calmon and N. Fawaz. Privacy against statistical inference. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1401–1408. IEEE, 2012.
  • [12] D. Frauchiger and R. Renner. Quantum theory cannot consistently describe the use of itself. Nature Communications, 9, 2018.
  • [13] B. Icard. Lying, deception and strategic omission: définition et évaluation. PhD thesis, Université Paris Sciences et Lettres, 2019.
  • [14] R. Iemhoff. Consequence relations and admissible rules. Journal of Philosophical Logic, 45(3):327–348, 2016.
  • [15] S. Kochen and E. Specker. The problem of hidden variables in quantum mechanics. Indiana Univ. Math. J., 17:59–87, 1968.
  • [16] N. Nurgalieva and L. del Rio. Inadequacy of modal logic in quantum settings. In P. Selinger and G. Chiribella, editors, Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3–7th June 2018, volume 287 of EPTCS, pages 267–297, 2019.
  • [17] E. Pacuit and F. Yang. Dependence and independence in social choice: Arrow’s theorem. In S. Abramsky, J. Kontinen, J. Väänänen, and H. Vollmer, editors, Dependence Logic: Theory and Applications, pages 235–260. Springer, 2016.
  • [18] G. L. Sicherman, W. De Jonge, and R. P. Van de Riet. Answering queries without revealing secrets. ACM Transactions on Database Systems, 8(1):41–59, 1983.
  • [19] K. Stoffel and T. Studer. Provable data privacy. In K. V. Andersen, J. Debenham, and R. Wagner, editors, Database and Expert Systems Applications, pages 324–332. Springer, 2005.
  • [20] P. Stouppa and T. Studer. A formal model of data privacy. In I. Virbitskaite and A. Voronkov, editors, Perspectives of Systems Informatics, pages 400–408. Springer, 2007.
  • [21] T. Studer and J. Werner. Censors for boolean description logic. Transactions on Data Privacy, 7:223–252, 2014.
  • [22] H. van Ditmarsch. Dynamics of lying. Synthese, 191(5):745–777, 2014.