Differentially Private Data Release over Multiple Tables
Abstract.
We study synthetic data release for answering multiple linear queries over a set of database tables in a differentially private way. Two special cases have been well-studied in the literature: how to release a synthetic dataset for answering multiple linear queries over a single table, and how to release the answer for a single counting (join size) query over a set of database tables. Compared to the single-table case, the join operator has emerged as the major difficulty in query answering, since the sensitivity (i.e., how much an individual data record can change the query answer) could be heavily amplified via complex join relationships.
We present an algorithm for the general problem, and prove a lower bound illustrating that our general algorithm achieves parameterized optimality (up to logarithmic factors) on some simple queries (for example, two-table join queries) in the most commonly-used privacy parameter regimes. For the class of hierarchical joins, we present a data partition procedure that exploits the idea of uniformizing sensitivities to further improve the utility.
1. Introduction
Synthetic data release is a very useful objective in private data analysis. Differential privacy (DP) (dwork2006calibrating; dwork2006our) has emerged as a compelling model that enables us to formally study the tradeoff between utility of released information and the privacy of individuals. In the literature, there has been a large body of work that studies synthetic data release over a single table, for example, the private multiplicative weight algorithm (hardt2012simple), the histogram-based algorithm (vadhan2017complexity), the matrix mechanism (li2010optimizing), the Bayesian network algorithm (zhang2017privbayes); as well as other works focusing on geometric range queries (li2011efficient; li2012adaptive; dwork2010differential; bun2015differentially; chan2011private; dwork2015pure; cormode2012differentially; huang2021approximate), and datacubes (ding2011differentially).
Data analysis over multiple private tables connected via join operators has been the subject of significant interest within the area of modern database systems. In particular, the challenging question of releasing the join size over a set of private tables has been studied in several recent works including the sensitivity-based framework (johnson2018towards; dong2021residual; dong2021nearly), the truncation-based mechanism (kotsogiannis2019privatesql; dong2022r2t; tao2020computing), as well as in works on one-to-one joins (mcsherry2009privacy; proserpio2014calibrating), and on graph databases (blocki2013differentially; chen2013recursive). In practice, multiple queries (as opposed to a single one) are typically issued for data analysis, for example, a large class of linear queries on top of join results with different weights on input tuples, as a generalization of the counting join size query. One might consider answering each query independently but the utility would be very low due to the limited privacy budget, implied by the DP composition rule (dwork2006calibrating; dwork2006our). Hence the question that we tackle in this paper is: how can we release synthetic data for accurately answering a large class of linear queries over multiple tables in a differentially private way?
1.1. Problem Definition
Multi-way Join and Linear Queries.
A (natural) join query can be represented as a hypergraph (abiteboul1995foundations), where is the set of vertices or attributes and each is a hyperedge or relation. Let be the domain of attribute . Define , and for any . Given an input instance , each table is defined as a function from domain to a non-negative integer, indicating the frequency of each tuple. This is more general than the setting where each tuple appears at most once in each relation, since it can capture annotated relations (green2007provenance) with non-negative values. The input size of is defined as the sum of frequencies of data records, i.e., . When the context is clear, we just drop the superscript from , and often use to denote the set of tuples with non-zero frequency.
For each , we have a family of linear queries defined over such that for , . The family of linear queries defined over is . For each linear query , the result is defined as
where is the natural join function, such that for each combination , if and only if for each pair of . Here, is the standard projection operator, such that indicates the value(s) that tuple displays on attribute(s) .
Our goal is to release a function such that all queries in can be answered over as accurately as possible, i.e., minimizing the -error , where
DP Setting.
Two DP settings have been studied in the relational model, depending on whether foreign key constraints are considered or not. The one considering foreign key constraints assumes the existence of a primary private table, and deleting a tuple in the primary private relation will delete all other tuples referencing it; see (kotsogiannis2019privatesql; tao2020computing; dong2022r2t). In this work, we adopt the other notion, which does not consider foreign key constraints, but defines instances to be neighboring if one can be converted into the other by adding/removing a single tuple; this is the same as the notion studied in previous works (johnson2018towards; kifer2011no; narayan2012djoin; proserpio2014calibrating). (Our definition corresponds to the add/remove variant of DP, where a record is added/removed from a single database . Another possible definition is based on substitution DP. Add/remove DP implies substitution DP with only a factor of two increase in the privacy parameters; therefore, all of our results also apply to substitution DP.)
Definition 1.1 (Neighboring Instance).
A pair of instances and are neighboring if there exist some and such that:
-
•
for any , for every ;
-
•
for every and .
Definition 1.2 (Differential Privacy (dwork2006calibrating; dwork2006our)).
For , a randomized algorithm is -differentially private (denoted by -DP) if for any pair , of neighboring instances and any subset of outputs, .
Notation.
For simplicity of presentation, we henceforth assume throughout that and . Furthermore, all lower bounds below hold against an algorithm that achieves the stated error with probability for some sufficiently small constant ; we will omit mentioning this probabilistic part for brevity. For notational ease, we define the following quantities:
When are clear from context, we will omit them. Let , which is a commonly-used parameter in this paper.
1.2. Prior Work
We first review the problem of releasing synthetic data for a single table, and mention two important results in the literature. In the single table case, nearly tight upper and lower bounds are known, as stated below.
Theorem 1.3 ((hardt2012simple)).
For a single table of at most records, a family of linear queries, and , , there exists an algorithm that is -DP, and with probability at least produces such that all queries in can be answered within error using .
Theorem 1.4 ((bun2018fingerprinting)).
For every sufficiently small , and , there exists a family of queries of size on a domain of size such that any -DP algorithm that takes as input a database of size at most , and outputs an approximate answer to each query in to within error must satisfy .
Another related problem that has been widely studied by the database community is how to release the answer of join size queries over multiple tables with DP, denoted as :
which can be captured by a special linear query such that for .
A popular approach to answering the counting join size query is based on the sensitivity framework, which first computes , then computes the sensitivity of the query (measuring the difference between the query answers on neighboring database instances), and releases a noise-masked query answer, where the noise is drawn from some distribution calibrated appropriately according to the sensitivity. It is known that the local sensitivity cannot be directly used for calibrating noise, as for a pair of neighboring databases can lead to distinguishability of . Global sensitivity (e.g., (dwork2006calibrating)) could be used, but the utility is not satisfactory in many instances. Instead, smooth upper bounds on local sensitivity (nissim2007smooth) have been considered, for example, the smooth sensitivity, which is the smallest smooth upper bound; the residual sensitivity (dong2021residual), which is a constant-approximation to the smooth sensitivity; and the elastic sensitivity (johnson2018towards), which is strictly larger than the residual sensitivity.
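To make the sensitivity-based framework concrete, the following is a minimal Python sketch (not the mechanism of any specific cited work): the two-table join-size computation, the relation encoding, and the bound sens_bound, which stands in for whichever smooth upper bound (smooth, residual, or elastic sensitivity) is actually used, are all illustrative assumptions.

```python
import numpy as np

def noisy_join_size(R1, R2, sens_bound, eps, rng=np.random.default_rng()):
    """Release a counting join size via the sensitivity framework (sketch).

    R1, R2     : dicts mapping tuples (a, b) and (b, c) to non-negative frequencies
    sens_bound : a DP-safe upper bound on the sensitivity of the join size
    eps        : privacy budget spent on this single release
    """
    # 1. Compute the true counting join size |J| of R1 join R2 (join on b).
    true_count = sum(f1 * f2
                     for (a, b1), f1 in R1.items()
                     for (b2, c), f2 in R2.items()
                     if b1 == b2)
    # 2. Calibrate Laplace noise to the sensitivity bound and release the sum.
    return true_count + rng.laplace(scale=sens_bound / eps)
```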
We note that this sensitivity-based framework can be adapted to answering any single linear query, as long as the sensitivity is correctly computed. However, we are not interested in answering each linear query individually, since the privacy budget blows up with the number of queries to be answered, as implied by the DP composition rule (dwork2006calibrating; dwork2006our), which would be too costly when the query space is extremely large. Instead, we aim to release a synthetic dataset such that an arbitrary linear query can be freely answered over it while preserving DP. In building our data release algorithm, we also resort to the existing literature on the sensitivities derived for this counting join size query . Although smooth sensitivity enjoys the best utility, we adopt the residual sensitivity in this paper for two reasons: it is much more computationally efficient than smooth sensitivity; and, more importantly, it is built only on statistics of the input instance, whereas smooth sensitivity is built on all neighboring instances, which is critical to our optimization technique of uniformizing sensitivities.
1.3. Our Results
We first present the basic join-as-one approach for releasing a synthetic dataset for multi-table queries, with error a function of the output size of join results and the residual sensitivity of .
Theorem 1.5.
For any join query , an instance , a family of linear queries, and , , there exists an -DP algorithm that with probability at least produces that can be used to answer all queries in within error:
where is the join size of and is the -residual sensitivity of the counting join size query on for (see Definition 3.5).
We next present the parameterized optimality with respect to the output size of join results and local sensitivity of .
Theorem 1.6.
Given arbitrary parameters and a join query , for every sufficiently small , and , there exists a family of queries of size on domain of size such that any -differentially private algorithm that takes as input a multi-table database over of input size at most , join size and local sensitivity , and outputs an approximate answer to each query in to within error must satisfy
It should be noted that for the setting with and for some constant , we just have . The gap between the upper bound in Theorem 1.5 and the lower bound in Theorem 1.6 boils down to the gap between smooth sensitivity and local sensitivity of , which also appears (but in a different formula) in answering such a single query.
From the upper bound in terms of residual sensitivity, we exploit the idea of uniformizing sensitivity (a similar idea of uniformizing the degrees of join values, i.e., how many tuples a join value appears in within a relation, has been used in query processing (joglekar2015s), but we use it in a more complicated scenario in combination with sensitivity) to further improve Theorem 1.5, by using a more fine-grained characterization of input instances. For the class of hierarchical queries, which has been widely studied in the context of various database problems (dalvi2007efficient; fink2016dichotomies; berkholz2017answering; hu2022temporal), we obtain a more fine-grained parameterized upper bound in Theorem 4.12 and lower bound in Theorem 4.13. As the utility formulas display a rather complicated form, we defer all details to Section 4.
2. Preliminaries
For random variables and scalars , we write if and for all sets of outcomes.
For convenience, we write as a shorthand for where is a random variable with distribution (denoted by ).
Laplace distribution .
The Laplace distribution with scale parameter b has density proportional to exp(-|x|/b). Adding Laplace noise with scale equal to the query's sensitivity divided by the privacy parameter yields a pure DP release (dwork2006calibrating).
Shifted and truncated geometric distribution (chan2019foundations).
Let . Let , , and . Let be the smallest positive integer such that , where . The shifted and truncated geometric distribution has support in , and is defined as . It has been proved (balcer2017differential; chan2019foundations) that for any pair with ,
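Since the exact constants are specified in (chan2019foundations), the sketch below only illustrates the general shape of the mechanism we rely on; the decay rate, the shift, and the truncation range are assumptions rather than the precise parameters used in the paper.

```python
import numpy as np

def sample_stgeom(eps, shift, rng=np.random.default_rng()):
    """Illustrative shifted and truncated geometric sample (sketch).

    Two-sided geometric noise (difference of two geometric variables) with
    decay exp(-eps), shifted upward by `shift` and truncated to [0, 2*shift]
    so that the released value is never negative.
    """
    p = 1.0 - np.exp(-eps)                       # success probability of each geometric
    noise = rng.geometric(p) - rng.geometric(p)  # symmetric two-sided geometric noise
    return int(np.clip(shift + noise, 0, 2 * shift))
```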
Exponential Mechanism.
The exponential mechanism (EM) (McSherryT07) can select a "good" candidate from a set of candidates. The goodness is defined by a scoring function for an input dataset and a candidate, and is assumed to have sensitivity at most one. The EM algorithm then samples each candidate with probability ; this algorithm is -DP.
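A minimal sketch of the exponential mechanism as used here; the candidate set, the scoring function score (assumed to have sensitivity at most one), and the budget eps are supplied by the caller.

```python
import numpy as np

def exponential_mechanism(candidates, score, dataset, eps,
                          rng=np.random.default_rng()):
    """Select one candidate with probability proportional to exp(eps * score / 2)."""
    scores = np.array([score(dataset, c) for c in candidates], dtype=float)
    # Subtract the maximum score for numerical stability; the distribution is unchanged.
    logits = eps * (scores - scores.max()) / 2.0
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```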
Local Sensitivity.
Given a function that takes an input dataset and outputs a vector, its local sensitivity at is defined as where the maximum is over all neighbors of .
Join Result as a Function.
For , as each input relation is abstracted as a function, we introduce the join result function as , such that for any . Moreover, is denoted as the join size of , and we may use the two interchangeably later.
When we mention “local sensitivity” without a specific function , it should be assumed that the function is the counting join size query; note that this is equal to .
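For concreteness, a small sketch of materializing the join result as a frequency function, for a two-table join on a single shared attribute; the relation encoding and attribute layout are illustrative assumptions.

```python
from collections import defaultdict

def join_as_function(R1, R2):
    """Materialize J = R1 join R2 as a frequency function (two-table sketch).

    R1 : dict mapping tuples (a, b) to frequencies, joining on attribute b
    R2 : dict mapping tuples (b, c) to frequencies
    Returns a dict mapping (a, b, c) to R1[(a, b)] * R2[(b, c)].
    """
    by_b = defaultdict(list)
    for (b, c), f2 in R2.items():
        by_b[b].append((c, f2))
    J = defaultdict(int)
    for (a, b), f1 in R1.items():
        for c, f2 in by_b[b]:
            J[(a, b, c)] += f1 * f2
    return dict(J)
```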
3. Join-As-One Algorithm
In this section we present our first, join-as-one algorithm for general join queries, which computes the join result as a single function and then invokes the single-table private multiplicative weight (PMW) algorithm (hardt2012simple). While this approach appears simple, there are many non-trivial issues in putting everything together. (All missing proofs are in Appendix B.)
3.1. Algorithms for Two-Table Query
We start with the simplest two-table query. Assume is defined on , , and . Let us define some parameters first. For , we define the degree of join value to be . The maximum degree of any join value in is denoted as . Define . We note that .
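A sketch of the degree computation for the two-table join, using the same illustrative relation encoding as above; for the counting join size query over two tables, the maximum degree over both relations coincides with the local sensitivity used later.

```python
from collections import Counter

def max_join_degree(R1, R2):
    """Return the maximum degree of any join value over both relations.

    The degree of a join value b in a relation is the total frequency of tuples
    whose join attribute equals b.  Adding/removing a tuple with join value b in
    one relation changes the join size by the degree of b in the other relation,
    so this maximum equals the local sensitivity of the counting join size query.
    """
    deg1, deg2 = Counter(), Counter()
    for (a, b), f in R1.items():
        deg1[b] += f
    for (b, c), f in R2.items():
        deg2[b] += f
    values = set(deg1) | set(deg2)
    return max(max(deg1[b], deg2[b]) for b in values) if values else 0
```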
A Natural (but Flawed) Idea.
Let us first consider a natural approach, which turns out to violate DP:
-
•
Compute and release a synthetic dataset for as by invoking the single-table PMW algorithm;
-
•
As the PMW algorithm releases a set of records with their total frequencies as , which will reveal information about the input instance, we add noise to to obtain .
-
•
We draw a random noise from and choose records randomly from , denoted as .
-
•
We release the union of these two datasets .
The challenge is that invoking the single-table algorithm on does not preserve DP, as the initial data distribution parameterized by can leak information about the input database.
Modified Algorithm.
The main insight is to swap the order of the two steps in the flawed version. We add random noise to as , and then invoke the single-table algorithm on starting with a uniform distribution parameterized by . As is released under DP, and the single-table algorithm (hardt2012simple) also releases a synthetic dataset for under DP, Algorithm 1 also satisfies DP, as stated in Lemma 3.1.
Lemma 3.1.
Algorithm 1 is -DP.
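A minimal sketch of the modified ordering described above, assuming a routine pmw implementing the single-table private multiplicative weights algorithm (hardt2012simple) is available; the Laplace noise used here, the even budget split, and the pmw interface are placeholders rather than the exact choices of Algorithm 1.

```python
import numpy as np

def release_two_table(J, sens_bound, eps, queries, pmw,
                      rng=np.random.default_rng()):
    """Join-as-one release for a two-table query (sketch of the fixed ordering).

    J          : frequency function of the join result (dict: tuple -> count)
    sens_bound : DP upper bound on the sensitivity of the join size
    eps        : total privacy budget, split evenly between the two steps
    pmw        : single-table PMW routine with signature pmw(data, queries, eps, total)
    """
    true_size = sum(J.values())
    # Step 1: release a noisy join size FIRST (this fixes the flawed ordering).
    noisy_size = max(1, int(round(true_size + rng.laplace(scale=2 * sens_bound / eps))))
    # Step 2: run single-table PMW on the join result, initialized with a
    # uniform distribution whose total mass is the released noisy size.
    return pmw(J, queries, eps / 2.0, total=noisy_size)
```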
Error Analysis.
As shown in Appendix A, with probability at least , Algorithm 1 produces such that every linear query can be answered within error (omitting ): . Then, we come to the following result:
Theorem 3.2.
For any two-table instance , a family of linear queries , and , , there exists an algorithm that is -DP, and with probability at least produces such that all queries in can be answered within error:
where , is the join size of , and is the local sensitivity of on .
3.2. Lower Bounds for Two-Table Query
The first lower bound is based on the local sensitivity, which can be shown via standard techniques.
Theorem 3.3.
For any , if there exists a family of queries such that any -DP algorithm that takes as input a database with local sensitivity at most and outputs an approximate answer to each query in to within error , then .
Our second lower bound is via a reduction to the single-table case: we create a two-table instance where encodes the single-table and “amplifies” both the sensitivity and the join size by a factor of . This eventually results in the following lower bound.
Theorem 3.4.
For any , any sufficiently small , and , if there exists a family of queries of size on domain of size such that any -DP algorithm that takes as input a two-table database of join size and local sensitivity and outputs an approximate answer to each query in to within error , then .
Proof.
Let . From Theorem 1.4, there exists a set of queries on domain for which any -DP algorithm that takes as input a single-table database and outputs an approximate answer to each query in within error requires that . For an arbitrary single-table database , we construct a two-table instance of join size , and local sensitivity as follows:
-
•
Set , and .
-
•
Let for all and .
-
•
Let for all and .
It can be easily checked that has join size at most and local sensitivity , and that two neighboring datasets result in neighboring . Finally, let contain queries from applied on its first attribute (i.e., ), and let contain only a single query . An example is illustrated in Figure 1.
Our lower bound argument is a reduction to the single-table case. Let be an algorithm that takes any two-table instance in and outputs an approximate answer to each query in within error . For each query , let be its corresponding query in . Let be the approximate answer for query . We then return as an approximate answer of .
We first note that this algorithm for answering queries in is -DP due to the post-processing property of DP. The error guarantee follows immediately from the observation that .
Therefore, from Theorem 1.4, we can conclude that . ∎
3.3. Multi-Table Query
We next extend the join-as-one approach to the multi-table query. First, notice that Algorithm 1 does not work in the multi-table setting because may itself have a large local sensitivity, unlike in the two-table case where the global sensitivity was one.
To handle this, we will need the notion of residual sensitivity. For a join query , the residual query on a subset of relations is . Its boundary, denoted as , is the set of attributes that belong to relations both in and out of , i.e., . For a residual query on over instance , the query result is (here, the semi-join result of is defined as a function such that if and otherwise):
(1)
which is a join-aggregate query over the attributes in .
Definition 3.5 (Residual Sensitivity for Counting Query (dong2021residual; dong2021nearly)).
Given , the -residual sensitivity of counting the join size of multi-way query over instance is defined as
where
(2)
for .
It is clear that is an upper bound on , and therefore it suffices for us to find an upper bound for (which is then passed to the single-table algorithm). To compute , we observe that has global sensitivity at most . Therefore, adding an appropriately calibrated (truncated and shifted) Laplace noise to it provides a DP upper bound. The idea is formalized in Algorithm 3. Its privacy guarantee is immediate:
Lemma 3.6.
Algorithm 3 is -DP.
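To make the idea behind Algorithm 3 concrete, the following sketch releases a value that upper-bounds a sensitivity-related statistic with high probability; it assumes the perturbed statistic has global sensitivity at most one, and the noise and shift constants are illustrative rather than the exact calibration used in the paper.

```python
import numpy as np

def dp_upper_bound(stat_value, eps, beta, rng=np.random.default_rng()):
    """Release a DP value exceeding `stat_value` except with probability <= beta.

    Assumes `stat_value` has global sensitivity at most 1.  We add Laplace noise
    with scale 1/eps and then shift upward so that the noise rarely pushes the
    released value below the true one.
    """
    noise = rng.laplace(scale=1.0 / eps)
    shift = np.log(1.0 / beta) / eps   # with this shift, P(noise < -shift) <= beta
    return stat_value + noise + shift
```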
Error analysis.
With probability at least , , then since , and therefore . Hence . Putting everything together, the error that can be achieved is (omitting ):
4. Uniformize Sensitivity
So far, we have shown a parameterized algorithm for answering linear queries, whose utility is in terms of the join size and the residual sensitivity. A natural question arises: Can we achieve a more fine-grained parameterized algorithm with better utility?
Let us start with a two-table instance (see Figure 2), with input size , join size and local sensitivity . Algorithm 1 achieves an error of . However, this instance is beyond the scope of Theorem 3.4, as the sensitivity (or degree, in this simple scenario) distribution over join values is extremely non-uniform. Revisiting the error complexity in Theorem 3.2, we can gain some intuition regarding why Algorithm 1 does not perform well on this instance. The costly term is , where is the largest degree of join values in the input instance. However, there are many join values whose degree is much smaller than , so a natural idea is to uniformize sensitivities, i.e., partition the input instance into a set of sub-instances by join values, such that join values with similar sensitivities are put into the same sub-instance. After this uniformization, we just invoke our previous join-as-one approach as a primitive on each sub-instance independently, and return the union of the synthetic datasets generated for all sub-instances. The framework of uniformization is illustrated in Algorithm 4. (All missing proofs are given in Appendix B.1.)
4.1. Uniformize Two-Table
As mentioned, there is quite an intuitive way to uniformize a two-table join. As described in Algorithm 5, the high-level idea is to bucket join values in by their maximum degree in and . To protect the privacy of individual tuples, we add some noise drawn from to each join value's degree before determining to which bucket it should go. Recall that . Let and for all . Conceptually, we divide into buckets, where the -th bucket is associated with for . The set of values from whose maximum noisy degree in and falls into the -th bucket is denoted as . For each , we identify the tuples in whose join value falls into as , which forms the sub-instance . At last, we return all sub-instances. Note that is not explicitly used in Algorithm 5 as it is not public, but it is easy to check that no join value has degree more than under the input size constraint. Hence, it is safe to only consider in our analysis.
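A sketch of the bucketing step in Algorithm 5: each join value's maximum degree is perturbed with noise and assigned to a logarithmic bucket; the Laplace noise and the power-of-two bucket boundaries are illustrative assumptions.

```python
import math
import numpy as np

def bucket_join_values(deg1, deg2, eps, rng=np.random.default_rng()):
    """Partition join values into logarithmic buckets by noisy maximum degree.

    deg1, deg2 : dicts mapping a join value to its degree in R1 and R2
    Returns a dict mapping bucket index i to the set of join values whose noisy
    maximum degree falls (roughly) into (2**(i-1), 2**i].
    """
    buckets = {}
    for b in set(deg1) | set(deg2):
        noisy = max(deg1.get(b, 0), deg2.get(b, 0)) + rng.laplace(scale=1.0 / eps)
        i = max(0, math.ceil(math.log2(max(noisy, 1.0))))
        buckets.setdefault(i, set()).add(b)
    return buckets
```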
Lemma 4.1.
Algorithm 4 on two-table join is -DP.
The key insight in Lemma 4.1 is that adding or removing one input tuple can increase or decrease the degree of one join value by at most one. Hence, Algorithm 5 satisfies -DP by the parallel composition rule (mcsherry2009privacy). Moreover, since each input tuple participates in exactly one sub-instance, and Algorithm 2 preserves -DP for each sub-instance by Lemma 3.1, Algorithm 4 preserves -DP by the basic composition (dwork2006calibrating).
Error Analysis.
We next analyze the error of Algorithm 8. Given an instance , let be the partition of generated by Algorithm 8. Correspondingly, derives a partition of input instance , where is the sub-instance in which tuples have their join values falling into . Moreover, derives a partition of join result . Let be the synthetic dataset generated for by JoinMultiTable. From Theorem 3.2, with probability , the error of answering linear queries defined over that can be achieved with is . By a union bound, with probability , the error can be achieved with is:
where .
One may observe an overlap between the bucket ranges defined by , but always holds as induces a partition of as well as . In this way, Algorithm 4 on the two-table query is never worse than Algorithm 1, since , which is implied by the Cauchy-Schwarz inequality. Furthermore, we show in Appendix B.1 that for some input instances the gap between Theorem 4.3 and Theorem 3.2 can be polynomially large in terms of the data size.
Careful inspection reveals that although the partition generated by Algorithm 5 is randomized due to the noise, it is not far away from the fixed partition based on true degrees. As shown in Appendix B.1, Algorithm 5 always has its error bounded by what can be achieved through the uniform partition defined as below.
Definition 4.2 (Uniform Partition).
For , and an input instance , the uniform partition of on is such that for any , if .
Theorem 4.3.
For any two-table instance , a family of linear queries , and , , there exists an algorithm that is -DP, and with probability at least produces such that all queries in can be answered within error:
where , is the number of join results whose join value falls into under the uniform partition .
Let be a characterization vector of join sizes under the uniform sensitivity partition . A two-table database conforms to if
holds for each . Then, we introduce the following parameterized lower bound in terms of join size distribution, and show that Algorithm 8 is optimal on the two-table query up to poly-logarithmic factors. The proof is given in Appendix B.1.
Theorem 4.4.
Given arbitrary parameters under the uniform partition , for every sufficiently small , and , there exists a family of queries of size on domain of size such that any -differentially private algorithm that takes as input a two-table database of input size at most while conforming to , and outputs an approximate answer to each query in to within error must satisfy .
4.2. Uniformize Hierarchical Query
Beyond two-table query, we surprisingly find that this uniformization technique can be further extended to the class of hierarchical queries. A join query is hierarchical, if for any pair of attributes , either , or , or , where is the set of relations containing attribute . One can always organize the attributes of a hierarchical join into a tree such that every relation corresponds to a root-to-node path (see Figure 3).
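A small sketch of checking the hierarchical property, representing the query by a mapping from each attribute to the set of relations containing it; the attribute and relation names in the usage example are illustrative.

```python
def is_hierarchical(rel):
    """Return True iff the join query is hierarchical.

    rel : dict mapping each attribute x to the set of relations containing x.
    Hierarchical means: for every pair of attributes x, y, the sets rel(x) and
    rel(y) are either nested or disjoint.
    """
    attrs = list(rel)
    for i, x in enumerate(attrs):
        for y in attrs[i + 1:]:
            rx, ry = set(rel[x]), set(rel[y])
            if not (rx <= ry or ry <= rx or rx.isdisjoint(ry)):
                return False
    return True

# R1(x, y), R2(x, z) is hierarchical; R1(x), R2(x, y), R3(y) is not.
print(is_hierarchical({"x": {"R1", "R2"}, "y": {"R1"}, "z": {"R2"}}))   # True
print(is_hierarchical({"x": {"R1", "R2"}, "y": {"R2", "R3"}}))          # False
```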
Thanks to the nice structure of hierarchical joins, the uniformization technique can be applied effectively. Recall that the essence of uniformization is to partition the instance into a set of sub-instances, so that we can get an upper bound on the residual sensitivity that is as tight as possible, as implied by the error expression in Theorem 1.5. In (2), the residual sensitivity is built on the join sizes of a set of residual queries, but these statistics are too far from serving as partition criteria. Instead of using residual sensitivity directly, we first rewrite the join size of a residual query and derive an upper bound for it in terms of degrees.
4.2.1. An Upper Bound on Residual Query Size
To derive an upper bound on 's for , we target a broader class of -hierarchical queries (this notion differs from the q-hierarchical queries in (berkholz2017answering), where serves as the set of output attributes, although they have the same property characterization on ) by generalizing the aggregate attributes in (1) to any subset of attributes that sit on the "top" of . It is not hard to see that for boundary attributes .
Definition 4.5.
For a hierarchical join and instance , a -hierarchical query on and is defined as , where satisfies the property: for any , if and , then .
In the remainder, we focus on deriving an upper bound for -hierarchical queries , which boils down to a product chain of degree functions. For example, in Figure 3 indicates the number of tuples in that appears.
Definition 4.6 (Degree Function).
For a hierarchical join and an instance , the degree function for a subset of relations , a subset of attributes , and a tuple is defined as follows (when the context is clear, we drop the superscript from ; moreover, when , say , we just write as short for ):
(3)
Base Case. When , say , degenerates to the degree of tuples over attributes in (as ):
General Case. Denote as the residual query of by removing attributes in . We distinguish the following two cases:
-
•
is disconnected. Let be the set of connected components of . We can then decompose into a set of sub-queries, after pushing the semi-join inside each connected component:
The correctness follows from the fact that and satisfies the property in Definition 4.6.
-
•
is connected. In this case, we must have .
The correctness follows from the fact that and satisfies the property in Definition 4.6.
It is not hard to see that is eventually upper bounded by a product chain of degree functions (see Example 4.8). Careful inspection reveals that the degree functions participating in are not arbitrary; instead, they display very special structures captured by Lemma 4.7, which is critical to our partition procedure.
Lemma 4.7.
Every degree function participating in the upper bound of corresponds to a distinct attribute such that and corresponds to the set of attributes lying on the path from the root to ’s parent in .
Example 4.8.
An upper bound derived for in Figure 3:
4.2.2. Partition with Degree Characterization
From the upper bound on , we now define the degree characterization for hierarchical joins, similarly to the two-table join. Our target is to partition the input instance into a set of sub-instances (which may not be tuple-disjoint), such that each sub-instance is captured by a distinct degree characterization, and the join results over all sub-instances form a partition of the final join result.
Definition 4.9 (Degree Characterization).
For a hierarchical join , a degree characterization is defined as such that for any and , if and only if there exists an attribute such that and is the set of ancestors of in .
As described in Algorithm 6, the high-level idea is to partition the input instance by attributes in a bottom-up way on , where is the attribute tree of the input hierarchical query , and invoke Algorithm 7 as a primitive on every sub-instance obtained so far.
As described in Algorithm 7, PartitionByAttr takes as input an attribute and an instance such that
-
•
for every attribute as a descendant of in , is uniformized for every , where is the set of ancestors of in .
and outputs a set of sub-instances of , such that
-
•
, i.e., the join results of sub-instances in form a partition of join result of ;
-
•
for every sub-instance , is roughly the same for every , where is the set of ancestors of in .
Similar to the two-table case, for every , we compute a noisy version of its degree and put it into the bucket indexed by its logarithmic value. For each , we obtain a tuple-disjoint partition of based on , such that each defines a sub-instance involving a sub-relation for each as well as all remaining relations with . At last, we return the union of all sub-instances.
Lemma 4.10.
For input instance , let be the set of sub-instances returned by Algorithm 7. satisfies the following properties:
-
•
, i.e., the join results of all sub-instances in form a partition of the join result of ;
-
•
Each input tuple appears in sub-instances of , where is a constant depending on query size;
-
•
Each sub-instance corresponds to a distinct degree characterization such that for any and with :
(4)
Lemma 4.11.
Algorithm 4 on hierarchical joins is -DP, where is a constant depending only on the tree.
The logarithmic factor in Lemma 4.11 arises because each input tuple participates in sub-instances, as implied by Lemma 4.10. Next, given a degree characterization , how do we derive the residual sensitivity of instances consistent with ? Actually, this is already resolved by just rewriting the upper bound derived for , denoted as . Similar to the two-table case, we also define the uniform partition of an input instance over a hierarchical join query using true degrees as . As shown in Appendix B.1, the error complexity achieved by the partition based on noisy degrees can be bounded by that of the uniform one.
Theorem 4.12.
For any hierarchical join and an instance , a family of linear queries , and , , there exists an algorithm that is -DP, and with probability at least produces such that all queries in can be answered within error:
where is over all degree characterizations of , is the sub-instance characterized by , is the join size of , and .
Let be the output size distribution over the uniform degree partition. An instance conforms to if , for every degree characterization , where is the join result of sub-instance . Extending the lower bound argument of Theorem 4.4, we obtain the following parameterized lower bound for hierarchical queries.
Theorem 4.13.
Given a hierarchical join and arbitrary parameters , for every sufficiently small , and , there exists a family of queries of size on domain of size such that any -differentially private algorithm that takes as input a multi-table database over of input size at most while conforming to , and outputs an approximate answer to each query in to within error must satisfy
where is over all degree characterizations and is the local sensitivity of under .
5. Conclusions and Discussion
In this work, we proposed algorithms for answering linear queries of multi-table joins. Our work opens up several interesting questions that we list below.
Non-Hierarchical Setting.
First, and perhaps the most immediate question, is whether uniformization can benefit the non-hierarchical case. We note that the uniformization technique from Section 4.2 itself is applicable but may not lead to uniformized residual sensitivity on non-hierarchical queries, even on the simplest non-hierarchical join . We can simply partition values in into buckets, each of which is indexed by two integers such that . A similar partition idea applies to values in . Note that each pair of buckets corresponding to defines a sub-instance . But the residual sensitivity within each sub-instance is not guaranteed to be uniformized. In determining the residual sensitivity in (2), we observe that and ; however, and are defined over the whole instance , instead of the specific instance . This coarse-grained partition can be extended to arbitrary acyclic queries (see Appendix B.1), but does not necessarily lead to uniformized residual sensitivity inside each sub-instance. We leave this as interesting future work.
Query-Specific Optimality.
Throughout this work, we consider a worst-case set of queries parameterized by its size. Although this is a reasonable starting point, it is also plausible to hope for an algorithm that is nearly optimal for all query sets . In the single-table case, this has been achieved in (HardtT10; BhaskaraDKT12; NikolovT016). However, the situation is rather complicated in the multi-table setting since we have to take the local sensitivity into account, whereas, in the single-table case, the query family already tells us everything about the possible change by moving to a neighboring dataset. This also seems like an interesting open question for future research.
Instance-Specific Optimality.
Similarly, we have considered worst-case instances. One might instead prefer to achieve finer-grained instance-optimal errors. For the single-table case, (asi2020instance) observed that an instance-optimal DP algorithm is not achievable, since a trivial algorithm can return the same answer(s) for all input instances, which could work perfectly on one specific instance but miserably on all remaining instances. To overcome this, a notion of "neighborhood optimality" has been proposed (dong2021nearly; asi2020instance), where we consider not only a single instance but also its neighbors at some constant distance. We note, however, that this would not work for our setting when there are a large number of queries. Specifically, if we again consider an algorithm that always returns the true answer for this instance, then its error with respect to the entire neighborhood set is still quite small: at most the distance times the maximum local sensitivity in the set. This is independent of the table size , whereas our lower bounds show that the dependency on is inevitable. As such, the question of how to define and prove instance-optimal errors remains open for the multi-query case.
Appendix A Missing Proof in Section 1
We adapt the single-table proof to our context. Let be a table of data items over domain . Define to be the set of linear queries. Let be the distribution returned by the MWEM algorithm in the -th iteration. Define as the maximum error over all queries in terms of and . Now we are going to bound the maximum error of the MWEM algorithm:
Define for . Note that the sensitivity for invoking the exponential mechanism is
and that for invoking the Laplace mechanism is
which follows by a similar argument.
Lemma A.1.
With probability at least for any , for all , we have:
Proof.
For the first inequality, we note that the probability that the exponential mechanism with parameter selects a query with quality score at least less than the optimal is bounded by
For the second inequality, we note that happens if and only if happens. By definition of the Laplace distribution, we have:
A union bound over events yields such a failure probability. ∎
We then show how the MW mechanism improves the approximation in each round where has large magnitude. To capture the improvement, we use the relative entropy function:
Here, we can only show that
where such that for every .
We next consider how changes in each iteration:
where
Using the fact that for and ,
Then, we can rewrite in the following way:
where the last inequality follows from the previous analysis by taking . By rewriting the last inequality, we have
Then,
By taking , we can obtain the minimized error as . In -DP, this error can be generalized as (omitting the factor of ):
Appendix B Missing Proofs in Section 3
Lemma B.1.
For any pair of neighboring instances , let be the join results of , respectively. Then, .
Lemma B.1 follows directly from the definition of and the non-negativity of STGeom.
Proof of Lemma 3.1.
Consider two neighboring databases and . We first prove . Let be the local sensitivity of , respectively. It can be easily checked that . Let be the random variables obtained by adding random noise from to , respectively. Let be the support of , respectively. From STGeom, we observe:
(5)
since , hence . We next note that for any , the following holds:
since . From the sequential composition rule (dwork2006calibrating; dwork2006our), we conclude that .
We next turn to line 6. Let be the synthetic data generated for separately. The single-table algorithm starts with a uniform distribution with for , and for . Implied by the analysis above, the initial state satisfies -DP, i.e., .
We finally come to lines 6-9, with invocations of the EM mechanism and the Laplace mechanism. For the exponential mechanism, let be the query selected after line 7. For the Laplace mechanism, let be the resulting value after line 9 for , respectively. From the above, we already have (5). We next note two facts for any :
-
•
since holds for every query ,
since . Summing over iterations, it yields -DP. Thus, Algorithm 1 preserves -DP. ∎
Proof of Lemma 3.6.
For any pair of neighboring instances , we note that always holds. Let be the updated value after line 3 for , respectively. It can be shown that with probability at least , , then , and therefore . For any , we have
thus, . Then, for any , we know that
since . Let be the resulting value after line 5 for , respectively. Then, .
We next turn to line 6. Let be the synthetic data generated for separately. The single-table algorithm starts with a uniform distribution with for , and for . Implied by the analysis above, the initial state satisfies -DP, i.e., .
We finally come to lines 7-10, with invocations of the EM mechanism and the Laplace mechanism. For the exponential mechanism, let be the query selected after line 7. For the Laplace mechanism, let be the resulting value after line 9 for , respectively. As before, we already showed that . We next note two facts for any :
-
•
;
-
•
The first one follows from the fact that holds for every query . The second one follows from the fact that . Summing over iterations, it results in -DP. Thus, Algorithm 3 preserves -DP. ∎
Proof of Theorem 1.6.
Set and . From Theorem 1.4, let be the set of queries on which any -differentially private algorithm that takes as input a single-table database and outputs an approximate answer to each query in within -error requires that . For an arbitrary single-table database , we construct a multi-table instance for of input size , join size , and local sensitivity as follows. Without loss of generality, we pick as the anchor relation to encode .
-
•
Set if , and otherwise;
-
•
There are distinct values in for , where ;
-
•
For each data record , we add a tuple in ;
-
•
Tuples in for form a Cartesian product of size over all attributes in ;
It can be easily checked that . From , we construct a set of queries over as . For each query , we construct another query as follows:
-
•
such that if and only if for any ;
-
•
over for .
We also add the counting query with for any to .
It suffices to show that if there is a -differentially private algorithm that takes any multi-table instance in and outputs an approximate answer to each query in within error , then there is a -differentially private algorithm that takes any single-table instance and outputs an approximate answer to each query in within error . The remaining argument exactly follows that of the two-table case. ∎
B.0.1. Missing Proofs for the Lower Bound
Proof of Theorem 3.3.
Let . Let be a pair of neighboring instances such that .
Suppose for the sake of contradiction that there is an -DP algorithm that achieves an error with probability at least 0.99. Let be the results released for join sizes of respectively. Then, as preserves -DP, it must be that
a contradiction. ∎
B.1. Missing Proofs in Section 4
example 0.
example 0.
Proof of Lemma 4.1.
Consider two neighboring instances and . Let be the partitions of for , respectively. By definition, if and only if ; is defined similarly. Note that holds for any and . Hence, . As the degrees for different 's are calculated on disjoint parts of the input data, the claim follows from the parallel composition rule. ∎
Proof of Theorem 4.3.
Given an input instance over the two-table query, let , be a fixed partition of , such that if and only if
Let be a random partition of by Algorithm 5. Note that happens only if
It is easy to see that . For simplicity, we denote the number of join results involving values in as and the number of join results involving values in as . Then the cost of Algorithm 8 under partition is
thus can be bounded by the cost of under the fixed uniform partition of . ∎
Proof of Theorem 4.4.
Let be the set of two-table instances with output size in which every join value has maximum degree falling into . Our proof consists of two steps:
-
•
Step (1): There exists a family of queries such that any -algorithm that takes as input an instance from and outputs an approximate answer to each query in within error must require .
-
•
Step (2): There exists a family of queries such that any -algorithm that takes as input an instance and outputs an approximate answer to each query in within error must require .
Let’s first focus on step (1) with arbitrary . Let be an arbitrary integer such that holds. From Theorem 1.4, let be the set of hard queries on which any -differentially private algorithm takes as input any single table , and outputs an approximate answer to each query in within error must require . For an arbitrary single table , we can construct a two-table instance as follows:
-
•
Set , and ;
-
•
Set and ;
-
•
Tuples in form a Cartesian product of size over ;
-
•
Tuples in form a Cartesian product of sizes over ;
-
•
For each of the records in , say , we add a tuple and set .
It can be easily checked that . From Theorem 1.4, let be the set of all linear queries over . For each query , we construct another query such that:
-
•
such that if and only if ;
-
•
.
We borrow a similar argument from the two-table lower bound (Theorem 3.4): if there is an -algorithm that takes a two-table instance in and outputs an approximate answer to each query in within error , then there exists an -algorithm that takes an arbitrary single table and outputs an approximate answer to each query in within error . As , .
Step (2).
From Theorem 1.4, let be the family of linear queries over , such that . Consider an arbitrary -differentially private algorithm that takes as input in and outputs an approximate answer to each query in within error . If , there must exist an -differentially private algorithm that takes as input any instance from and outputs an approximate answer to each query in , for some , contradicting Step (1). ∎
In the attribute tree , if there exists an attribute with only one child, say with child , we know that always appear together in all relations, so they can simply be considered as one "combined" attribute. Hence, it is safe to assume that every non-leaf attribute has at least two children in .
Proof of Lemma 4.10.
We will prove the first two properties by induction. In the base case when only contains one instance, these two properties hold trivially. Consider an arbitrary iteration of while-loop in Algorithm 6. By hypothesis, all sub-instances in have disjoint join results, and the union of join results of all sub-instances in is . Then, it suffices to show that PartitionByAttr generates a partition of as
such that all ’s have disjoint join results, and the union of join results of all ’s is the join result of .
Consider an arbitrary join result of , where is the projection of onto . For any , . Let be the tuple such that holds for every . Denote . Then, for any , if and only if by line 9. Thus, , but for any .
For the third property, we actually show that each input tuple appears in sub-instances, where is the total number of appearances of join attributes in all relations. In an invocation of PartitionHier, tuple appears in sub-instances if , and in only one sub-instance otherwise. Overall, the procedure PartitionByAttr will be invoked on every non-leaf attribute in , thus every input tuple appears in sub-instances.
Lastly, we show that each sub-instance corresponds to a degree characterization. Let's focus on an arbitrary single relation in Algorithm 6, which is partitioned by attributes lying on the path corresponding to in in a bottom-up way, say . When PartitionByAttr is invoked, the sub-instance must have with
holds for any with . In the subsequent partitions on , we claim that for each , the set of tuples with the same value always appear together. Suppose PartitionByAttr is invoked with any . Observe that for any with , always holds since , where and . By line 9 in Algorithm 7, tuples with the same value always fall into the same sub-relation. In other words, remains the same in subsequent partitions, for any with . This argument can also be generalized to any attribute .
Now, let’s interpret the partition process as a directed tree, such that has sub-instances in as its children. Each sub-instance returned by Algorithm 6 corresponds to a distinct root-to-leaf path in this directed tree. Consider an arbitrary sub-instance corresponding to the root-to-leaf path . When each instance goes through an invocation of PartitionByAttr, a distinct value is set up for each of ’s children. As proved above, this degree value will be preserved in any for . As each instance on this path corresponds to one invocation of PartitionByAttr on a distinct attribute, all possible values of will be set up well for . Thus, each sub-instance returned by Algorithm 6 corresponds to a degree characterization.
Moreover, for any pair of sub-instances returned by Algorithm 6, we can find the lowest common ancestor of these two corresponding root-to-leaf paths, say with an invocation of PartitionByAttr on . So, these two sub-instances must have different degree values over attributes in relations from . As proved above, the difference will be preserved in these two sub-instances. This argument can be applied to any pair of sub-instances returned by Algorithm 6, thus each sub-instance corresponds to a distinct degree characterization. ∎
In an attribute tree rooted at , let be the sequence of attributes lying on the path from to .
Proof of Lemma 4.7.
We will prove it through the following four steps:
-
•
Step 1: ;
-
•
Step 2: ;
-
•
Step 3: there exists no such that .
-
•
Step 4: If is the node such that , let be the lowest ancestor of with more than one child. Then, and .
Step 1. It suffices to show that the first property holds for every invocation of . First, it holds trivially for , which follows from the definition of partial attributes . By hypothesis, we assume it holds for some . Next, we show that it is also preserved in the two recursive rules applied to . We distinguish two cases.
-
•
is disconnected. Consider an arbitrary connected component . For any pair of and , is implied by the hypothesis. For any pair of and , is implied by the disconnectedness of . Together, for any pair of and , we have . Hence, for each , also satisfies this property.
-
•
is connected. We note that every attribute in only appears in relations from by hypothesis. As , every attribute in only appears in relations from . In other words, if there exists a pair and , . Hence, also satisfies this property.
Step 2. With the first property, we next prove the second property. From Definition 4.6, . In the base case when is invoked, we must have , thus . In the recursive step when is connected, we already showed . Overall, when is invoked, we must have .
Step 3. Then, we come to the third property. Suppose there exist some such that . Implied by the first property, , contradicting the second property.
Step 4. Finally, we come to the last step. As is hierarchical, any subset of relations from also forms a hierarchical join. Attributes in form a root-to-node path in the attribute forest of . Together with the third property, if equals the intersection of relations in , then . Moreover, we note that is also a root-to-node sub-path of . Suppose . Then, we can always find , since has at least one child which is not an ancestor of . As , we have . By definition, , a contradiction. Under the assumption that every non-leaf attribute in has at least two children, it is safe to replace with the parent of . ∎
Proof of Lemma 4.11.
Consider two neighboring instances and . Note that holds for any , , and . This is not hard to show since changing one tuple can change at most one tuple in as well as one tuple in its projection onto attributes , thus at most one tuple in . Let be the root-to-leaf path corresponding to . Each input tuple contributes to at most degree functions, i.e., with , for any . Hence, , where is a query-dependent quantity.
Note that for arbitrary , adding noisy degree can only enlarge it by a factor of . Let be the set of join attributes, and . We consider the most fine-grained partition of such that , where corresponds to the set of all join results participated by . Let be the degree characterization where falls into with true degrees, and be the degree characterization where falls into with noisy degrees by Algorithm 7. The cost of Algorithm 4 on hierarchical joins is
where the last inequality is implied by the fact that for any , the number of such that some could have and is at most .
Proof of Theorem 4.13.
Let be the set of multi-table instances with output size and following the degree characterization . Our proof consists of two steps:
-
•
Step (1): There exists a family of queries such that any -algorithm that takes as input an instance from and outputs an approximate answer to each query in within error must require .
-
•
Step (2): There exists a family of queries such that any -algorithm that takes as input an instance and outputs an approximate answer to each query in within error must require .
Let’s first focus on step (1) with arbitrary . Assume is achieved on some , i.e., . We compute the value of and set . From Theorem 1.4, let be the set of hard queries on which any -differentially private algorithm takes as input any single table , and outputs an approximate answer to each query in within error must require . For an arbitrary single table , we can construct an instance as follows:
-
•
Set for any ;
-
•
When visiting attributes of in a bottom-up way, say , we set where is the set of attributes lying on the path from to ’s parent in ;
-
•
For the root , set ;
-
•
Every relation is a Cartesian product over its own attributes;
-
•
For each data record , we exclusively pick a tuple and attach to it.
It can be easily checked that , and . From Theorem 1.4, let be the set of all linear queries over . For each query , we construct another query such that:
-
•
such that if and only if is attached to ;
-
•
for .
We borrow a similar argument from the two-table lower bound (Theorem 3.4): if there is an -algorithm that takes an instance in and outputs an approximate answer to each query in within error , then there exists an -algorithm that takes an arbitrary single table and outputs an approximate answer to each query in within error . As , .
Step (2).
From Theorem 1.4, let be the family of linear queries over , such that . Consider an arbitrary -differentially private algorithm that takes as input in and outputs an approximate answer to each query in within error . If , there must exist an -differentially private algorithm that takes as input any instance from and outputs an approximate answer to each query in for some , contradicting Step (1). ∎
Uniformize Acyclic Query.
A join query is acyclic if there is a join tree such that every node of corresponds to a relation, and for every attribute , the set of nodes containing forms a connected subtree of . For general acyclic join query , we can first fix a join tree . Let be the set of connected components of relations in on . For each , we pick an arbitrary relation with as the root, which is always feasible; otherwise itself is a connected component, and . After picking as the root, we denote for as the parent of in . For completeness, let . Then,
At last, . To apply the uniformization technique, it suffices to partition each relation by the degree of values in attributes , for all possible and root .