$\uppi$ QLB: A Privacy-preserving with Integrity-assuring Query Language for Blockchain

Nasrin Sohrabi Deakin, Australia
[email protected] Norrathep Rattanavipanon PSU, Thailand
[email protected] Zahir Tari
RMIT, Australia
[email protected]

Abstract

The increase in the adoption of blockchain technology in different application domains e.g., healthcare systems, supply chain managements, has raised the demand for a data query mechanism on blockchain. Since current blockchain systems lack the support for querying data with embedded security and privacy guarantees, there exists inherent security and privacy concerns on those systems. In particular, existing systems require users to submit queries to blockchain operators (e.g., a node validator) in plaintext. This directly jeopardizes users’ privacy as the submitted queries may contain sensitive information, e.g., location or gender preferences, that the users may not be comfortable sharing with. On the other hand, currently, the only way for users to ensure integrity of the query result is to maintain the entire blockchain database and perform the queries locally. Doing so incurs high storage and computational costs on the users, precluding this approach to be practically deployable on common light-weight devices (e.g., smartphones).

To this end, this paper proposes $\uppi$ QLB, a query language for blockchain systems that ensures both confidentiality of query inputs and integrity of query results. Additionally, $\uppi$ QLB enables SQL-like queries over the blockchain data by introducing relational data semantics into the existing blockchain database. $\uppi$ QLB has applied the recent cryptography primitive, namely the function secret sharing (FSS), to achieve the confidentiality property. To support integrity, we propose to extend the traditional FSS setting in such a way that integrity of FSS results can be efficiently verified. Successful verification indicates absence of malicious behaviors on the servers, allowing the user to establish trust from the result. To the best of our knowledge, $\uppi$ QLB is the first query model designed for blockchain databases with the support for confidentiality, integrity, and SQL-like queries.

Index Terms:

Information Security, Blockchain, Private Data retrieval.

I Introduction

In $2008$ , Nakamoto [1] proposed a new way to build a cash system without the need for having a central trusted authority, such as banks, which marked the generation of cryptocurrencies (e.g., Bitcoin and Ethereum). The innovation lied in building trust among parties where the parties cannot trust each other, yet they all agree on one value of the data. The new technology, blockchain, guarantees the integrity, immutability, and tractability of the stored data. Since then, blockchain has been gaining overwhelming attention and applied in many applications such as, energy trading, healthcare systems, security trading, supply chain management, machine learning (federating learning), and rental systems. A blockchain is defined as a decentralized distributed database that runs over a p-2-p (peer-to-peer) network. It is an append-only data structure (a growing list of ordered records) that is stored among all the peers of the p-2-p network [2, 3, 4, 5, 6, 7]. The ordered records, called blocks, are securely linked together using a cryptography primitive, a hash function, forming a chain of blocks. Peers in the underlying network hosting the blockchain do not trust each other; however, the technology yet guarantees the integrity of the data. This is achieved due to: $(i)$ adding the hash of the previous block into each block, and $(ii)$ the underlying consensus protocol of the blockchain which guarantees the peers store the same replica of the blockchain at any given point of time.

From the database perspective, a blockchain is seen as a decentralised database that stores a large collection of records. Many distributed data-intensive applications have adopted the technology which as a result has increased the demand for querying the newly designed database. For example, in a cryptocurrency application (e.g., Bitcoin), a client may want to query the network to find all the transactions that have occurred during a specific time and the amount of transaction is between a range condition, e.g., $20<Transaction_{-}Amount<200$ . In another example, consider a blockchain-based healthcare system, where the data stored on blockchain are sensitive information, such as patients’ diseases and records, the doctors who visited the patients, the medications given to the patients, etc. A client may want to query the network to find how many patients with a specific disease have visited a specific hospital over a period of time. Existing solutions to query data records in blockchains have several limitations. Some works, such as FlureeDB [8], IBM [9], Oracle [10] and Bigchain [11], provided a SQL-like query language that relies on a central trusted third party (see Figure 1). That is, the central server makes a separate copy of the blockchain in a format of a relational database and then enables SQL queries over the data. This approach obviously has a major limitation, i.e., relying on a central trusted server. The server might act maliciously and either does not provide all the results or returns wrong information to the client. This jeopardizes the integrity of the query results. Additionally, a client’s query may contain (and release) sensitive information about the querier. A malicious server may learn about this sensitive information and then use it for its own benefit, compromising the confidentiality of the user’s query. Later, in $2019$ Xu et al. [12] proposed vChain to address the query processing limitations of current blockchain databases. vChain removed the need for having a centralized trusted party to perform queries and addressed the integrity issue. It enabled a client to verify the result received from the servers. vChian fails however to provide confidentiality. Additionally, vChain supports only one type of queries, i.e., the Boolean range query (Figure 2 depicts a simplistic view of vChain). Hence, the need for having a secure, private and SQL-like query processing model for blockchains remains.

Refer to caption — Figure 1: An Overview of Query Processing model based on a central trusted third party.

Existing approaches for private query processing have limitations, making them inapplicable for blockchain settings. Many private Information Retrieval (PIR) methods [13, 14, 15, 16, 17, 18] assume that a server is honest-but-curious. This assumption is rather strong in a public and unreliable domain, such as blockchain, where every node (honest or malicious) can be a server. Additionally, PIR methods often require multiple round trips to the server to compute the result and therefore consume bandwidth. Other methods based on garbled circuits [19, 20, 21] have high computational and bandwidth costs, making them not applicable for lightweight devices. Encrypted solutions e.g. [22, 23, 24, 25, 26, 27, 28, 29] are also not applicable for blockchain, as the data on the blockchain is not encrypted. ORAM is another solution proposed to protect data privacy by hiding the access patterns of clients from untrusted servers. The main challenge in ORAM is that a client can download/access only the data that s/he is uploaded/added to the server. This is a different setting in blockchain where everyone can add to the database and clients should be able to retrieve the data added by others. Such work or methods will be explained in details in the related work section, $\S$ VII.

To address the aforementioned issues, this paper proposes $\uppi$ QLB that aims to: $(i)$ facilitate SQL-like queries over blockchain data, $(ii)$ provide confidentiality for the queries, and $(iii)$ provide integrity for the query results. $\uppi$ QLB achieve this by applying the latest cryptography primitive, namely function secret sharing (FSS) [30]. Using FSS, $\uppi$ QLB divides a client’s query into several subqueries and sends each subquery to a miner node. Recall that a typical blockchain network comprises three types of nodes: full node, miner, and light node. The full node maintains a full copy of the blockchain database (block headers and data records), the miner node is a full node that participates in the mining process (generating new blocks), and the light node hosts only the block headers. In $\uppi$ QLB, every miner and full node can provide the query processing service to the client, and hence the term service provider (SP) is used in this paper to refer to any of such nodes. An overview of $\uppi$ QLB is provided in Figure 3.

Consider a common application in a blockchain-based healthcare system that stores records of a tuple of $<T,\ V,\ W>$ , where $T$ is the time, $V$ is the cost (it can refer to an insurance number), and $W$ is a set of attributes, $W=<dr_{-}id,\ pt_{-}id,\ dr_{-}spec,\ disease,\ pres,\ hosp,\ loc>$ , where $dr_{-}id$ corresponds to the doctor id (it can be an anonymous id depending on the security requirements of the system), $pt_{-}id$ is the patient id, $dr_{-}spec$ is the doctor’s specialties, the $disease$ is the disease that has been diagnosed, $pres$ is the treatment that the doctor has prescribed for the patient, $hosp$ is the hospital’s name, $loc$ is the geographical location of the hospital. The records are stored in blockchain database which is located in the SPs’ premises. A client (e.g., a patient) might want to query the blockchain, e.g., to find all the doctors with specific specialties in nearby hospitals, or to find how many times a specific disease has been diagnosed within a time period in a specific hospital. These queries are submitted to various SPs, and they may however contain sensitive information that clients might not feel comfortable sharing with the SPs. $\uppi$ QLB enables clients to hide information based on their specific choice when submitting queries to SPs (confidentiality) as well as to prevent malicious SPs from tampering with the query results (integrity).

As depicted in Figure 3, $\uppi$ QLB first divides the client’s query into several subqueries. When dividing a query, a client will decide which information needs to be kept hidden (and protected). Each subquery is then sent to a single SP, and the SPs perform the subqueries separately and send the result back to the client. The client combines the results while checking the integrity. Only when this integrity check passes, it then can obtain the final query output. With the assumption that at least one service provider (SP) is honest, $\uppi$ QLB is proven to provide query confidentiality and response integrity security properties. It also guarantees that the query is evaluated on all the expected records.

The rest of this paper is organized as follows. Motivating examples are provided in $\S$ II followed by a description of FSS in $\S$ III. The architecture of $\uppi$ QLB is detailed in $\S$ IV and the evaluation results are provided in $\S$ VI. Existing work is elaborated in $\S$ VII and $\S$ VIII concludes the paper.

II Motivating Examples

This section provides use-case examples to motivate the importance of security and privacy of query processing in blockchain databases.

II-A Use-case examples

Consider a blockchain-based stock market where the data records in the blockchain are about the information related to the various stocks (e.g., share prices, available shares, transaction amount, number of shares). In such a system, the SPs will host the information related to the stock market.

To run transactions (e.g., buy, sell, or check the market prices on specific shares), a client will first need to query the system by sending a request to the SPs. The query will contain information that if the (potentially malicious) SP learns, it can impact the result. For example, if an SP learns that a specific stock (mentioned in a client’s query) is either a hot stock (i.e., many clients queried it) or people are bidding on opening price, then the malicious SP could tamper the result and influence the market (e.g., increase the actual price of a specific share). However, if integrity of the response can be verified, the malicious SP will be not able to carry out such an activity.

In another example, let us consider a blockchain-based rental-car system where car information (e.g., price, model, built-year, color, customer pick-up location, customer drop-off location, city, and rental-period) is the data recorded in the blocks. To rent cars, clients will first perform a search to find their preferred ones. To do so, clients send queries to various SPs. Such queries might however contain sensitive information (e.g., pick-up & drop-off locations, price ranges, or rental-period). If SPs learn such confidential information, it can jeopardize the client’s privacy.

II-B Query examples

$\uppi$ QLB enables running SQL-like queries over blockchain-based systems. In this section, we describe some examples of such queries.

Using the rental-car example, we assume that a client wishes to find the minimum price for a specific car model with a specific color. This client also prefers to hide the car model and the color of the car in the query. Hence, the client generates the following query:

\justify

SELECT MIN( $price$ ) WHERE $car_{-}model$ = ? $\wedge$ $color$ = ?

Note that “ $?$ ” indicates the sensitive information that $\uppi$ QLB must hide from SPs when sending the query.

Let us consider another example, namely a healthcare system. A client may need to find the total number of patients diagnosed with a specific disease in a specific hospital, or the total number of patients visited a specialist at a specific hospital/location.

Using $\uppi$ QLB, the client can hide the information s/he prefers (e.g., $disease$ and $dr_{-}spec$ ) by generating the following query:

\justify

SELECT COUNT( $pt_{-}id$ ) WHERE $disease$ =? $\wedge$ $dr_{-}spec$ =?

III Background

This section provides first some background about function secret sharing (FSS) that serves as the main building block in $\uppi$ QLB. Then, we describe the problem stemming from directly applying FSS-based methods to our target setting.

III-A FSS Background

Let $\mathbb{G}$ denote an Abelian group and an integer $p>1$ . FSS [30, 31, 32] is defined by a tuple ( $\sf{\mathsf{G}en}$ , $\sf{\mathsf{E}val}$ ):

•

${\sf{\mathsf{G}en}}(1^{\uplambda},f)\rightarrow K_{1},...,K_{p}$ : It gets a security parameter $1^{\uplambda}$ and a function $f:\{0,1\}^{n}\rightarrow\mathbb{G}$ , then splits $f$ into $p$ function shares (or keys) ( $K_{1},...,K_{p}$ ).
•

${\sf{\mathsf{E}val}}(K_{i},x)\rightarrow y_{i}$ : It takes $K_{i}$ and $x$ as input and outputs $y_{i}$ which is the result corresponding to the party $i$ .

FSS provides both correctness and security. FSS’s correctness ensures that adding up all outputs from $\sf{\mathsf{E}val}$ is equivalent to the original $f$ , i.e., $\sum_{i=1}^{p}y_{i}=y=f(x)$ . For security, FSS guarantees any strict subset of $K_{i}$ does not reveal anything about the original function $f$ . The work in [30] provides a game-based definition of FSS security in Definition 1. This game models an adversary ( $\sf{\mathcal{A}dv}$ ) capable of compromising $p-1$ shares and aiming to use this knowledge to identify which $\sf{\mathcal{A}dv}$ -generated function ( $f^{0}$ or $f^{1}$ ) is used to generate these $p-1$ shares. Theorem 1 states that an FSS scheme is considered secure if $\sf{\mathcal{A}dv}$ cannot distinguish $f^{0}$ and $f^{1}$ based on $p-1$ output shares with more than a negligible probability. In other words, without all $p$ shares, $\sf{\mathcal{A}dv}$ cannot infer anything about the original $f$ .

Definition 1

FSS Security Game ( $\sf{\mathsf{G}ame^{FSS}}$ ) [30]:

1.

Let $\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}})$ .
2.

$\sf{\mathcal{A}dv}$ selects two functions $f^{0},f^{1}:\{0,1\}^{n}\rightarrow\mathbb{G}$ and gives these functions to the challenger.
3.

The challenger samples $b\leftarrow\{0,1\}$ and computes $(K_{1},...,K_{p})\leftarrow{\sf{\mathsf{G}en}}(1^{\uplambda},f^{b})$ . Then he gives $p-1$ shares, e.g., $(K_{1},...,K_{p-1})$ , to $\sf{\mathcal{A}dv}$ .
4.

$\sf{\mathcal{A}dv}$ outputs a guess $b^{\prime}\in\{0,1\}$ . The game outputs $1$ if $b^{\prime}=b$ ; otherwise, it outputs $0$ .

Theorem 1

A scheme $\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}})$ is secure if for all probabilistic polynomial-time $\sf{\mathcal{A}dv}$ , there exists a negligible function $\sf{\mathsf{n}egl}$ such that:

Pr[{\sf{\mathsf{G}ame^{FSS}_{\sf{\mathcal{A}dv},\Sigma}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)

In this work, we consider two types of $f$ for FSS:

•

Distributed point function $f_{a,y}$ that is evaluated to a fixed element $y\in\mathbb{G}$ on a specific input $a$ , and to $0$ otherwise, i.e.:

$f_{a,y}(x)=\begin{cases}y,&\text{if }\ x=a\\ 0,&\text{otherwise}\end{cases}$ (1)
•

Interval function $f_{a,b,y}$ that is evaluated to $y\in\mathbb{G}$ only when an input is within a specific interval $[a,b]$ , and to $0$ otherwise, i.e.:

$f_{a,b,y}(x)=\begin{cases}y,&\text{if }\ a\leq x\leq b\\ 0,&\text{otherwise}\end{cases}$ (2)

For these two types, Theorem 1 implies confidentiality of $a$ , $b$ and $y$ , i.e., their values cannot be inferred from any of the function shares.

III-B Using FSS for blockchain

Blockchain can be considered as a public database. It is a decentralized distributed database where the data records are transparent for public, i.e., everyone can access the data. To query the public database with privacy-preserving guarantees, several studies have been conducted and different models have been proposed. Recently, with the introduction of FSS, several researchers proposed different models to privately query the public records [33, 31, 34]. In such models, a client generates keys using FSS with $y=1$ in Eq.1 or 2, and sends these shares to the servers. A given server evaluates the function share on each record of the database, sums the evaluation together and returns the result to the client. The client upon receiving all the results adds them up and finds the result of the query.

In all these settings, while FSS guarantees confidentiality of client queries, a client is unable to verify the integrity of the results received from servers. In particular, a malicious server can introduce an error $\delta$ to the final output without being detected by the client. This can be done by having the malicious server respond to the subquery with $y_{\sf{\mathcal{A}dv}}+\delta$ , instead of $y_{\sf{\mathcal{A}dv}}$ , where $y_{\sf{\mathcal{A}dv}}$ is a benign output from ${\sf{\mathsf{E}val}}(K_{\sf{\mathcal{A}dv}},x)$ . In this case, after aggregating all outputs, the client obtains the incorrect result that is off from the expected value by $\delta$ , i.e., $(\sum_{i=1,i\neq\sf{\mathcal{A}dv}}^{p}y_{i})+(y_{\sf{\mathcal{A}dv}}+\delta)=(\sum_{i=1}^{p}y_{i})+\delta=y+\delta=f(x)+\delta$ .

In the blockchain context, verifiable integrity of the query response is of paramount importance due to the followings:

•

Blockchain is a decentralized system, i.e., there is no central authority controlling blockchain; hence, some servers might be malicious. It is therefore important that a client can verify the query response. This is different for public databases, as they are owned by organizations that can protect servers and detect malicious parties at early stages.
•

The size of a blockchain database is significantly high and lightweight devices are unable to download the entire database to perform queries locally. Hence, it is important to enable them to verify the query responses.

This work describes $\uppi$ QLB as a solution to address the shortcomings of existing models. $\uppi$ QLB makes use of FSS to enable private search of blockchain databases, while providing the integrity of the query results and retaining the original goal of ensuring the confidentiality of clients’ queries.

IV $\uppi$ QLB

IV-A System Model and Syntax

As shown in Figure 4, $\uppi$ QLB is comprised of two entities:

•

Service Provider (SP) is the full node or the miner in blockchain terminology. An SP retains a full replica of the blockchain database and is able to provide query processing services.
•

Client is a user of the system. It sends query requests to SPs. A client can be either a light node or an external node. As mentioned, a light node in the blockchain terminology refers to the node that only hosts the block header data. However, an external node is the node/user who does not hold any of he blockchain data and does not exist in the blockchain network. It only sends queries to SPs.

In terms of algorithms, $\uppi$ QLB consists of a tuple $({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}})$ , defined as follows:

•

${\sf{\mathsf{G}en}}(1^{\uplambda},q,s)\rightarrow q^{\prime},K_{1},...,K_{p}$ : It takes as input a security parameter $1^{\uplambda}$ , a query with the format described in $\S$ IV-B and a set of secret expressions $s$ in $q$ . It then outputs $p$ function shares and a private query $q^{\prime}$ that removes $s$ value (i.e., replacing it with $?$ as in Section II). This function is performed by the client.
•

${\sf{\mathsf{E}val}}(K_{i},D,q^{\prime})\rightarrow y_{i}$ : It takes as input a function share $K_{i}$ , which corresponds to each SP, $D$ – the entire data from blockchain and a private query $q^{\prime}$ . It then performs an evaluation using $K_{i}$ and $q^{\prime}$ on $D$ and outputs $y_{i}$ . This function is performed by a SP.
•

${\sf{\mathsf{V}erif}}(y_{1},...,y_{p})\rightarrow R$ : It takes as input all the share outputs received from the SPs. Then, it verifies whether all $y_{i}$ -s are honestly generated on the same blockchain data. If it detects any malformed $y_{i}$ , it outputs $\perp$ (i.e., aborts). Otherwise, it produces the query result $R$ . This function is performed by the client.

IV-B Query Model

$\uppi$ QLB is designed to facilitate the execution of SQL-liked queries on blockchain databases. It first limits the number of blocks that SPs need to search over, then transforms the SQL syntax into a format compatible with the blockchain system. It then uses FSS [31] to divide the query into $p$ shares (queries).

Fig 5 depicts the $\uppi$ QLB’s query format, where type refers to one of the following query types: SUM, COUNT, AVG, MIN and MAX queries. blk_range_cond specifies the time range, i.e., $t_{1}<blk_{-}range_{-}cond<t_{2}$ , where $t_{1}$ is the lower bound and $t_{2}$ is the upper bound. This condition is necessary because in blockchain there is no table to be specified in order to narrow down the search on a database; but rather the full history of all the generated blocks since the genesis block. Hence, it is a must to narrow down the number of blocks that the SP needs to search over; otherwise, the search can be too expensive, time-consuming or may never stop. Thus, $\uppi$ QLB shortens the number of blocks by introducing the blk_range_cond, $t_{1}$ must be bigger than $t_{g}$ , which refers to the time that the genesis block was generated, and $t_{2}$ must be lower than $t_{c}$ , which is the current time. $\uppi$ QLB has considered another condition for blk_range_cond (i.e., the difference between $t_{1}$ and $t_{2}$ ) which should be limited and must not exceed a threshold $\uptau$ . This is to ensure that the number of blocks generated between $t_{1}$ and $t_{2}$ does not exceed a given threshold. The value of $\uptau$ depends on the block generation rate of the network. If the block generation rate is high, then $\uptau$ must be low; and if the block generation rate is low, then $\uptau$ can be bigger.

$\uppi$ QLB is designed to support two different conditions: $(i)$ single, and $(ii)$ range condition.

•

Single condition is when the query condition has only one value. That is in the query format where expr in the condition equals to only one secret value. In this case, the ${\sf{\mathsf{G}en}}$ function defines $f$ based on Eq.1.
•

Range condition is when the condition is within a range interval. In this condition, the ${\sf{\mathsf{G}en}}$ function defines $f$ following Eq.2.

Additionally, $\uppi$ QLB is designed to support two query models: simple and complex. The simple model refers to when a query has only one condition (either single or range). The complex model refers to when a query is composed of several conditions using $AND$ and $OR$ . Note that, $\uppi$ QLB supports not more than one range condition in one query request.

Figure 4 illustrates how a query processing works in $\uppi$ QLB, which involves the following steps:

1.

The client generates a query, $q$ and specifies the secret information (i.e., the information that the client prefers not to share with SPs) to specify the private query $q^{\prime}$ .
2.

Using $\uppi$ QLB. $\sf{\mathsf{G}en}$ function, the query is divided into several subqueries/function shares, depicted as $f_{i}$ in Figure 4.
3.

The client forwards the subqueries to SPs.
4.

SPs execute the subquery using $\uppi$ QLB. $\sf{\mathsf{E}val}$ function and return the results back to the client.
5.

The client combines the results and generates the final output of the query using the $\uppi$ QLB. $\sf{\mathsf{V}erif}$ function. This function enables the client to either abort or acquire the query’s response.

IV-C Threat Model

$\uppi$ QLB assumes that at least one of the SPs is honest, i.e., it always honestly follows the protocol and does not collude with other SPs. The rest of the SPs can be malicious. The malicious SP ( $\sf{\mathcal{A}dv}$ ) may attempt to learn the sensitive information from the client’s query in order to set up different attacks. For example, in a blockchain-based stock market application, if $\sf{\mathcal{A}dv}$ finds out the shares that the client is interested in purchasing, $\sf{\mathcal{A}dv}$ may modify the price of the share to charge the client more; or $\sf{\mathcal{A}dv}$ may use the location information to endanger the client. Moreover, $\sf{\mathcal{A}dv}$ can also tamper with the query and the response. Tampering with the query means the adversary may search on a different query instead of the client-provided query. Tampering with the response leads to returning an incorrect or forged response.

We assume that the client is not malicious. A malicious client is a client that attempts to access private data records or tampers with the existing records in the server. This is inapplicable here because $\uppi$ QLB is proposed for the public data records, where the database is not private and not directly modifiable by the client. Hence, everyone can see the data publicly. $\uppi$ QLB also assumes that the communication between an SP and the client is over a secure channel (e.g., SSL) and the user of the system is not compromised, i.e., the client follows the protocol.

IV-D Security Properties

At high level, $\uppi$ QLB provides two security properties:

1.

Query Confidentiality: it ensures that sensitive information from the query remains hidden to SPs.
2.

Response Integrity: it guarantees detection when malicious SPs tamper with the query responses.

The Query Confidentiality property is defined using the security game provided in Definition 2. It models $\sf{\mathcal{A}dv}$ who can compromise $p-1$ SPs with the goal of learning some sensitive information about the query. It assumes $\sf{\mathcal{A}dv}$ has oracle access to all $\uppi$ QLB algorithms. $\sf{\mathcal{A}dv}$ is allowed to submit two queries to the challenger as long as these queries contain the same structure and differ only by the secret value in their query condition. The challenger randomly selects one of the two submitted queries, computes the private query $q^{\prime}$ and $p-1$ shares from the selected query and outputs them to $\sf{\mathcal{A}dv}$ . $\sf{\mathcal{A}dv}$ wins the game if it can identify which query is used by the challenger.

Definition 2

Query Confidentiality Game ( $\sf{\mathsf{G}ame^{QConf}}$ ):

1.

Let $\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}})$ .
2.

$\sf{\mathcal{A}dv}$ is given oracle access to $\Sigma$ and $D$ blockchain data.
3.

$\sf{\mathcal{A}dv}$ selects a list of secret expressions $s$ and two queries $q_{0},q_{1}$ , where these queries differ only by the $s$ value, and gives $s,q_{0},q_{1}$ to the challenger.
4.

The challenger samples $b\leftarrow\{0,1\}$ , calls $(q^{\prime},K_{1},...,K_{p})\leftarrow{\sf{\mathsf{G}en}}(1^{\uplambda},q_{b},s)$ , and outputs $q^{\prime}$ and $p-1$ shares (e.g., $K_{1},...,K_{p-1}$ ) to $\sf{\mathcal{A}dv}$ .
5.

$\sf{\mathcal{A}dv}$ outputs a guess $b^{\prime}\in\{0,1\}$ . The game outputs $1$ if $b=b^{\prime}$ ; otherwise, it outputs 0.

Next, we formalize the Response Integrity property with the security game provided in Definition 3. It models $\sf{\mathcal{A}dv}$ that aims is to manipulate the query response by gaining control of $p-1$ SPs.

Definition 3

Reponse Integirty Game ( $\sf{\mathsf{G}ame^{RInt}}$ ):

1.

Let $\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}})$ .
2.

$\sf{\mathcal{A}dv}$ is given oracle access to $\Sigma$ and $D$ blockchain data.
3.

$\sf{\mathcal{A}dv}$ selects a query $q$ , a list of secret expressions $s$ and sends them to the challenger.
4.

The challenger calls ${\sf{\mathsf{G}en}}(1^{\uplambda},q,s)$ , producing $q^{\prime},K_{1},...,K_{p}$ . He then gives $q^{\prime}$ and $p-1$ shares (says $K_{1},...,K_{p-1}$ ) to $\sf{\mathcal{A}dv}$ .
5.

$\sf{\mathcal{A}dv}$ produces $y_{1}^{\sf{\mathcal{A}dv}},...,y_{p-1}^{\sf{\mathcal{A}dv}}$ and gives them to the challenger.
6.

The challenger computes $y_{p}={\sf{\mathsf{E}val}}(K_{p},D,q^{\prime})$ and $R^{\prime}={\sf{\mathsf{V}erif}}(y_{1}^{\sf{\mathcal{A}dv}},...,y_{p-1}^{\sf{\mathcal{A}dv}},y_{p})$ . He also directly queries $q$ on $D$ , producing $R$ .
7.

The game outputs $1$ if $R^{\prime}\neq\perp$ and $R^{\prime}\neq R$ ,; otherwise, it outputs $0$ .

Ultimately, we want to prove Theorem 2 and Theorem 3 below to formally guarantees Query Confidentiality and Response Integrity, respectively, in $\uppi$ QLB.

Theorem 2

$\uppi$ QLB provides Query Confidentiality if for all probabilistic polynomial-time $\sf{\mathcal{A}dv}$ there exists a negligible function $\sf{\mathsf{n}egl}$ such that:

Pr[{\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)

Theorem 3

$\uppi$ QLB provides Response Integrity if for all probabilistic polynomial-time $\sf{\mathcal{A}dv}$ there exists a negligible function $\sf{\mathsf{n}egl}$ such that:

Pr[{\sf{\mathsf{G}ame^{RInt}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)

The proofs for these two theorems are provided in Section V-B after various concepts are introduced.

V Query Execution

For the simplicity we explain the execution of the queries with an example. Assume each block contains several records, each denoted by $O$ which represents an object/transaction id (note that each transaction or object that is generated has an id). And assume each record, say $O_{i}$ , is a tuple of $<t_{i},V_{i},W_{i}>$ , where $t$ represents the time that the $O$ is generated; $V$ is a numerical value of $O$ , e.g., for a cryptocurrency $V$ is the transaction amount and in a non-cryptocurrency application it may refer to any numerical value required by that system (e.g., the amount of rental price in a blockchain-based rental car system or it could be an index value); $W$ is the set of attributes in the form of $\{Item,\ Price,\ Color\}$ . Thus, a block content can be depicted as Figure 6. $\uppi$ QLB executes queries over $W$ . For simplicity, we assume a table representation of $W$ , as depicted in Figure 7.

To execute a query, $\uppi$ QLB first limits/specifies the number of blocks of the blockchain that it requires to search over based on the time window, denoted as blk_range_cond, which is provided in the query. Here, we assume that the blk_range_cond, returns four blocks (i.e., there are four blocks that their generation time falls between blk_range_cond) and each block contains only one record. Figure 8 depicts a table representation of the four blocks and their content.

We next exemplify $\uppi$ QLB’s steps via five queries:

•

$q_{1}$ : \justifySELECT SUM( $Price$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $Item$ = $2$ .
•

$q_{2}$ : \justifySELECT COUNT( $Item$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $4<Price<10$ .
•

$q_{3}$ : \justifySELECT AVG( $Price$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $Item$ = $2$ .
•

$q_{4}$ : \justifySELECT MAX( $Price$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $Item$ = $2$ $\wedge$ $Color=red$
•

$q_{5}$ : \justifySELECT MIN( $Price$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $2\leq Item\leq 5$

Also, note that the $q^{\prime}$ denotes the query with the hidden information, i.e., the client’s sensitive information is hidden in $q^{\prime}$ . For example, $q_{1}^{\prime}$ can be constructed as follows:

\justify

SELECT SUM( $Price$ ) FROM $(1/06/2022)$ $<$ $blk_{-}range_{-}cond$ $<$ $(4/06/2022)$ WHERE $Item$ = ?

V-A Simple Query Model

We now discuss how to construct $\uppi$ QLB to serve a query with a single condition.

${\sf{\mathsf{G}en}}(1^{\uplambda},q,s)$ : depending on the query condition, single or range, this function first samples a random value $y\in\mathbb{G}$ and constructs the $f$ function from $y$ , where $f:\{0,1\}^{n}\rightarrow\mathbb{G}$ . In this work, we assume the size of group $\mathbb{G}$ is exponential in terms of $\uplambda$ . For instance, $f$ could be $f:\{0,1\}^{n}\rightarrow\mathbb{Z}_{2^{\uplambda}}$ . Among the above query examples, $q_{1}$ and $q_{3}$ are single and $q_{2}$ and $q_{5}$ are range conditions. Thus, if we pass $q_{1}$ and $q_{2}$ to the ${\sf{\mathsf{G}en}}$ function, it behaves as follows:

•

Single condition – ${\sf{\mathsf{G}en}}(1^{\uplambda},q_{1},\{Item\})$ : $\sf{\mathsf{G}en}$ takes both $q_{1}$ and the secret attribute Item as input. This signifies that Item is sensitive information and the client aims to hide its value in the query. Thus, $\sf{\mathsf{G}en}$ randomly samples $y$ from $\mathbb{G}$ and defines $f$ as follows:

$f(x)=\begin{cases}y,&\text{if }\ x=2\\ 0,&\text{otherwise}\end{cases}$ (3)
•

Range condition – ${\sf{\mathsf{G}en}}(1^{\uplambda},q_{2},\{Price\})$ : $\sf{\mathsf{G}en}$ takes as an input the $q_{2}$ and the secret attribute Price. Again, it randomly samples $y$ from $\mathbb{G}$ and then defines the $f$ as follows:

$f(x)=\begin{cases}y,&\text{if }\ 1<x<10\\ 0,&\text{otherwise}\end{cases}$ (4)

In this work, we assume that the range condition evaluates to one record only. To allow for multiple records within a specified range, $\uppi$ QLB can be extended by adopting the interval evaluation technique used by Splinter [34], which we leave for future work. After defining $f$ , it then calls FSS. ${\sf{\mathsf{G}en}}(1^{\uplambda},f)$ to output $p$ function shares. The shares and the private query $q^{\prime}$ are sent to SPs in a format of $\{K_{i},q^{\prime}\}$ .

${\sf{\mathsf{E}val}}(K_{i},D,q^{\prime})$ : the SP, upon receiving $\{K_{i},q^{\prime}\}$ , performs three operations on $D$ (blockchain data). First, it specifies/limits the number of blocks (to process the query) in $D$ according to $blk_{-}range_{-}cond$ in the $q^{\prime}$ . Second, it creates an intermediate table depending on the query condition (single or range, which can be extracted from $q^{\prime}$ ). If the query is a single condition query, the $SP$ executes $GROUPBY$ over the column in the query condition (to recall the query condition, see Figure 5). And if the query is a range condition query, then the $SP$ creates the intermediate table over the column in the query type (to recall the query type, see Figure 5) without executing $GROUPBY$ . Third, it generates the binary representation of the values of the column in the query type, denoted as $BIN$ .

After that, $\sf{\mathsf{E}val}$ evaluates two sub-functions on the intermediate table: $(i)$ $comp$ , stands for compute; and $(ii)$ , $merg$ , stands for merge. The $comp$ is equivalent to FSS. $\sf{\mathsf{E}val}$ except it is executed over the binary representation of the values of the column in the query type, $BIN$ . That is, $comp$ takes as an input two values: $(i)$ $c$ that refers to the column in the condition, the value of which is hidden; and $(ii)$ $BIN$ ; and performs the FSS. $\sf{\mathsf{E}val}$ function over these two inputs. $comp$ is executed $l$ times, where $l$ is the number of columns of $BIN$ , in other word, the maximum number of bits required to represent the binary value of the data, $2^{l}$ . The output of each execution of $comp$ is given to the $merg$ function. $merg$ takes as input all the values generated from $comp$ and merge/concatenate them together and generates a binary array, $y_{i}$ , with $l$ elements. The $y_{i}$ is the final output of $\sf{\mathsf{E}val}$ and is returned to the client. Algorithm 1 details the $\sf{\mathsf{E}val}$ function.

y_{i}\leftarrow

[];

c\leftarrow\text{GetConditionColumn}(D,q^{\prime})

;

interTable\leftarrow\text{CreateIntermediaryTable}(D,q^{\prime})

;

BIN\leftarrow\text{GetBinaryRep}(interTable,q^{\prime})

;

l\leftarrow\text{numColumn}(BIN)

;

6 for $k=0~{}\text{to}~{}l$ do

b_{k}\leftarrow comp(K_{i},c,BIN,k)

; // see Eq.5

y_{i}\leftarrow merg(y_{i},b_{k})

;

10 end for

return

y_{i}

Algorithm 1

\uppi

QLB.

{\sf{\mathsf{E}val}}(K_{i},D,q^{\prime})

The behaviour of this function, specifically the $comp$ sub-function, depends on the query types. We now explain this function in detail for each query type that $\uppi$ QLB supports, namely, SUM, AVG, COUNT, MIN, and MAX.

•

SUM: assume that the $SP$ receives $\{K_{i},q_{1}^{\prime}\}$ . $SP$ generates the intermediate table for this query, depicted in Figure 9. Then, it generates the binary representation of the values of the column $SUM(Price)$ , shown in Figure 9, as the $BIN$ (recall that the size of each row in the $BIN$ is $l$ , i.e., the maximum bits required to represent the binary value of the data). Finally, the $\sf{\mathsf{E}val}$ function is triggered. $\sf{\mathsf{E}val}$ function first executes the $comp$ subfunction which takes as an input $Item$ and $BIN$ as in Figure 9. One execution of the $comp$ is as follows:

$b_{k}=\displaystyle\sum_{j=0}^{n}FSS.{\sf{\mathsf{E}val}}(K_{i},c_{j})\cdot BIN[j][k]$ (5)

This function is repeated for all the columns in $BIN$ , Figure 9. The outputs of each execution of $comp$ , i.e., $b_{k}$ , are sent to the $merg$ subfunction, where the $b_{k}$ -s are merged/concatenated into an array $y_{i}$ . The final $y_{i}$ is then returned to the client. Algorithm 1 depicts the $\sf{\mathsf{E}val}$ function execution.

Figure 9: Intermediate Table and the binary representation of the column in query type generated for $q_{1}$ .
•

COUNT: In this query type, the $\sf{\mathsf{E}val}$ execution is similar to the SUM-based query type. Assume $SP$ receives $\{K_{i},q_{2}\prime\}$ . It creates the intermediate table and $BIN$ as depicted in Figure 10. Then, $SP$ can apply $comp$ sub-function for all columns in $c$ and appends the results into a single array $y_{i}$ , which is in turn sent back to the client.

Figure 10: Intermediate Table and the binary representation of the column in query type generated for $q_{2}$ .
•

AVG: This type of query results in a similar execution of $\sf{\mathsf{E}val}$ as the SUM- and COUNT-based query types. Suppose $SP$ receives $\{K_{i},q_{3}\prime\}$ , we refer to Figure 11 for the resulting intermediate table and $BIN$ variable in this query.

Figure 11: Intermediate Table and the binary representation of the column in query type generated for $q_{3}$ .
•

MAX: among above queries, the $q_{4}$ is a MAX query which is an example of the complex query model; hence, we defer its explanation to $\S$ V-C.
•

MIN: if the $SP$ receives $Q_{i}=\{K_{i},q_{5}\prime\}$ , it first creates the intermediate table and then performs the $comp$ and $merg$ similar to the previous queries. We thus omit its detail for brevity.

$\sf{\mathsf{V}erif}$ $(y_{1},...,y_{p})$ : Upon receiving the results from all $SP$ -s, the client executes the $\sf{\mathsf{V}erif}$ function to output the final results. Recall that $y_{i}$ is a binary array of the outputs from the $\sf{\mathsf{E}val}$ function executed by $SP_{i}$ on each column of the $BIN$ matrix. Thus, in order to find the final result of the query, the client needs to perform the $\sf{\mathsf{V}erif}$ function on each bit of the $y_{i}$ . To perform this, the client follows these steps: $(i)$ , it creates a binary matrix of the received $y_{i}$ -s; $(ii)$ , it adds the values of each column of the matrix and include it to a $temp$ variable. $if(temp==y)$ then it adds a binary value $1$ into the result array, otherwise $if(temp==0)$ then it adds a binary value $0$ into the result array; if $temp$ is neither $y$ nor $0$ , the client detects a misbehaviour and aborts the protocol. This could be due to the server being malicious or other errors in the systems. Algorithm 2 details the $\sf{\mathsf{V}erif}$ function.

1for $i=0~{}\text{to}~{}len(y_{1})$ do

temp\leftarrow\sum_{j=0}^{p}y_{j}[i]

;

3 // Recall

y

is selected in

\sf{\mathsf{G}en}

4 if $temp$ is $y$ then

R[i]\leftarrow 1

;

6 else if $temp$ is $0$ then

R[i]\leftarrow 0

;

8 else

9 // Detect misbehaviour

10 return

\perp

;

11 end if

13 end for

14return

R

;

Algorithm 2

\uppi

QLB.

{\sf{\mathsf{V}erif}}(y_{1},...,y_{p})

V-B Proof

This section provides proofs for Theorem 2 and Theorem 3.

V-B1 Query Confidentiality

Intuitively, $\uppi$ QLB inherits the secrecy property from the FSS primitive and therefore $\sf{\mathcal{A}dv}$ cannot infer anything about the secret value in the $f$ function, which is also considered sensitive in $\uppi$ QLB. To provide a more formal argument, we prove Theorem 2 via a reduction of $\sf{\mathsf{G}ame^{FSS}}$ in Definition 1 to $\sf{\mathsf{G}ame^{QConf}}$ in Definition 2. In other words, we aim to show that the existence of $\sf{\mathcal{A}dv}$ that breaks $\sf{\mathsf{G}ame^{QConf}}$ can be used to construct $\sf{\mathcal{A}dv}^{FSS}$ that breaks the FSS security guarantee in Theorem 1.

Proof:

Suppose there exists $\sf{\mathcal{A}dv}$ such that $Pr[{\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}=1]>{\sf{\mathsf{n}egl}}(\uplambda)$ . $\sf{\mathcal{A}dv}^{FSS}$ asks $\sf{\mathcal{A}dv}$ to play $\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}$ and can simply use the game output $b^{\prime}$ to win $\sf{\mathsf{G}ame^{FSS}}$ . This works because we can construct the $\sf{\mathsf{G}ame^{FSS}}$ sequence from $\sf{\mathsf{G}ame^{QConf}}$ as follows:

1.

$\sf{\mathcal{A}dv}$ can extract $f^{0}$ and $f^{1}$ from $q_{0}$ and $q_{1}$ . Since these queries are sent to the challenger in $\sf{\mathsf{G}ame^{QConf}}$ , these functions are also included in this message. (corresponding to Step 2 of $\sf{\mathsf{G}ame^{FSS}}$ )
2.

The challenger samples $b\leftarrow\{0,1\}$ and invokes $\uppi$ QLB. $\sf{\mathsf{G}en}$ using $q_{b}$ . Since $\uppi$ QLB. $\sf{\mathsf{G}en}$ derives $f^{b}$ from $q_{b}$ and calls FSS. $\sf{\mathsf{G}en}$ on $f^{b}$ , it means that in this step the challenger also computes $(K_{1},...,K_{p})\leftarrow FSS.{\sf{\mathsf{G}en}}(1^{\uplambda},f^{b})$ and outputs $p-1$ shares to $\sf{\mathcal{A}dv}$ . (corresponding to Step 3 of $\sf{\mathsf{G}ame^{FSS}}$ )
3.

$\sf{\mathcal{A}dv}$ outputs $b^{\prime}\in\{0,1\}$ . (corresponding to Step 4 of $\sf{\mathsf{G}ame^{FSS}}$ )

Therefore, $\sf{\mathcal{A}dv}^{FSS}$ can win $\sf{\mathsf{G}ame^{FSS}}$ with the same non-negligible probability that $\sf{\mathcal{A}dv}$ has of winning $\sf{\mathsf{G}ame^{QConf}}$ .

∎

V-B2 Response Integrity

To prove the response integrity property, we first show $\uppi$ QLB satisfies $\sf{\mathsf{G}ame^{RInt}}$ in a relaxed scenario, where an intermediate table can be represented using only one bit, hence proving Lemma 1. Then, we can extend this proof to the generic scenario that requires an $l$ -column intermediate table, for arbitrary value of $l$ .

Lemma 1

Theorem 3 holds for a relaxed scenario where blockchain data $D$ is transformable to a one-bit intermediate table for the input query $q$ .

Proof:

In this special relaxed scenario, we consider that after SPs convert $D$ into an intermediate table, this table contains exactly 1 column (or 1 bit).

The goal of this Lemma is to prove that $\sf{\mathcal{A}dv}$ cannot win $\sf{\mathsf{G}ame^{RInt}}$ except with negligible probability in this scenario.

To prove that, we show that even if $\sf{\mathcal{A}dv}$ learns and can manipulate the input query $q$ , a list of secret expressions $s$ , $p-1$ shares $K_{i}$ and $p-1$ share outputs $y_{i}$ (see Definition 3), he cannot force $\uppi$ QLB. $\sf{\mathsf{V}erif}$ (which is performed by the challenger) to produce a valid output without aborting. With a one-column intermediate table, $\uppi$ QLB. $\sf{\mathsf{E}val}$ outputs a single element $y_{i}\in\mathbb{G}$ and $\uppi$ QLB. $\sf{\mathsf{V}erif}$ returns a bit $R$ , where:

R=\begin{cases}0&\text{if $\sum_{j=1}^{p}y_{j}=0$}\\ 1&\text{if $\sum_{j=1}^{p}y_{j}=y$}\\ \perp&\text{otherwise}\end{cases}

(6)

We note that $y$ is generated in $\uppi$ QLB. $\sf{\mathsf{G}en}$ and cannot be inferred from $p-1$ shares (due to the secrecy property of FSS in Theorem 1). It also cannot be influenced by the input query $q$ or the secret expressions $s$ since $y$ is selected at random; in short $\sf{\mathcal{A}dv}$ cannot uncover or manipulate $y$ .

Recall that to win $\sf{\mathsf{G}ame^{RInt}}$ in this scenario, $\sf{\mathcal{A}dv}$ must output $R^{\prime}$ such that $R^{\prime}\neq\perp$ and $R^{\prime}$ must be different from the benign output produced by an honest execution of the protocol. Without loss of generality, we assume if all SPs are honest, $\uppi$ QLB. $\sf{\mathsf{E}val}$ will output $R=0$ ; thus to win the game, $\sf{\mathcal{A}dv}$ goal is to produce $R^{\prime}=1$ . Suppose $\sf{\mathcal{A}dv}$ can compromise all SPs except SP_p. Then, from Equation 6, to get $R^{\prime}=1$ , $\sf{\mathcal{A}dv}$ must output $y_{1}^{\sf{\mathcal{A}dv}},...,{y}_{p-1}^{\sf{\mathcal{A}dv}}$ such that $\sum_{j=1}^{p-1}y_{j}^{\sf{\mathcal{A}dv}}=y-y_{p}$ . Since $y$ and $y_{p}$ are random and unknown to $\sf{\mathcal{A}dv}$ , he does not have any other advantage of coming up with $y-y_{p}$ from $y_{1},...,y_{p-1}$ except for directly guessing one element in $\mathbb{G}$ and hoping that this element corresponds to $y-y_{p}$ . The probability of $\sf{\mathcal{A}dv}$ correctly guessing $y-y_{p}$ and thus breaking Lemma 1 is $\frac{1}{|\mathbb{G}|}=\frac{1}{\exp(\uplambda)}={\sf{\mathsf{n}egl}}(\uplambda)$ .

∎

To win $\sf{\mathsf{G}ame^{RInt}}$ for a generic scenario with an $l$ -column intermediate table and hence prove Theorem 3, $\sf{\mathcal{A}dv}$ must be able to trick the challenger to execute $\uppi$ QLB. $\sf{\mathsf{V}erif}$ and output an $l$ -bit $R^{\prime}$ such that (1) no abort has happened during execution and (2) $R^{\prime}$ is different from the benign value $R$ . However, in order to do that, $\sf{\mathcal{A}dv}$ must get at least one bit from $R^{\prime}$ to be different from the bit at the same location in $R$ . Doing so requires $\sf{\mathcal{A}dv}$ to break Lemma 1. Hence, $\sf{\mathcal{A}dv}$ can win $\sf{\mathsf{G}ame^{RInt}}$ with the same negligible probability that $\sf{\mathcal{A}dv}$ has of breaking Lemma 1. $\blacksquare$

V-C Complex Query Model

In this section, we explain how complex queries are executed. As mentioned, complex queries are the ones with the combination of $AND$ and $OR$ in the query condition. That is, the query can have more than two conditions. We support the following complex queries:

•

AND - Single Conditions: if the complex query is formed by the AND, we concatenate the conditions into one condition and behave the newly formed query as a simple model query. Assume that the query condition is $c_{1}=secret_{1}$ AND $c_{2}=secret_{2}$ AND … AND $c_{n}=secret_{n}$ , where each query condition is a single condition. $\uppi$ QLB concatenates these conditions into one: $C=c_{1}||c_{2}||...||c_{n}$ . This results in a single condition query of the simple model which $\uppi$ QLB can process, explained in $\S$ V-A.
•

OR - Single Condition: if the complex query is formed by the OR and the query conditions are disjoint, i.e., $c_{1}$ does not overlap with $c_{2}$ , then we create $f$ for each condition individually and then add all $f$ -s together. Assume that the query condition is $c_{1}=secret_{1}$ OR $c_{2}=secret_{2}$ OR $...$ OR $c_{n}=secret_{n}$ . $\uppi$ QLB generates $f$ for each $c_{i}$ using the simple model functions, then combine all the generated functions, i.e, $f=f_{c_{1}}+f_{c_{2}}+...+f_{c_{n}}$ . And finally it passes $f$ to the $\uppi$ QLB. $\sf{\mathsf{G}en}$ function, explained in the simple model above, $\S$ V-A

Below, we explain an example of a complex query.

Among the above queries, $q_{4}$ is a complex query as it contains $\wedge$ in the query condition. Now assume that the $SP$ receives $\{K_{i},q_{4}\prime\}$ . In this type of query, compared to the previous queries explained in $\S$ V-A, $SP$ performs one additional task when creating the intermediate table. It first creates a join column consisting of two columns in the query condition (i.e., for $q_{4}\prime$ , it is $Color$ and $Item$ ). And then it performs the $GROUPBY$ over this newly created column. Thus, in this example, $q_{4}\prime$ , $SP$ creates the intermediate table as shown in Figure 12 based on the joint $Color-Item$ columns. Later, it generates $BIN$ (see Figure 12) for the column $MAX(Price)$ . After the setup is finished, then it performs $comp$ and $merg$ on this intermediate table. The rest of this query is similar to execution of the rest of queries explained in $\S$ V-A.

VI Evaluation

To evaluate $\uppi$ QLB, we compared its performance, in terms of computation cost ( $\S$ VI-A), bandwidth ( $\S$ VI-B), and latency ( $\S$ VI-C), with two baselines: Splinter and Waldo. We selected these baselines as they both applied the FSS to provide query’s confidentiality, similar to $\uppi$ QLB.

VI-A Computation Comparison

Computation cost/complexity refers to the computations required to process the query and retrieve the data. This cost is $O(N\times logN)$ for Splinter, where $N$ is the number of database records. $\uppi$ QLB requires less asymptotic time, i.e., $O(N\times l)$ , where $l$ is a small constant indicating the number of bits required to represent each database column. Whereas Waldo’s computation cost grows linearly with the number of query predicates: $O(N\times p)$ .

We calculated the computation complexity of $\uppi$ QLB and then plotted its behaviour for different number of data records in the database. We then plotted the behaviour of Splinter and Waldo over the same set of data records used for $\uppi$ QLB. For Waldo and Splinter we used the computation complexity provided in the original papers. As depicted in Figure 13, $\uppi$ QLB performs slightly better compared to the baselines when the number of records in the database are small. However, when the record size increases, the performance of Waldo and Splinter deteriorates significantly compared to $\uppi$ QLB. More specifically, when the number of data records increases from $10^{8}$ to $10^{9}$ , we can see that $\uppi$ QLB performs faster (a magnitude of two) compared to Splinter. This difference is even more when we compare $\uppi$ QLB and Waldo, where for $10^{9}$ number of records $\uppi$ QLB performs over four times faster than Waldo (with $2-predicates$ ). When the number of predicates increases Waldo’s performance degrades the most compared to Splinter and $\uppi$ QLB.

VI-B Bandwidth Comparison

After we evaluated the computation cost of $\uppi$ QLB, we switched our attention to the bandwidth comparison. We calculated the client and server bandwidth separately and compared the $\uppi$ QLB’s results with the baselines. Figure 14 illustrates the evaluation results. For the client bandwidth, we compared $\uppi$ QLB with Waldo only as Splinter did not provide enough information on the paper to allow us to compute the client’s bandwidth. The results show that in $\uppi$ QLB, a client requires slightly less bandwidth to perform queries compared to Waldo. This is due to the fact that Waldo generates two pairs of FSS keys for each SP. That is, when the number of SPs are two, Waldo’s client generates in total $four$ pairs of keys to send to the SPs. While in $\uppi$ QLB client only generates one key for each SP.

However, the server’s bandwidth is computed based on: $(i),$ the round trips between client and server, $(ii),$ the size of the data that the server sends to client, and $(iii),$ the Maximum Transmission Unit, $MTU$ , each server has. We assume all the servers are running on AWS (Amazon Web Services) EC2 instances and each instance type is $m5.xlarge$ with $10\ Gbps$ network bandwidth and the $MTU$ is $1500$ bytes. As illustrated in Figure 14, the server’s bandwidth in $\uppi$ QLB is constant and independent from the number of records. This is exactly similar in Splinter for sum-based queries. However, Splinter requires $10$ times more bandwidth for TOPK, MIN, and MAX queries compared to $\uppi$ QLB. This is due to the extra communication between server and client for building the intermediate table in Splinter. Waldo, however, requires the highest bandwidth compared to $\uppi$ QLB and Splinter as it grows linearly when the number of records increases.

VI-C Latency Comparison

Last, we evaluated the latency. We assumed the minimum latency between client and the server (SP) for each message is $1\ ms$ . To plot the results, we computed the latency for one query over different size of records and repeated it for $\uppi$ QLB, Splinter, and Waldo (with $2,4,$ and $8$ predicates). Note that the number of predicates does not affect the latency of the $\uppi$ QLB and Splinter. However, the latency of Waldo varies depending on the number of predicates. As illustrated in Figure 15, $\uppi$ QLB has the lowest latency compared to the baselines. More specifically, when the number of records are small (below $2^{27}$ ), $\uppi$ QLB, Splinter, Waldo-2-predicates, and Waldo-4-predicates have almost the same latency, less than $2$ seconds. However, when the number of records increases, the latency surges significantly for Splinter and Waldo, while $\uppi$ QLB experiences a moderate increase in latency. For example, when the number of records is increased to $2^{28}$ , $\uppi$ QLB’s latency is still $1$ seconds; however, Splinter’s and Waldo-4-predicates’s latency surges to around $5$ seconds. For the maximum records of $2^{30}$ , $\uppi$ QLB’s latency is around $5$ seconds which is a factor of $2$ and $4$ times less than Waldo-2-predicates, and Splinter and Waldo-4-predicates, respectively. Waldo-8-predicates experiences the highest latency.

VII Related work

As previously elaborated, $\uppi$ QLB’s main goal is to provide query confidentiality and response integrity for performing search queries over a blockchain database. There has been a large body of studies conducted in a similar area (i.e., secure and private search over public data) and this section aims to describe such studies and put our work context.

Related work can be classified into three categories: $(i)$ , those providing query confidentiality, $(ii)$ , those enabling queries over encrypted database to protect the stored data, $(iii)$ those enabling queries over blockchain database. Compared to the work in $(i)$ , $\uppi$ QLB provides integrity for the query responses, improving the security by a large factor. The second classification aims to solve a different problem compared to $\uppi$ QLB, as they protect private and sensitive data from a malicious server while providing a search function. And the final classification is very closely related to $\uppi$ QLB, as $\uppi$ QLB is proposed for blockchain databases.

VII-A Queries on database with query confidentiality

VII-A1 PIR Systems [13, 14, 15, 16, 17, 18]

In $1995$ , Chor et al. [13] introduced private information retrieval (PIR) that allows a user to retrieve an element of a database in a setting where the server is untrusted (i.e., the client does not want to reveal to the server what information s/he is retrieving). In other word, the goal of PIR is to enable querying the $i$ -th item of a database with $n$ items, without revealing $i$ .

The introduction of PIR led to the generation of two new schemes: $(i)$ information theoretic PIR (IT-PIR), also known as multi-server PIR [13, 35, 36, 37]; and $(ii)$ computational PIR (CPIR), also known as single-server PIR [38, 39, 40, 14, 41, 42]. The IT-PIR scheme comprises of more than one server and each server hosts a replica of the database. The client sends a query to all the servers and combines all the responses received from the servers. IT-PIR relies on a strong security assumption; i.e., the servers are honest and do not collude together. Meanwhile, a limitation of CPIR relates to the high computational cost of the cryptographic operation on each element of the database.

VII-A2 Splinter [34]

In $2017$ , Wang et al. [34] applied FSS [30] cryptographic primitive to build a new query scheme, called Splinter, that facilitates private SQL-like queries over a database. The database is public, i.e., is accessible by everyone to read (but not to write) and it is based on a centralized setting (i.e., it is owned/controlled/modified by one organization/company). Splinter supports various types queries, such as SUM, COUNT, AVG, STDEV, MIN, MAX, and TOPK queries. Although it has achieved significantly better performance compared to previous private information retrieval systems, thanks to the use of FSS, it has some limitations: $(i)$ it does not support the integrity of the query response. That is, Splinter assumes that the service providers are honest (i.e., faithfully following the protocol); $(ii)$ Splinter is designed for the centralized settings, i.e., the database is controlled by one organization. Hence, it cannot be applied in the context of the public blockchain.

Splinter is closely related to $\uppi$ QLB as they both have applied FSS cryptographic primitive to facilitate private search over databases. Nonetheless, $\uppi$ QLB additionally addresses the aforementioned limitations, making query processing applicable to the blockchain setting.

VII-A3 Waldo [43]

It is a private time-series encrypted database that supports multi-predicate filtering while hiding the query content and search patterns. Waldo used FSS to achieve their security guarantee. It relies on the majority honest parties, and has defined up to $three$ parties. This means that they allow only one malicious party. They have achieved a better security compared to their related work based on time-series databases. We considered Waldo to be related to $\uppi$ QLB as both of them are based on FSS. Nonetheless, Waldo targets a private setting, where the user has a full control over the data on each service provider, which is not applicable to the public setting in blockchain.

VII-A4 Garbled circuit [19, 20, 21]

It is a cryptographic primitive that provides secure multiparty computations. Embark [44] applied this method to design a system that enables a cloud provider to support the outsourcing of network processing tasks (e.g., firewalling, NATs, web proxies, load balancers, and data exfiltration systems), similar the way that compute and storage are outsourced; while maintaining the client’s confidentiality. To achieve this, Embark first encrypts the traffic that reaches the cloud and then the cloud processes the encrypted traffic without decrypting it. Another work, namely BlindBox [45], also used Garbled circuits to perform deep-packet inspection directly on the encrypted traffic. BlindBox is suggested to be a good candidate for applications such as intrusion detection (IDS), exfiltration detection, and parental filtering. These works mainly have used garbeled circuits to perform private communication on a single untrusted server. The main issue with these proposals is the high computation and bandwidth costs for queries when the datasets are large as a new garbled circuit needs to be generated for each query.

VII-A5 ORAM

Oblivious RAM (ORAM) is a cryptographic primitive that protects the privacy of clients by hiding the memory access patterns seen by an untrusted storage server. It basically eliminates the information leakage in memory access traces. In an ORAM scheme, a client can accesses data blocks that are hosted on an untruseted server. However, for any two logical access sequences of the same length, the communications between a client and a server cannot be distinguished. Several work [46, 47, 48] have applied ORAM to develop a secure access model to stored data. However, ORAM is not applicable in the settings where a client is resourced constrained, i.e., it does not have enough memory to be able to load a large volume of data from the server. Additionally, ORAM allows clients to access/download the data that they have written on the server. Hence, it is not suitable for the case where the database is public where everyone can write/read to/from it.

VII-B Queries on encrypted databases

Several prior work [22, 23, 24, 25, 26, 27, 28, 29] proposed methods to run queries over an encrypted database. Data in such systems is encrypted before stored, and queries are executed over encrypted data. The aim here is to protect private data from an untrusted or a compromised server. Hence, the purpose of such systems is different from the one of $\uppi$ QLB, which aims to protect the confidentiality of queries over a public database, where servers could be malicious and the database is publicly accessible.

VII-C Queries on blockchain

Lastly, current search approaches for blockchain databases require users to maintain the entire blockchain to run queries (in order to support query integrity). This is not an economical approach as the maintenance cost of blockchain is significant and blockchain’s data size is considerably large. Many companies, including IBM [9], Oracle [10], FlureeDB[8], and BigchainDB [11] provided a search functionality for blockchain databases that unfortunately rely on a central trusted party that executes user’s queries. This obviously causes a single point of failure and confidentiality as well as integrity guarantees are not also supported.

In $2019$ , Xu et al. [12] proposed vChain to address a limitation in the previous search schemes for blockchain. vChain’s main focus was to provide the integrity for the search without requiring nodes to download the entire blockchain database. It enabled verifiable Boolean range queries for blockchain databases, where clients are able to verify the results of their searches. Although vChain did achieve a good search performance and provided integrity, it did not consider the confidentiality of clients’ queries.

VQL [49] is another work that has been proposed for search improvement on blockchain databases. To provide both efficient and verifiable data queries, VQL adds a Verifiable Query Layer to the blockchain systems. This middleware layer extracts transactions stored in the underlying blockchain system and reorganizes them in databases to provide various query services for public users. It has also applied a cryptographic hash value for each constructed database to prevent data being modified. The main issue with this approach is clearly the extra layer added to the blockchain system. This directly impacts the blockchain size. Additionally, it degrades the overall performance of the blockchain systems as reorganizing and creating the extra layer incur an overhead on top of the tasks the blockchain systems need to perform. More importantly, VQL does not provide the query confidentiality.

VIII Conclusion

This paper described $\uppi$ QLB, a query language for blockchain systems that ensures both confidentiality of query inputs and integrity of query results. To achieve confidentiality property $\uppi$ QLB has applied the function secret sharing primitive (FSS). And, to support integrity, we extended the traditional FSS setting in such a way that integrity of FSS results can be efficiently verified. $\uppi$ QLB also introduces relational data semantics into existing blockchain databases and enables SQL-like queries over the data. We evaluated the performance of $\uppi$ QLB in terms of bandwidth, latency, and computation costs. The evaluation results showed that $\uppi$ QLB performs better when compared with the baselines.

In future work, we aim to enhance the search performance by applying the indexing to the stored data and enabling the FSS keys to search over the indexed data. Moreover, we plan to explore system integration and deployment aspects in greater depth, where we will analyze the real-world implementation of $\uppi$ QLB and its integration with current blockchain architectures.

References

Nakamoto [2008] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Decentralized Business Review, 2008.
Mohan [2017] C. Mohan, “Tutorial: Blockchains and databases,” Proceedings of the VLDB Endowment, vol. 10, no. 12, pp. 2000–2001, 2017.
Sohrabi and Tari [2020a] N. Sohrabi and Z. Tari, “Zyconchain: A scalable blockchain for general applications,” IEEE Access, vol. 8, pp. 158 893–158 910, 2020.
Dinh et al. [2017] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan, “Blockbench: A framework for analyzing private blockchains,” in ACM International Conference on Management of Data (SIGMOD), 2017, pp. 1085–1100.
Dinh and al. [2018] A. Dinh and al., “Untangling blockchain: A data processing view of blockchain systems,” vol. 30, no. 7, pp. 1366–1385, 2018.
Wang et al. [2018] S. Wang, T. T. A. Dinh, Q. Lin, Z. Xie, M. Zhang, Q. Cai, G. Chen, B. C. Ooi, and P. Ruan, “Forkbase: An efficient storage engine for blockchain and forkable applications,” Proceedings of the VLDB Endowment, vol. 11, no. 10, 2018.
Sohrabi and Tari [2020b] N. Sohrabi and Z. Tari, “On the scalability of blockchain systems,” in 2020 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 2020, pp. 124–133.
[8] “Flureedb.” [Online]. Available: https://flur.ee/
[9] “Ibm blockchain.” [Online]. Available: https://www.ibm.com/blockchain
[10] “Oracle blockchain.” [Online]. Available: https://www.oracle.com/blockchain/
[11] “Bigchaindb.” [Online]. Available: https://www.bigchaindb.com/
Xu et al. [2019] C. Xu, C. Zhang, and J. Xu, “vchain: Enabling verifiable boolean range queries over blockchain databases,” in ACM SIGMOD, 2019.
Chor et al. [1995] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” in IEEE FOCS, 1995.
Yi et al. [2013] X. Yi, R. Paulet, and E. Bertino, “Private information retrieval,” Synthesis Lectures on Information Security, Privacy, and Trust, vol. 4, no. 2, pp. 1–114, 2013.
Benny et al. [1998] C. Benny, K. Eyal, G. Oded, and S. Madhu, “Private information retrieval,” in Journal of the ACM (JACM), 1998, p. 965–981.
Ostrovsky and Skeith [2007] R. Ostrovsky and W. E. Skeith, “A survey of single-database private information retrieval: Techniques and applications,” in International Workshop on Public Key Cryptography, 2007.
Olumofin and Goldberg [2011] F. Olumofin and I. Goldberg, “Revisiting the computational practicality of private information retrieval,” in Financial Cryptography and Data Security, 2011.
Ongaro and Ousterhout [2014] D. Ongaro and J. Ousterhout, “In search of an understandable consensus algorithm,” in USENIX Annual Technical Conference (Usenix ATC), 2014.
Bellare et al. [2012] M. Bellare, T. Hiang, and P. Rogaway, “Foundations of garbled circuits,” 2012, pp. 784–796.
Goldwasser [1997] S. Goldwasser, “Multi party computations: past and present,” in 16th Annual ACM Symposium on Principles of Distributed Computing (SPDC), 1997, pp. 1–6.
Yakoubov [2017] S. Yakoubov, “A gentle introduction to yao’s garbled circuits,” preprint on webpage at https://web. mit. edu/sonka89/www/papers/2017ygc. pdf, 2017.
Popa et al. [2011] R. A. Popa, C. M. Redfield, N. Zeldovich, and H. Balakrishnan, “Cryptdb: protecting confidentiality with encrypted query processing,” in 23rd ACM Symposium on Operating Systems principles (SOSP), 2011, pp. 85–100.
Demertzis et al. [2020] I. Demertzis, D. Papadopoulos, C. Papamanthou, and S. Shintre, “ $\{$ SEAL $\}$ : Attack mitigation for encrypted databases via adjustable leakage,” in 29th USENIX Security Symposium (USENIX Security), 2020, pp. 2433–2450.
Fuller et al. [2017] B. Fuller, M. Varia, A. Yerukhimovich, E. Shen, A. Hamlin, V. Gadepally, R. Shay, J. D. Mitchell, and R. K. Cunningham, “Sok: Cryptographically protected database search,” in 2017 IEEE Symposium on Security and Privacy (S&P). IEEE, 2017, pp. 172–191.
Papadimitriou et al. [2016] A. Papadimitriou, R. Bhagwan, N. Chandran, R. Ramjee, A. Haeberlen, H. Singh, A. Modi, and S. Badrinarayanan, “Big data analytics over encrypted datasets with seabed,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 587–602.
Pappas et al. [2014] V. Pappas, F. Krell, B. Vo, V. Kolesnikov, T. Malkin, S. G. Choi, W. George, A. Keromytis, and S. Bellovin, “Blind seer: A scalable private dbms,” in 2014 IEEE Symposium on Security and Privacy (S&P). IEEE, 2014, pp. 359–374.
Popa et al. [2014] R. A. Popa, E. Stark, S. Valdez, J. Helfer, N. Zeldovich, and H. Balakrishnan, “Building web applications on top of encrypted data using mylar,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014, pp. 157–172.
Mahajan et al. [2011] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish, “Depot: Cloud storage with minimal trust,” ACM Transactions on Computer Systems (TOCS), vol. 29, no. 4, pp. 1–38, 2011.
Li et al. [2004] J. Li, M. N. Krohn, D. Mazieres, and D. E. Shasha, “Secure untrusted data repository (sundr).” in Osdi, vol. 4, 2004, pp. 9–9.
Boyle et al. [2015] E. Boyle, N. Gilboa, and Y. Ishai, “Function secret sharing,” in Eurocrypt. Springer, 2015.
Boyle et al. [2016] ——, “Function secret sharing: Improvements and extensions,” in ACM CCS, 2016.
de Castro and Polychroniadou [2022] L. de Castro and A. Polychroniadou, “Lightweight, maliciously secure verifiable function secret sharing,” in Eurocrypt. Springer, 2022.
Boyle et al. [2021] E. Boyle, N. Chandran, N. Gilboa, D. Gupta, Y. Ishai, N. Kumar, and M. Rathee, “Function secret sharing for mixed-mode and fixed-point secure computation,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2021, pp. 871–900.
Wang et al. [2017] F. Wang, C. Yun, S. Goldwasser, V. Vaikuntanathan, and M. Zaharia, “Splinter: Practical private queries on public data,” in USENIX NSDI, 2017.
Demmler et al. [2014] D. Demmler, A. Herzberg, and T. Schneider, “Raid-pir: practical multi-server pir,” in 6th edition of the ACM Workshop on Cloud Computing Security, 2014, pp. 45–56.
Devet et al. [2012] C. Devet, I. Goldberg, and N. Heninger, “Optimally robust private information retrieval,” in 21st USENIX Security Symposium (USENIX Security), 2012, pp. 269–283.
Goldberg [2007] I. Goldberg, “Improving the robustness of private information retrieval,” in 2007 IEEE Symposium on Security and Privacy (S&P). IEEE, 2007, pp. 131–148.
Brakerski and Vaikuntanathan [2014] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (standard) lwe,” SIAM Journal on Computing, vol. 43, no. 2, pp. 831–871, 2014.
Cachin et al. [1999] C. Cachin, S. Micali, and M. Stadler, “Computationally private information retrieval with polylogarithmic communication,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1999, pp. 402–414.
Kushilevitz and Ostrovsky [1997] E. Kushilevitz and R. Ostrovsky, “Replication is not needed: Single database, computationally-private information retrieval,” in 38th annual Symposium on Foundations of Computer Science. IEEE, 1997, pp. 364–373.
Melchor et al. [2016] C. A. Melchor, J. Barrier, L. Fousse, and M.-O. Killijian, “Xpir: Private information retrieval for everyone,” Proceedings on Privacy Enhancing Technologies, pp. 155–174, 2016.
Angel et al. [2018] S. Angel, H. Chen, K. Laine, and S. Setty, “Pir with compressed queries and amortized query processing,” in 2018 IEEE symposium on security and privacy (S&P). IEEE, 2018, pp. 962–979.
Dauterman et al. [2022] E. Dauterman, M. Rathee, R. A. Popa, and I. Stoica, “Waldo: A private time-series database from function secret sharing,” IEEE S&P, 2022.
Lan et al. [2016] C. Lan, J. Sherry, R. A. Popa, S. Ratnasamy, and Z. Liu, “Embark: Securely outsourcing middleboxes to the cloud,” in 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016, pp. 255–273.
Sherry et al. [2015] J. Sherry, C. Lan, R. A. Popa, and S. Ratnasamy, “Blindbox: Deep packet inspection over encrypted traffic,” in ACM Conference on Special Interest Group on Data Communication, 2015, pp. 213–226.
Lorch et al. [2013] J. R. Lorch, B. Parno, J. Mickens, M. Raykova, and J. Schiffman, “Shroud: ensuring private access to $\{$ large-scale $\}$ data in the data center,” in 11th USENIX Conference on File and Storage Technologies (FAST), 2013, pp. 199–213.
Stefanov et al. [2018] E. Stefanov, M. V. Dijk, E. Shi, T.-H. H. Chan, C. Fletcher, L. Ren, X. Yu, and S. Devadas, “Path oram: an extremely simple oblivious ram protocol,” Journal of the ACM (JACM), vol. 65, no. 4, pp. 1–26, 2018.
Ren et al. [2015] L. Ren, C. Fletcher, A. Kwon, E. Stefanov, E. Shi, M. Van Dijk, and S. Devadas, “Constants count: Practical improvements to oblivious $\{$ RAM $\}$ ,” in 24th USENIX Security Symposium (USENIX Security), 2015, pp. 415–430.
Wu et al. [2021] H. Wu, Z. Peng, S. Guo, Y. Yang, and B. Xiao, “Vql: efficient and verifiable cloud query services for blockchain systems,” IEEE TPDS, vol. 33, no. 6, pp. 1393–1406, 2021.

π\uppiQLB: A Privacy-preserving with Integrity-assuring Query Language for Blockchain

Abstract

Index Terms:

I Introduction

II Motivating Examples

II-A Use-case examples

II-B Query examples

III Background

III-A FSS Background

Definition 1

Theorem 1

III-B Using FSS for blockchain

IV π\uppiQLB

IV-A System Model and Syntax

IV-B Query Model

IV-C Threat Model

IV-D Security Properties

Definition 2

Definition 3

Theorem 2

Theorem 3

V Query Execution

V-A Simple Query Model

V-B Proof

V-B1 Query Confidentiality

Proof:

V-B2 Response Integrity

Lemma 1

Proof:

V-C Complex Query Model

VI Evaluation

VI-A Computation Comparison

VI-B Bandwidth Comparison

VI-C Latency Comparison

VII Related work

VII-A Queries on database with query confidentiality

VII-A1 PIR Systems [13, 14, 15, 16, 17, 18]

VII-A2 Splinter [34]

VII-A3 Waldo [43]

VII-A4 Garbled circuit [19, 20, 21]

VII-A5 ORAM

VII-B Queries on encrypted databases

VII-C Queries on blockchain

VIII Conclusion

References

$\uppi$ QLB: A Privacy-preserving with Integrity-assuring Query Language for Blockchain

IV $\uppi$ QLB