This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

π\uppiQLB: A Privacy-preserving with Integrity-assuring Query Language for Blockchain

Nasrin Sohrabi Deakin, Australia
[email protected]
   Norrathep Rattanavipanon PSU, Thailand
[email protected]
   Zahir Tari
RMIT, Australia
[email protected]
Abstract

The increase in the adoption of blockchain technology in different application domains e.g., healthcare systems, supply chain managements, has raised the demand for a data query mechanism on blockchain. Since current blockchain systems lack the support for querying data with embedded security and privacy guarantees, there exists inherent security and privacy concerns on those systems. In particular, existing systems require users to submit queries to blockchain operators (e.g., a node validator) in plaintext. This directly jeopardizes users’ privacy as the submitted queries may contain sensitive information, e.g., location or gender preferences, that the users may not be comfortable sharing with. On the other hand, currently, the only way for users to ensure integrity of the query result is to maintain the entire blockchain database and perform the queries locally. Doing so incurs high storage and computational costs on the users, precluding this approach to be practically deployable on common light-weight devices (e.g., smartphones).

To this end, this paper proposes π\uppiQLB, a query language for blockchain systems that ensures both confidentiality of query inputs and integrity of query results. Additionally, π\uppiQLB enables SQL-like queries over the blockchain data by introducing relational data semantics into the existing blockchain database. π\uppiQLB has applied the recent cryptography primitive, namely the function secret sharing (FSS), to achieve the confidentiality property. To support integrity, we propose to extend the traditional FSS setting in such a way that integrity of FSS results can be efficiently verified. Successful verification indicates absence of malicious behaviors on the servers, allowing the user to establish trust from the result. To the best of our knowledge, π\uppiQLB is the first query model designed for blockchain databases with the support for confidentiality, integrity, and SQL-like queries.

Index Terms:
Information Security, Blockchain, Private Data retrieval.

I Introduction

In 20082008, Nakamoto [1] proposed a new way to build a cash system without the need for having a central trusted authority, such as banks, which marked the generation of cryptocurrencies (e.g., Bitcoin and Ethereum). The innovation lied in building trust among parties where the parties cannot trust each other, yet they all agree on one value of the data. The new technology, blockchain, guarantees the integrity, immutability, and tractability of the stored data. Since then, blockchain has been gaining overwhelming attention and applied in many applications such as, energy trading, healthcare systems, security trading, supply chain management, machine learning (federating learning), and rental systems. A blockchain is defined as a decentralized distributed database that runs over a p-2-p (peer-to-peer) network. It is an append-only data structure (a growing list of ordered records) that is stored among all the peers of the p-2-p network [2, 3, 4, 5, 6, 7]. The ordered records, called blocks, are securely linked together using a cryptography primitive, a hash function, forming a chain of blocks. Peers in the underlying network hosting the blockchain do not trust each other; however, the technology yet guarantees the integrity of the data. This is achieved due to: (i)(i) adding the hash of the previous block into each block, and (ii)(ii) the underlying consensus protocol of the blockchain which guarantees the peers store the same replica of the blockchain at any given point of time.

From the database perspective, a blockchain is seen as a decentralised database that stores a large collection of records. Many distributed data-intensive applications have adopted the technology which as a result has increased the demand for querying the newly designed database. For example, in a cryptocurrency application (e.g., Bitcoin), a client may want to query the network to find all the transactions that have occurred during a specific time and the amount of transaction is between a range condition, e.g., 20<TransactionAmount<20020<Transaction_{-}Amount<200. In another example, consider a blockchain-based healthcare system, where the data stored on blockchain are sensitive information, such as patients’ diseases and records, the doctors who visited the patients, the medications given to the patients, etc. A client may want to query the network to find how many patients with a specific disease have visited a specific hospital over a period of time. Existing solutions to query data records in blockchains have several limitations. Some works, such as FlureeDB [8], IBM [9], Oracle [10] and Bigchain [11], provided a SQL-like query language that relies on a central trusted third party (see Figure 1). That is, the central server makes a separate copy of the blockchain in a format of a relational database and then enables SQL queries over the data. This approach obviously has a major limitation, i.e., relying on a central trusted server. The server might act maliciously and either does not provide all the results or returns wrong information to the client. This jeopardizes the integrity of the query results. Additionally, a client’s query may contain (and release) sensitive information about the querier. A malicious server may learn about this sensitive information and then use it for its own benefit, compromising the confidentiality of the user’s query. Later, in 20192019 Xu et al. [12] proposed vChain to address the query processing limitations of current blockchain databases. vChain removed the need for having a centralized trusted party to perform queries and addressed the integrity issue. It enabled a client to verify the result received from the servers. vChian fails however to provide confidentiality. Additionally, vChain supports only one type of queries, i.e., the Boolean range query (Figure 2 depicts a simplistic view of vChain). Hence, the need for having a secure, private and SQL-like query processing model for blockchains remains.

Refer to caption
Figure 1: An Overview of Query Processing model based on a central trusted third party.
Refer to caption
Figure 2: An Overview of Query Processing Proposed by vChain.

Existing approaches for private query processing have limitations, making them inapplicable for blockchain settings. Many private Information Retrieval (PIR) methods [13, 14, 15, 16, 17, 18] assume that a server is honest-but-curious. This assumption is rather strong in a public and unreliable domain, such as blockchain, where every node (honest or malicious) can be a server. Additionally, PIR methods often require multiple round trips to the server to compute the result and therefore consume bandwidth. Other methods based on garbled circuits [19, 20, 21] have high computational and bandwidth costs, making them not applicable for lightweight devices. Encrypted solutions e.g. [22, 23, 24, 25, 26, 27, 28, 29] are also not applicable for blockchain, as the data on the blockchain is not encrypted. ORAM is another solution proposed to protect data privacy by hiding the access patterns of clients from untrusted servers. The main challenge in ORAM is that a client can download/access only the data that s/he is uploaded/added to the server. This is a different setting in blockchain where everyone can add to the database and clients should be able to retrieve the data added by others. Such work or methods will be explained in details in the related work section, §\S VII.

To address the aforementioned issues, this paper proposes π\uppiQLB that aims to: (i)(i) facilitate SQL-like queries over blockchain data, (ii)(ii) provide confidentiality for the queries, and (iii)(iii) provide integrity for the query results. π\uppiQLB achieve this by applying the latest cryptography primitive, namely function secret sharing (FSS) [30]. Using FSS, π\uppiQLB divides a client’s query into several subqueries and sends each subquery to a miner node. Recall that a typical blockchain network comprises three types of nodes: full node, miner, and light node. The full node maintains a full copy of the blockchain database (block headers and data records), the miner node is a full node that participates in the mining process (generating new blocks), and the light node hosts only the block headers. In π\uppiQLB, every miner and full node can provide the query processing service to the client, and hence the term service provider (SP) is used in this paper to refer to any of such nodes. An overview of π\uppiQLB is provided in Figure 3.

Refer to caption
Figure 3: π\uppiQLB Architecture Overview

Consider a common application in a blockchain-based healthcare system that stores records of a tuple of <T,V,W><T,\ V,\ W>, where TT is the time, VV is the cost (it can refer to an insurance number), and WW is a set of attributes, W=<drid,ptid,drspec,disease,pres,hosp,loc>W=<dr_{-}id,\ pt_{-}id,\ dr_{-}spec,\ disease,\ pres,\ hosp,\ loc>, where driddr_{-}id corresponds to the doctor id (it can be an anonymous id depending on the security requirements of the system), ptidpt_{-}id is the patient id, drspecdr_{-}spec is the doctor’s specialties, the diseasedisease is the disease that has been diagnosed, prespres is the treatment that the doctor has prescribed for the patient, hosphosp is the hospital’s name, locloc is the geographical location of the hospital. The records are stored in blockchain database which is located in the SPs’ premises. A client (e.g., a patient) might want to query the blockchain, e.g., to find all the doctors with specific specialties in nearby hospitals, or to find how many times a specific disease has been diagnosed within a time period in a specific hospital. These queries are submitted to various SPs, and they may however contain sensitive information that clients might not feel comfortable sharing with the SPs. π\uppiQLB enables clients to hide information based on their specific choice when submitting queries to SPs (confidentiality) as well as to prevent malicious SPs from tampering with the query results (integrity).

As depicted in Figure 3, π\uppiQLB first divides the client’s query into several subqueries. When dividing a query, a client will decide which information needs to be kept hidden (and protected). Each subquery is then sent to a single SP, and the SPs perform the subqueries separately and send the result back to the client. The client combines the results while checking the integrity. Only when this integrity check passes, it then can obtain the final query output. With the assumption that at least one service provider (SP) is honest, π\uppiQLB is proven to provide query confidentiality and response integrity security properties. It also guarantees that the query is evaluated on all the expected records.

The rest of this paper is organized as follows. Motivating examples are provided in §\S II followed by a description of FSS in §\SIII. The architecture of π\uppiQLB is detailed in §\SIV and the evaluation results are provided in §\SVI. Existing work is elaborated in §\S VII and §\SVIII concludes the paper.

II Motivating Examples

This section provides use-case examples to motivate the importance of security and privacy of query processing in blockchain databases.

II-A Use-case examples

Consider a blockchain-based stock market where the data records in the blockchain are about the information related to the various stocks (e.g., share prices, available shares, transaction amount, number of shares). In such a system, the SPs will host the information related to the stock market.

To run transactions (e.g., buy, sell, or check the market prices on specific shares), a client will first need to query the system by sending a request to the SPs. The query will contain information that if the (potentially malicious) SP learns, it can impact the result. For example, if an SP learns that a specific stock (mentioned in a client’s query) is either a hot stock (i.e., many clients queried it) or people are bidding on opening price, then the malicious SP could tamper the result and influence the market (e.g., increase the actual price of a specific share). However, if integrity of the response can be verified, the malicious SP will be not able to carry out such an activity.

In another example, let us consider a blockchain-based rental-car system where car information (e.g., price, model, built-year, color, customer pick-up location, customer drop-off location, city, and rental-period) is the data recorded in the blocks. To rent cars, clients will first perform a search to find their preferred ones. To do so, clients send queries to various SPs. Such queries might however contain sensitive information (e.g., pick-up & drop-off locations, price ranges, or rental-period). If SPs learn such confidential information, it can jeopardize the client’s privacy.

II-B Query examples

π\uppiQLB enables running SQL-like queries over blockchain-based systems. In this section, we describe some examples of such queries.

Using the rental-car example, we assume that a client wishes to find the minimum price for a specific car model with a specific color. This client also prefers to hide the car model and the color of the car in the query. Hence, the client generates the following query:

\justify

SELECT MIN(priceprice) WHERE carmodelcar_{-}model = ? \wedge colorcolor = ?

Note that “??” indicates the sensitive information that π\uppiQLB must hide from SPs when sending the query.

Let us consider another example, namely a healthcare system. A client may need to find the total number of patients diagnosed with a specific disease in a specific hospital, or the total number of patients visited a specialist at a specific hospital/location.

Using π\uppiQLB, the client can hide the information s/he prefers (e.g., diseasedisease and drspecdr_{-}spec) by generating the following query:

\justify

SELECT COUNT(ptidpt_{-}id) WHERE diseasedisease =? \wedge drspecdr_{-}spec =?

III Background

This section provides first some background about function secret sharing (FSS) that serves as the main building block in π\uppiQLB. Then, we describe the problem stemming from directly applying FSS-based methods to our target setting.

III-A FSS Background

Let 𝔾\mathbb{G} denote an Abelian group and an integer p>1p>1. FSS [30, 31, 32] is defined by a tuple (𝖦𝖾𝗇\sf{\mathsf{G}en}, 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val}):

  • 𝖦𝖾𝗇(1λ,f)K1,,Kp{\sf{\mathsf{G}en}}(1^{\uplambda},f)\rightarrow K_{1},...,K_{p}: It gets a security parameter 1λ1^{\uplambda} and a function f:{0,1}n𝔾f:\{0,1\}^{n}\rightarrow\mathbb{G}, then splits ff into pp function shares (or keys) (K1,,KpK_{1},...,K_{p}).

  • 𝖤𝗏𝖺𝗅(Ki,x)yi{\sf{\mathsf{E}val}}(K_{i},x)\rightarrow y_{i}: It takes KiK_{i} and xx as input and outputs yiy_{i} which is the result corresponding to the party ii.

FSS provides both correctness and security. FSS’s correctness ensures that adding up all outputs from 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} is equivalent to the original ff, i.e., i=1pyi=y=f(x)\sum_{i=1}^{p}y_{i}=y=f(x). For security, FSS guarantees any strict subset of KiK_{i} does not reveal anything about the original function ff. The work in [30] provides a game-based definition of FSS security in Definition 1. This game models an adversary (𝒜𝖽𝗏\sf{\mathcal{A}dv}) capable of compromising p1p-1 shares and aiming to use this knowledge to identify which 𝒜𝖽𝗏\sf{\mathcal{A}dv}-generated function (f0f^{0} or f1f^{1}) is used to generate these p1p-1 shares. Theorem 1 states that an FSS scheme is considered secure if 𝒜𝖽𝗏\sf{\mathcal{A}dv} cannot distinguish f0f^{0} and f1f^{1} based on p1p-1 output shares with more than a negligible probability. In other words, without all pp shares, 𝒜𝖽𝗏\sf{\mathcal{A}dv} cannot infer anything about the original ff.

Definition 1

FSS Security Game (𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}}[30]:

  1. 1.

    Let Σ=(𝖦𝖾𝗇,𝖤𝗏𝖺𝗅)\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}}).

  2. 2.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} selects two functions f0,f1:{0,1}n𝔾f^{0},f^{1}:\{0,1\}^{n}\rightarrow\mathbb{G} and gives these functions to the challenger.

  3. 3.

    The challenger samples b{0,1}b\leftarrow\{0,1\} and computes (K1,,Kp)𝖦𝖾𝗇(1λ,fb)(K_{1},...,K_{p})\leftarrow{\sf{\mathsf{G}en}}(1^{\uplambda},f^{b}). Then he gives p1p-1 shares, e.g., (K1,,Kp1)(K_{1},...,K_{p-1}), to 𝒜𝖽𝗏\sf{\mathcal{A}dv}.

  4. 4.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} outputs a guess b{0,1}b^{\prime}\in\{0,1\}. The game outputs 11 if b=bb^{\prime}=b; otherwise, it outputs 0.

Theorem 1

A scheme Σ=(𝖦𝖾𝗇,𝖤𝗏𝖺𝗅)\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}}) is secure if for all probabilistic polynomial-time 𝒜𝖽𝗏\sf{\mathcal{A}dv}, there exists a negligible function 𝗇𝖾𝗀𝗅\sf{\mathsf{n}egl} such that:

Pr[𝖦𝖺𝗆𝖾𝒜𝖽𝗏,Σ𝖥𝖲𝖲(λ)=1]𝗇𝖾𝗀𝗅(λ)Pr[{\sf{\mathsf{G}ame^{FSS}_{\sf{\mathcal{A}dv},\Sigma}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)

In this work, we consider two types of ff for FSS:

  • Distributed point function fa,yf_{a,y} that is evaluated to a fixed element y𝔾y\in\mathbb{G} on a specific input aa, and to 0 otherwise, i.e.:

    fa,y(x)={y,if x=a0,otherwisef_{a,y}(x)=\begin{cases}y,&\text{if }\ x=a\\ 0,&\text{otherwise}\end{cases} (1)
  • Interval function fa,b,yf_{a,b,y} that is evaluated to y𝔾y\in\mathbb{G} only when an input is within a specific interval [a,b][a,b], and to 0 otherwise, i.e.:

    fa,b,y(x)={y,if axb0,otherwisef_{a,b,y}(x)=\begin{cases}y,&\text{if }\ a\leq x\leq b\\ 0,&\text{otherwise}\end{cases} (2)

For these two types, Theorem 1 implies confidentiality of aa, bb and yy, i.e., their values cannot be inferred from any of the function shares.

III-B Using FSS for blockchain

Blockchain can be considered as a public database. It is a decentralized distributed database where the data records are transparent for public, i.e., everyone can access the data. To query the public database with privacy-preserving guarantees, several studies have been conducted and different models have been proposed. Recently, with the introduction of FSS, several researchers proposed different models to privately query the public records [33, 31, 34]. In such models, a client generates keys using FSS with y=1y=1 in Eq.1 or 2, and sends these shares to the servers. A given server evaluates the function share on each record of the database, sums the evaluation together and returns the result to the client. The client upon receiving all the results adds them up and finds the result of the query.

In all these settings, while FSS guarantees confidentiality of client queries, a client is unable to verify the integrity of the results received from servers. In particular, a malicious server can introduce an error δ\delta to the final output without being detected by the client. This can be done by having the malicious server respond to the subquery with y𝒜𝖽𝗏+δy_{\sf{\mathcal{A}dv}}+\delta, instead of y𝒜𝖽𝗏y_{\sf{\mathcal{A}dv}}, where y𝒜𝖽𝗏y_{\sf{\mathcal{A}dv}} is a benign output from 𝖤𝗏𝖺𝗅(K𝒜𝖽𝗏,x){\sf{\mathsf{E}val}}(K_{\sf{\mathcal{A}dv}},x). In this case, after aggregating all outputs, the client obtains the incorrect result that is off from the expected value by δ\delta, i.e., (i=1,i𝒜𝖽𝗏pyi)+(y𝒜𝖽𝗏+δ)=(i=1pyi)+δ=y+δ=f(x)+δ(\sum_{i=1,i\neq\sf{\mathcal{A}dv}}^{p}y_{i})+(y_{\sf{\mathcal{A}dv}}+\delta)=(\sum_{i=1}^{p}y_{i})+\delta=y+\delta=f(x)+\delta.

In the blockchain context, verifiable integrity of the query response is of paramount importance due to the followings:

  • Blockchain is a decentralized system, i.e., there is no central authority controlling blockchain; hence, some servers might be malicious. It is therefore important that a client can verify the query response. This is different for public databases, as they are owned by organizations that can protect servers and detect malicious parties at early stages.

  • The size of a blockchain database is significantly high and lightweight devices are unable to download the entire database to perform queries locally. Hence, it is important to enable them to verify the query responses.

This work describes π\uppiQLB as a solution to address the shortcomings of existing models. π\uppiQLB makes use of FSS to enable private search of blockchain databases, while providing the integrity of the query results and retaining the original goal of ensuring the confidentiality of clients’ queries.

IV π\uppiQLB

IV-A System Model and Syntax

As shown in Figure 4, π\uppiQLB is comprised of two entities:

  • Service Provider (SP) is the full node or the miner in blockchain terminology. An SP retains a full replica of the blockchain database and is able to provide query processing services.

  • Client is a user of the system. It sends query requests to SPs. A client can be either a light node or an external node. As mentioned, a light node in the blockchain terminology refers to the node that only hosts the block header data. However, an external node is the node/user who does not hold any of he blockchain data and does not exist in the blockchain network. It only sends queries to SPs.

Refer to caption
Figure 4: π\uppiQLB Architecture Detail

In terms of algorithms, π\uppiQLB consists of a tuple (𝖦𝖾𝗇,𝖤𝗏𝖺𝗅,𝖵𝖾𝗋𝗂𝖿)({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}}), defined as follows:

  • 𝖦𝖾𝗇(1λ,q,s)q,K1,,Kp{\sf{\mathsf{G}en}}(1^{\uplambda},q,s)\rightarrow q^{\prime},K_{1},...,K_{p}: It takes as input a security parameter 1λ1^{\uplambda}, a query with the format described in §\SIV-B and a set of secret expressions ss in qq. It then outputs pp function shares and a private query qq^{\prime} that removes ss value (i.e., replacing it with ?? as in Section II). This function is performed by the client.

  • 𝖤𝗏𝖺𝗅(Ki,D,q)yi{\sf{\mathsf{E}val}}(K_{i},D,q^{\prime})\rightarrow y_{i}: It takes as input a function share KiK_{i}, which corresponds to each SP, DD – the entire data from blockchain and a private query qq^{\prime}. It then performs an evaluation using KiK_{i} and qq^{\prime} on DD and outputs yiy_{i}. This function is performed by a SP.

  • 𝖵𝖾𝗋𝗂𝖿(y1,,yp)R{\sf{\mathsf{V}erif}}(y_{1},...,y_{p})\rightarrow R: It takes as input all the share outputs received from the SPs. Then, it verifies whether all yiy_{i}-s are honestly generated on the same blockchain data. If it detects any malformed yiy_{i}, it outputs \perp (i.e., aborts). Otherwise, it produces the query result RR. This function is performed by the client.

IV-B Query Model

π\uppiQLB is designed to facilitate the execution of SQL-liked queries on blockchain databases. It first limits the number of blocks that SPs need to search over, then transforms the SQL syntax into a format compatible with the blockchain system. It then uses FSS [31] to divide the query into pp shares (queries).

Fig 5 depicts the π\uppiQLB’s query format, where type refers to one of the following query types: SUM, COUNT, AVG, MIN and MAX queries. blk_range_cond specifies the time range, i.e., t1<blkrangecond<t2t_{1}<blk_{-}range_{-}cond<t_{2}, where t1t_{1} is the lower bound and t2t_{2} is the upper bound. This condition is necessary because in blockchain there is no table to be specified in order to narrow down the search on a database; but rather the full history of all the generated blocks since the genesis block. Hence, it is a must to narrow down the number of blocks that the SP needs to search over; otherwise, the search can be too expensive, time-consuming or may never stop. Thus, π\uppiQLB shortens the number of blocks by introducing the blk_range_cond, t1t_{1} must be bigger than tgt_{g}, which refers to the time that the genesis block was generated, and t2t_{2} must be lower than tct_{c}, which is the current time. π\uppiQLB has considered another condition for blk_range_cond (i.e., the difference between t1t_{1} and t2t_{2}) which should be limited and must not exceed a threshold τ\uptau. This is to ensure that the number of blocks generated between t1t_{1} and t2t_{2} does not exceed a given threshold. The value of τ\uptau depends on the block generation rate of the network. If the block generation rate is high, then τ\uptau must be low; and if the block generation rate is low, then τ\uptau can be bigger.

Refer to caption
Figure 5: Query format in π\uppiQLB

π\uppiQLB is designed to support two different conditions: (i)(i) single, and (ii)(ii) range condition.

  • Single condition is when the query condition has only one value. That is in the query format where expr in the condition equals to only one secret value. In this case, the 𝖦𝖾𝗇{\sf{\mathsf{G}en}} function defines ff based on Eq.1.

  • Range condition is when the condition is within a range interval. In this condition, the 𝖦𝖾𝗇{\sf{\mathsf{G}en}} function defines ff following Eq.2.

Additionally, π\uppiQLB is designed to support two query models: simple and complex. The simple model refers to when a query has only one condition (either single or range). The complex model refers to when a query is composed of several conditions using ANDAND and OROR. Note that, π\uppiQLB supports not more than one range condition in one query request.

Figure 4 illustrates how a query processing works in π\uppiQLB, which involves the following steps:

  1. 1.

    The client generates a query, qq and specifies the secret information (i.e., the information that the client prefers not to share with SPs) to specify the private query qq^{\prime}.

  2. 2.

    Using π\uppiQLB.𝖦𝖾𝗇\sf{\mathsf{G}en} function, the query is divided into several subqueries/function shares, depicted as fif_{i} in Figure 4.

  3. 3.

    The client forwards the subqueries to SPs.

  4. 4.

    SPs execute the subquery using π\uppiQLB.𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function and return the results back to the client.

  5. 5.

    The client combines the results and generates the final output of the query using the π\uppiQLB.𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} function. This function enables the client to either abort or acquire the query’s response.

IV-C Threat Model

π\uppiQLB assumes that at least one of the SPs is honest, i.e., it always honestly follows the protocol and does not collude with other SPs. The rest of the SPs can be malicious. The malicious SP (𝒜𝖽𝗏\sf{\mathcal{A}dv}) may attempt to learn the sensitive information from the client’s query in order to set up different attacks. For example, in a blockchain-based stock market application, if 𝒜𝖽𝗏\sf{\mathcal{A}dv} finds out the shares that the client is interested in purchasing, 𝒜𝖽𝗏\sf{\mathcal{A}dv} may modify the price of the share to charge the client more; or 𝒜𝖽𝗏\sf{\mathcal{A}dv} may use the location information to endanger the client. Moreover, 𝒜𝖽𝗏\sf{\mathcal{A}dv} can also tamper with the query and the response. Tampering with the query means the adversary may search on a different query instead of the client-provided query. Tampering with the response leads to returning an incorrect or forged response.

We assume that the client is not malicious. A malicious client is a client that attempts to access private data records or tampers with the existing records in the server. This is inapplicable here because π\uppiQLB is proposed for the public data records, where the database is not private and not directly modifiable by the client. Hence, everyone can see the data publicly. π\uppiQLB also assumes that the communication between an SP and the client is over a secure channel (e.g., SSL) and the user of the system is not compromised, i.e., the client follows the protocol.

IV-D Security Properties

At high level, π\uppiQLB provides two security properties:

  1. 1.

    Query Confidentiality: it ensures that sensitive information from the query remains hidden to SPs.

  2. 2.

    Response Integrity: it guarantees detection when malicious SPs tamper with the query responses.

The Query Confidentiality property is defined using the security game provided in Definition 2. It models 𝒜𝖽𝗏\sf{\mathcal{A}dv} who can compromise p1p-1 SPs with the goal of learning some sensitive information about the query. It assumes 𝒜𝖽𝗏\sf{\mathcal{A}dv} has oracle access to all π\uppiQLB algorithms. 𝒜𝖽𝗏\sf{\mathcal{A}dv} is allowed to submit two queries to the challenger as long as these queries contain the same structure and differ only by the secret value in their query condition. The challenger randomly selects one of the two submitted queries, computes the private query qq^{\prime} and p1p-1 shares from the selected query and outputs them to 𝒜𝖽𝗏\sf{\mathcal{A}dv}. 𝒜𝖽𝗏\sf{\mathcal{A}dv} wins the game if it can identify which query is used by the challenger. 

Definition 2

Query Confidentiality Game (𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}}):

  1. 1.

    Let Σ=(𝖦𝖾𝗇,𝖤𝗏𝖺𝗅,𝖵𝖾𝗋𝗂𝖿)\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}}).

  2. 2.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} is given oracle access to Σ\Sigma and DD blockchain data.

  3. 3.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} selects a list of secret expressions ss and two queries q0,q1q_{0},q_{1}, where these queries differ only by the ss value, and gives s,q0,q1s,q_{0},q_{1} to the challenger.

  4. 4.

    The challenger samples b{0,1}b\leftarrow\{0,1\}, calls (q,K1,,Kp)𝖦𝖾𝗇(1λ,qb,s)(q^{\prime},K_{1},...,K_{p})\leftarrow{\sf{\mathsf{G}en}}(1^{\uplambda},q_{b},s), and outputs qq^{\prime} and p1p-1 shares (e.g., K1,,Kp1K_{1},...,K_{p-1}) to 𝒜𝖽𝗏\sf{\mathcal{A}dv}.

  5. 5.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} outputs a guess b{0,1}b^{\prime}\in\{0,1\}. The game outputs 11 if b=bb=b^{\prime}; otherwise, it outputs 0.

Next, we formalize the Response Integrity property with the security game provided in Definition 3. It models 𝒜𝖽𝗏\sf{\mathcal{A}dv} that aims is to manipulate the query response by gaining control of p1p-1 SPs. 

Definition 3

Reponse Integirty Game (𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}}):

  1. 1.

    Let Σ=(𝖦𝖾𝗇,𝖤𝗏𝖺𝗅,𝖵𝖾𝗋𝗂𝖿)\Sigma=({\sf{\mathsf{G}en}},{\sf{\mathsf{E}val}},{\sf{\mathsf{V}erif}}).

  2. 2.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} is given oracle access to Σ\Sigma and DD blockchain data.

  3. 3.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} selects a query qq, a list of secret expressions ss and sends them to the challenger.

  4. 4.

    The challenger calls 𝖦𝖾𝗇(1λ,q,s){\sf{\mathsf{G}en}}(1^{\uplambda},q,s), producing q,K1,,Kpq^{\prime},K_{1},...,K_{p}. He then gives qq^{\prime} and p1p-1 shares (says K1,,Kp1K_{1},...,K_{p-1}) to 𝒜𝖽𝗏\sf{\mathcal{A}dv}.

  5. 5.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} produces y1𝒜𝖽𝗏,,yp1𝒜𝖽𝗏y_{1}^{\sf{\mathcal{A}dv}},...,y_{p-1}^{\sf{\mathcal{A}dv}} and gives them to the challenger.

  6. 6.

    The challenger computes yp=𝖤𝗏𝖺𝗅(Kp,D,q)y_{p}={\sf{\mathsf{E}val}}(K_{p},D,q^{\prime}) and R=𝖵𝖾𝗋𝗂𝖿(y1𝒜𝖽𝗏,,yp1𝒜𝖽𝗏,yp)R^{\prime}={\sf{\mathsf{V}erif}}(y_{1}^{\sf{\mathcal{A}dv}},...,y_{p-1}^{\sf{\mathcal{A}dv}},y_{p}). He also directly queries qq on DD, producing RR.

  7. 7.

    The game outputs 11 if RR^{\prime}\neq\perp and RRR^{\prime}\neq R,; otherwise, it outputs 0.

Ultimately, we want to prove Theorem 2 and Theorem 3 below to formally guarantees Query Confidentiality and Response Integrity, respectively, in π\uppiQLB. 

Theorem 2

π\uppiQLB provides Query Confidentiality if for all probabilistic polynomial-time 𝒜𝖽𝗏\sf{\mathcal{A}dv} there exists a negligible function 𝗇𝖾𝗀𝗅\sf{\mathsf{n}egl} such that:

Pr[𝖦𝖺𝗆𝖾𝒜𝖽𝗏,π𝖰𝖫𝖡𝖰𝖢𝗈𝗇𝖿(λ)=1]𝗇𝖾𝗀𝗅(λ)Pr[{\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)
Theorem 3

π\uppiQLB provides Response Integrity if for all probabilistic polynomial-time 𝒜𝖽𝗏\sf{\mathcal{A}dv} there exists a negligible function 𝗇𝖾𝗀𝗅\sf{\mathsf{n}egl} such that:

Pr[𝖦𝖺𝗆𝖾𝒜𝖽𝗏,π𝖰𝖫𝖡𝖱𝖨𝗇𝗍(λ)=1]𝗇𝖾𝗀𝗅(λ)Pr[{\sf{\mathsf{G}ame^{RInt}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}(\uplambda)=1]\leq{\sf{\mathsf{n}egl}}(\uplambda)

The proofs for these two theorems are provided in Section V-B after various concepts are introduced.

V Query Execution

For the simplicity we explain the execution of the queries with an example. Assume each block contains several records, each denoted by OO which represents an object/transaction id (note that each transaction or object that is generated has an id). And assume each record, say OiO_{i}, is a tuple of <ti,Vi,Wi><t_{i},V_{i},W_{i}>, where tt represents the time that the OO is generated; VV is a numerical value of OO, e.g., for a cryptocurrency VV is the transaction amount and in a non-cryptocurrency application it may refer to any numerical value required by that system (e.g., the amount of rental price in a blockchain-based rental car system or it could be an index value); WW is the set of attributes in the form of {Item,Price,Color}\{Item,\ Price,\ Color\}. Thus, a block content can be depicted as Figure 6. π\uppiQLB executes queries over WW. For simplicity, we assume a table representation of WW, as depicted in Figure 7.

Refer to caption
Figure 6: Block Content

To execute a query, π\uppiQLB first limits/specifies the number of blocks of the blockchain that it requires to search over based on the time window, denoted as blk_range_cond, which is provided in the query. Here, we assume that the blk_range_cond, returns four blocks (i.e., there are four blocks that their generation time falls between blk_range_cond) and each block contains only one record. Figure 8 depicts a table representation of the four blocks and their content.

Refer to caption
Figure 7: Block Representation
Refer to caption
Figure 8: Table

We next exemplify π\uppiQLB’s steps via five queries:

  • q1q_{1}: \justifySELECT SUM(PricePrice) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE ItemItem=22.

  • q2q_{2}: \justifySELECT COUNT(ItemItem) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE 4<Price<104<Price<10.

  • q3q_{3}: \justifySELECT AVG(PricePrice) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE ItemItem = 22.

  • q4q_{4}: \justifySELECT MAX(PricePrice) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE ItemItem = 22 \wedge Color=redColor=red

  • q5q_{5}: \justifySELECT MIN(PricePrice) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE 2Item52\leq Item\leq 5

Also, note that the qq^{\prime} denotes the query with the hidden information, i.e., the client’s sensitive information is hidden in qq^{\prime}. For example, q1q_{1}^{\prime} can be constructed as follows:

\justify

SELECT SUM(PricePrice) FROM (1/06/2022)(1/06/2022) << blkrangecondblk_{-}range_{-}cond << (4/06/2022)(4/06/2022) WHERE ItemItem = ?

V-A Simple Query Model

We now discuss how to construct π\uppiQLB to serve a query with a single condition.

𝖦𝖾𝗇(1λ,q,s){\sf{\mathsf{G}en}}(1^{\uplambda},q,s): depending on the query condition, single or range, this function first samples a random value y𝔾y\in\mathbb{G} and constructs the ff function from yy, where f:{0,1}n𝔾f:\{0,1\}^{n}\rightarrow\mathbb{G}. In this work, we assume the size of group 𝔾\mathbb{G} is exponential in terms of λ\uplambda. For instance, ff could be f:{0,1}n2λf:\{0,1\}^{n}\rightarrow\mathbb{Z}_{2^{\uplambda}}. Among the above query examples, q1q_{1} and q3q_{3} are single and q2q_{2} and q5q_{5} are range conditions. Thus, if we pass q1q_{1} and q2q_{2} to the 𝖦𝖾𝗇{\sf{\mathsf{G}en}} function, it behaves as follows:

  • Single condition – 𝖦𝖾𝗇(1λ,q1,{Item}){\sf{\mathsf{G}en}}(1^{\uplambda},q_{1},\{Item\}): 𝖦𝖾𝗇\sf{\mathsf{G}en} takes both q1q_{1} and the secret attribute Item as input. This signifies that Item is sensitive information and the client aims to hide its value in the query. Thus, 𝖦𝖾𝗇\sf{\mathsf{G}en} randomly samples yy from 𝔾\mathbb{G} and defines ff as follows:

    f(x)={y,if x=20,otherwisef(x)=\begin{cases}y,&\text{if }\ x=2\\ 0,&\text{otherwise}\end{cases} (3)
  • Range condition – 𝖦𝖾𝗇(1λ,q2,{Price}){\sf{\mathsf{G}en}}(1^{\uplambda},q_{2},\{Price\}): 𝖦𝖾𝗇\sf{\mathsf{G}en} takes as an input the q2q_{2} and the secret attribute Price. Again, it randomly samples yy from 𝔾\mathbb{G} and then defines the ff as follows:

    f(x)={y,if  1<x<100,otherwisef(x)=\begin{cases}y,&\text{if }\ 1<x<10\\ 0,&\text{otherwise}\end{cases} (4)

In this work, we assume that the range condition evaluates to one record only. To allow for multiple records within a specified range, π\uppiQLB can be extended by adopting the interval evaluation technique used by Splinter [34], which we leave for future work. After defining ff, it then calls FSS.𝖦𝖾𝗇(1λ,f){\sf{\mathsf{G}en}}(1^{\uplambda},f) to output pp function shares. The shares and the private query qq^{\prime} are sent to SPs in a format of {Ki,q}\{K_{i},q^{\prime}\}

𝖤𝗏𝖺𝗅(Ki,D,q){\sf{\mathsf{E}val}}(K_{i},D,q^{\prime}): the SP, upon receiving {Ki,q}\{K_{i},q^{\prime}\}, performs three operations on DD (blockchain data). First, it specifies/limits the number of blocks (to process the query) in DD according to blkrangecondblk_{-}range_{-}cond in the qq^{\prime}. Second, it creates an intermediate table depending on the query condition (single or range, which can be extracted from qq^{\prime}). If the query is a single condition query, the SPSP executes GROUPBYGROUPBY over the column in the query condition (to recall the query condition, see Figure 5). And if the query is a range condition query, then the SPSP creates the intermediate table over the column in the query type (to recall the query type, see Figure 5) without executing GROUPBYGROUPBY. Third, it generates the binary representation of the values of the column in the query type, denoted as BINBIN.

After that, 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} evaluates two sub-functions on the intermediate table: (i)(i) compcomp, stands for compute; and (ii)(ii), mergmerg, stands for merge. The compcomp is equivalent to FSS.𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} except it is executed over the binary representation of the values of the column in the query type, BINBIN. That is, compcomp takes as an input two values: (i)(i) cc that refers to the column in the condition, the value of which is hidden; and (ii)(ii) BINBIN; and performs the FSS.𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function over these two inputs. compcomp is executed ll times, where ll is the number of columns of BINBIN, in other word, the maximum number of bits required to represent the binary value of the data, 2l2^{l}. The output of each execution of compcomp is given to the mergmerg function. mergmerg takes as input all the values generated from compcomp and merge/concatenate them together and generates a binary array, yiy_{i}, with ll elements. The yiy_{i} is the final output of 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} and is returned to the client. Algorithm 1 details the 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function.

1yiy_{i}\leftarrow [];
2 cGetConditionColumn(D,q)c\leftarrow\text{GetConditionColumn}(D,q^{\prime});
3 interTableCreateIntermediaryTable(D,q)interTable\leftarrow\text{CreateIntermediaryTable}(D,q^{\prime});
4 BINGetBinaryRep(interTable,q)BIN\leftarrow\text{GetBinaryRep}(interTable,q^{\prime});
5 lnumColumn(BIN)l\leftarrow\text{numColumn}(BIN);
6 for k=0tolk=0~{}\text{to}~{}l do
7       bkcomp(Ki,c,BIN,k)b_{k}\leftarrow comp(K_{i},c,BIN,k) ; // see Eq.5
8       yimerg(yi,bk)y_{i}\leftarrow merg(y_{i},b_{k});
9      
10 end for
return yiy_{i}
Algorithm 1 π\uppiQLB.𝖤𝗏𝖺𝗅(Ki,D,q){\sf{\mathsf{E}val}}(K_{i},D,q^{\prime})

The behaviour of this function, specifically the compcomp sub-function, depends on the query types. We now explain this function in detail for each query type that π\uppiQLB supports, namely, SUM, AVG, COUNT, MIN, and MAX.

  • SUM: assume that the SPSP receives {Ki,q1}\{K_{i},q_{1}^{\prime}\}. SPSP generates the intermediate table for this query, depicted in Figure 9. Then, it generates the binary representation of the values of the column SUM(Price)SUM(Price), shown in Figure 9, as the BINBIN (recall that the size of each row in the BINBIN is ll, i.e., the maximum bits required to represent the binary value of the data). Finally, the 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function is triggered. 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function first executes the compcomp subfunction which takes as an input ItemItem and BINBIN as in Figure 9. One execution of the compcomp is as follows:

    bk=j=0nFSS.𝖤𝗏𝖺𝗅(Ki,cj)BIN[j][k]b_{k}=\displaystyle\sum_{j=0}^{n}FSS.{\sf{\mathsf{E}val}}(K_{i},c_{j})\cdot BIN[j][k] (5)

    This function is repeated for all the columns in BINBIN, Figure 9. The outputs of each execution of compcomp, i.e., bkb_{k}, are sent to the mergmerg subfunction, where the bkb_{k}-s are merged/concatenated into an array yiy_{i}. The final yiy_{i} is then returned to the client. Algorithm 1 depicts the 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function execution.

    Refer to caption
    Figure 9: Intermediate Table and the binary representation of the column in query type generated for q1q_{1}.
  • COUNT: In this query type, the 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} execution is similar to the SUM-based query type. Assume SPSP receives {Ki,q2}\{K_{i},q_{2}\prime\}. It creates the intermediate table and BINBIN as depicted in Figure 10. Then, SPSP can apply compcomp sub-function for all columns in cc and appends the results into a single array yiy_{i}, which is in turn sent back to the client.

    Refer to caption
    Figure 10: Intermediate Table and the binary representation of the column in query type generated for q2q_{2}.
  • AVG: This type of query results in a similar execution of 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} as the SUM- and COUNT-based query types. Suppose SPSP receives {Ki,q3}\{K_{i},q_{3}\prime\}, we refer to Figure 11 for the resulting intermediate table and BINBIN variable in this query.

    Refer to caption
    Figure 11: Intermediate Table and the binary representation of the column in query type generated for q3q_{3}.
  • MAX: among above queries, the q4q_{4} is a MAX query which is an example of the complex query model; hence, we defer its explanation to §\S V-C.

  • MIN: if the SPSP receives Qi={Ki,q5}Q_{i}=\{K_{i},q_{5}\prime\}, it first creates the intermediate table and then performs the compcomp and mergmerg similar to the previous queries. We thus omit its detail for brevity.

𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} (y1,,yp)(y_{1},...,y_{p}): Upon receiving the results from all SPSP-s, the client executes the 𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} function to output the final results. Recall that yiy_{i} is a binary array of the outputs from the 𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} function executed by SPiSP_{i} on each column of the BINBIN matrix. Thus, in order to find the final result of the query, the client needs to perform the 𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} function on each bit of the yiy_{i}. To perform this, the client follows these steps: (i)(i), it creates a binary matrix of the received yiy_{i}-s; (ii)(ii), it adds the values of each column of the matrix and include it to a temptemp variable. if(temp==y)if(temp==y) then it adds a binary value 11 into the result array, otherwise if(temp==0)if(temp==0) then it adds a binary value 0 into the result array; if temptemp is neither yy nor 0, the client detects a misbehaviour and aborts the protocol. This could be due to the server being malicious or other errors in the systems. Algorithm 2 details the 𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} function.

1for i=0tolen(y1)i=0~{}\text{to}~{}len(y_{1}) do
2       tempj=0pyj[i]temp\leftarrow\sum_{j=0}^{p}y_{j}[i];
3       // Recall yy is selected in 𝖦𝖾𝗇\sf{\mathsf{G}en} 
4       if temptemp is yy then
5             R[i]1R[i]\leftarrow 1;
6       else if temptemp is 0 then
7             R[i]0R[i]\leftarrow 0;
8       else
9             // Detect misbehaviour 
10             return \perp;
11       end if
12      
13 end for
14return RR
Algorithm 2 π\uppiQLB.𝖵𝖾𝗋𝗂𝖿(y1,,yp){\sf{\mathsf{V}erif}}(y_{1},...,y_{p})

V-B Proof

This section provides proofs for Theorem 2 and Theorem 3.

V-B1 Query Confidentiality

Intuitively, π\uppiQLB inherits the secrecy property from the FSS primitive and therefore 𝒜𝖽𝗏\sf{\mathcal{A}dv} cannot infer anything about the secret value in the ff function, which is also considered sensitive in π\uppiQLB. To provide a more formal argument, we prove Theorem 2 via a reduction of 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}} in Definition 1 to 𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}} in Definition 2. In other words, we aim to show that the existence of 𝒜𝖽𝗏\sf{\mathcal{A}dv} that breaks 𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}} can be used to construct 𝒜𝖽𝗏𝖥𝖲𝖲\sf{\mathcal{A}dv}^{FSS} that breaks the FSS security guarantee in Theorem 1.

Proof:

Suppose there exists 𝒜𝖽𝗏\sf{\mathcal{A}dv} such that Pr[𝖦𝖺𝗆𝖾𝒜𝖽𝗏,π𝖰𝖫𝖡𝖰𝖢𝗈𝗇𝖿=1]>𝗇𝖾𝗀𝗅(λ)Pr[{\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}}}=1]>{\sf{\mathsf{n}egl}}(\uplambda). 𝒜𝖽𝗏𝖥𝖲𝖲\sf{\mathcal{A}dv}^{FSS} asks 𝒜𝖽𝗏\sf{\mathcal{A}dv} to play 𝖦𝖺𝗆𝖾𝒜𝖽𝗏,π𝖰𝖫𝖡𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}_{\sf{\mathcal{A}dv},{\Large\uppi}{QLB}}} and can simply use the game output bb^{\prime} to win 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}}. This works because we can construct the 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}} sequence from 𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}} as follows:

  1. 1.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} can extract f0f^{0} and f1f^{1} from q0q_{0} and q1q_{1}. Since these queries are sent to the challenger in 𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}}, these functions are also included in this message. (corresponding to Step 2 of 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}})

  2. 2.

    The challenger samples b{0,1}b\leftarrow\{0,1\} and invokes π\uppiQLB.𝖦𝖾𝗇\sf{\mathsf{G}en} using qbq_{b}. Since π\uppiQLB.𝖦𝖾𝗇\sf{\mathsf{G}en} derives fbf^{b} from qbq_{b} and calls FSS.𝖦𝖾𝗇\sf{\mathsf{G}en} on fbf^{b}, it means that in this step the challenger also computes (K1,,Kp)FSS.𝖦𝖾𝗇(1λ,fb)(K_{1},...,K_{p})\leftarrow FSS.{\sf{\mathsf{G}en}}(1^{\uplambda},f^{b}) and outputs p1p-1 shares to 𝒜𝖽𝗏\sf{\mathcal{A}dv}. (corresponding to Step 3 of 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}})

  3. 3.

    𝒜𝖽𝗏\sf{\mathcal{A}dv} outputs b{0,1}b^{\prime}\in\{0,1\}. (corresponding to Step 4 of 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}})

Therefore, 𝒜𝖽𝗏𝖥𝖲𝖲\sf{\mathcal{A}dv}^{FSS} can win 𝖦𝖺𝗆𝖾𝖥𝖲𝖲\sf{\mathsf{G}ame^{FSS}} with the same non-negligible probability that 𝒜𝖽𝗏\sf{\mathcal{A}dv} has of winning 𝖦𝖺𝗆𝖾𝖰𝖢𝗈𝗇𝖿\sf{\mathsf{G}ame^{QConf}}.

V-B2 Response Integrity

To prove the response integrity property, we first show π\uppiQLB satisfies 𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}} in a relaxed scenario, where an intermediate table can be represented using only one bit, hence proving Lemma 1. Then, we can extend this proof to the generic scenario that requires an ll-column intermediate table, for arbitrary value of ll

Lemma 1

Theorem 3 holds for a relaxed scenario where blockchain data DD is transformable to a one-bit intermediate table for the input query qq

Proof:

In this special relaxed scenario, we consider that after SPs convert DD into an intermediate table, this table contains exactly 1 column (or 1 bit).

The goal of this Lemma is to prove that 𝒜𝖽𝗏\sf{\mathcal{A}dv} cannot win 𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}} except with negligible probability in this scenario.

To prove that, we show that even if 𝒜𝖽𝗏\sf{\mathcal{A}dv} learns and can manipulate the input query qq, a list of secret expressions ss, p1p-1 shares KiK_{i} and p1p-1 share outputs yiy_{i} (see Definition 3), he cannot force π\uppiQLB.𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} (which is performed by the challenger) to produce a valid output without aborting. With a one-column intermediate table, π\uppiQLB.𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} outputs a single element yi𝔾y_{i}\in\mathbb{G} and π\uppiQLB.𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} returns a bit RR, where:

R={0if j=1pyj=01if j=1pyj=yotherwiseR=\begin{cases}0&\text{if $\sum_{j=1}^{p}y_{j}=0$}\\ 1&\text{if $\sum_{j=1}^{p}y_{j}=y$}\\ \perp&\text{otherwise}\end{cases} (6)

We note that yy is generated in π\uppiQLB.𝖦𝖾𝗇\sf{\mathsf{G}en} and cannot be inferred from p1p-1 shares (due to the secrecy property of FSS in Theorem 1). It also cannot be influenced by the input query qq or the secret expressions ss since yy is selected at random; in short 𝒜𝖽𝗏\sf{\mathcal{A}dv} cannot uncover or manipulate yy.

Recall that to win 𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}} in this scenario, 𝒜𝖽𝗏\sf{\mathcal{A}dv} must output RR^{\prime} such that RR^{\prime}\neq\perp and RR^{\prime} must be different from the benign output produced by an honest execution of the protocol. Without loss of generality, we assume if all SPs are honest, π\uppiQLB.𝖤𝗏𝖺𝗅\sf{\mathsf{E}val} will output R=0R=0; thus to win the game, 𝒜𝖽𝗏\sf{\mathcal{A}dv} goal is to produce R=1R^{\prime}=1. Suppose 𝒜𝖽𝗏\sf{\mathcal{A}dv} can compromise all SPs except SPp. Then, from Equation 6, to get R=1R^{\prime}=1, 𝒜𝖽𝗏\sf{\mathcal{A}dv} must output y1𝒜𝖽𝗏,,yp1𝒜𝖽𝗏y_{1}^{\sf{\mathcal{A}dv}},...,{y}_{p-1}^{\sf{\mathcal{A}dv}} such that j=1p1yj𝒜𝖽𝗏=yyp\sum_{j=1}^{p-1}y_{j}^{\sf{\mathcal{A}dv}}=y-y_{p}. Since yy and ypy_{p} are random and unknown to 𝒜𝖽𝗏\sf{\mathcal{A}dv}, he does not have any other advantage of coming up with yypy-y_{p} from y1,,yp1y_{1},...,y_{p-1} except for directly guessing one element in 𝔾\mathbb{G} and hoping that this element corresponds to yypy-y_{p}. The probability of 𝒜𝖽𝗏\sf{\mathcal{A}dv} correctly guessing yypy-y_{p} and thus breaking Lemma 1 is 1|𝔾|=1exp(λ)=𝗇𝖾𝗀𝗅(λ)\frac{1}{|\mathbb{G}|}=\frac{1}{\exp(\uplambda)}={\sf{\mathsf{n}egl}}(\uplambda).

To win 𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}} for a generic scenario with an ll-column intermediate table and hence prove Theorem 3, 𝒜𝖽𝗏\sf{\mathcal{A}dv} must be able to trick the challenger to execute π\uppiQLB.𝖵𝖾𝗋𝗂𝖿\sf{\mathsf{V}erif} and output an ll-bit RR^{\prime} such that (1) no abort has happened during execution and (2) RR^{\prime} is different from the benign value RR. However, in order to do that, 𝒜𝖽𝗏\sf{\mathcal{A}dv} must get at least one bit from RR^{\prime} to be different from the bit at the same location in RR. Doing so requires 𝒜𝖽𝗏\sf{\mathcal{A}dv} to break Lemma 1. Hence, 𝒜𝖽𝗏\sf{\mathcal{A}dv} can win 𝖦𝖺𝗆𝖾𝖱𝖨𝗇𝗍\sf{\mathsf{G}ame^{RInt}} with the same negligible probability that 𝒜𝖽𝗏\sf{\mathcal{A}dv} has of breaking Lemma 1. \blacksquare

V-C Complex Query Model

In this section, we explain how complex queries are executed. As mentioned, complex queries are the ones with the combination of ANDAND and OROR in the query condition. That is, the query can have more than two conditions. We support the following complex queries:

  • AND - Single Conditions: if the complex query is formed by the AND, we concatenate the conditions into one condition and behave the newly formed query as a simple model query. Assume that the query condition is c1=secret1c_{1}=secret_{1} AND c2=secret2c_{2}=secret_{2} AND … AND cn=secretnc_{n}=secret_{n}, where each query condition is a single condition. π\uppiQLB concatenates these conditions into one: C=c1||c2||||cnC=c_{1}||c_{2}||...||c_{n}. This results in a single condition query of the simple model which π\uppiQLB can process, explained in §\SV-A.

  • OR - Single Condition: if the complex query is formed by the OR and the query conditions are disjoint, i.e., c1c_{1} does not overlap with c2c_{2}, then we create ff for each condition individually and then add all ff-s together. Assume that the query condition is c1=secret1c_{1}=secret_{1} OR c2=secret2c_{2}=secret_{2} OR ... OR cn=secretnc_{n}=secret_{n}. π\uppiQLB generates ff for each cic_{i} using the simple model functions, then combine all the generated functions, i.e, f=fc1+fc2++fcnf=f_{c_{1}}+f_{c_{2}}+...+f_{c_{n}}. And finally it passes ff to the π\uppiQLB.𝖦𝖾𝗇\sf{\mathsf{G}en} function, explained in the simple model above, §\SV-A

Below, we explain an example of a complex query.

Among the above queries, q4q_{4} is a complex query as it contains \wedge in the query condition. Now assume that the SPSP receives {Ki,q4}\{K_{i},q_{4}\prime\}. In this type of query, compared to the previous queries explained in §\SV-A, SPSP performs one additional task when creating the intermediate table. It first creates a join column consisting of two columns in the query condition (i.e., for q4q_{4}\prime, it is ColorColor and ItemItem). And then it performs the GROUPBYGROUPBY over this newly created column. Thus, in this example, q4q_{4}\prime, SPSP creates the intermediate table as shown in Figure 12 based on the joint ColorItemColor-Item columns. Later, it generates BINBIN (see Figure 12) for the column MAX(Price)MAX(Price). After the setup is finished, then it performs compcomp and mergmerg on this intermediate table. The rest of this query is similar to execution of the rest of queries explained in §\S V-A.

Refer to caption
Figure 12: Intermediate Table and the binary representation of the column in query type generated for q4q_{4}.

VI Evaluation

To evaluate π\uppiQLB, we compared its performance, in terms of computation cost (§\SVI-A), bandwidth (§\SVI-B), and latency (§\SVI-C), with two baselines: Splinter and Waldo. We selected these baselines as they both applied the FSS to provide query’s confidentiality, similar to π\uppiQLB.

VI-A Computation Comparison

Computation cost/complexity refers to the computations required to process the query and retrieve the data. This cost is O(N×logN)O(N\times logN) for Splinter, where NN is the number of database records. π\uppiQLB requires less asymptotic time, i.e., O(N×l)O(N\times l), where ll is a small constant indicating the number of bits required to represent each database column. Whereas Waldo’s computation cost grows linearly with the number of query predicates: O(N×p)O(N\times p).

We calculated the computation complexity of π\uppiQLB and then plotted its behaviour for different number of data records in the database. We then plotted the behaviour of Splinter and Waldo over the same set of data records used for π\uppiQLB. For Waldo and Splinter we used the computation complexity provided in the original papers. As depicted in Figure 13, π\uppiQLB performs slightly better compared to the baselines when the number of records in the database are small. However, when the record size increases, the performance of Waldo and Splinter deteriorates significantly compared to π\uppiQLB. More specifically, when the number of data records increases from 10810^{8} to 10910^{9}, we can see that π\uppiQLB performs faster (a magnitude of two) compared to Splinter. This difference is even more when we compare π\uppiQLB and Waldo, where for 10910^{9} number of records π\uppiQLB performs over four times faster than Waldo (with 2predicates2-predicates). When the number of predicates increases Waldo’s performance degrades the most compared to Splinter and π\uppiQLB.

Refer to caption
Figure 13: Computation cost comparison

VI-B Bandwidth Comparison

After we evaluated the computation cost of π\uppiQLB, we switched our attention to the bandwidth comparison. We calculated the client and server bandwidth separately and compared the π\uppiQLB’s results with the baselines. Figure 14 illustrates the evaluation results. For the client bandwidth, we compared π\uppiQLB with Waldo only as Splinter did not provide enough information on the paper to allow us to compute the client’s bandwidth. The results show that in π\uppiQLB, a client requires slightly less bandwidth to perform queries compared to Waldo. This is due to the fact that Waldo generates two pairs of FSS keys for each SP. That is, when the number of SPs are two, Waldo’s client generates in total fourfour pairs of keys to send to the SPs. While in π\uppiQLB client only generates one key for each SP.

Refer to caption
Figure 14: Client and server bandwidth comparison

However, the server’s bandwidth is computed based on: (i),(i), the round trips between client and server, (ii),(ii), the size of the data that the server sends to client, and (iii),(iii), the Maximum Transmission Unit, MTUMTU, each server has. We assume all the servers are running on AWS (Amazon Web Services) EC2 instances and each instance type is m5.xlargem5.xlarge with 10Gbps10\ Gbps network bandwidth and the MTUMTU is 15001500 bytes. As illustrated in Figure 14, the server’s bandwidth in π\uppiQLB is constant and independent from the number of records. This is exactly similar in Splinter for sum-based queries. However, Splinter requires 1010 times more bandwidth for TOPK, MIN, and MAX queries compared to π\uppiQLB. This is due to the extra communication between server and client for building the intermediate table in Splinter. Waldo, however, requires the highest bandwidth compared to π\uppiQLB and Splinter as it grows linearly when the number of records increases.

VI-C Latency Comparison

Last, we evaluated the latency. We assumed the minimum latency between client and the server (SP) for each message is 1ms1\ ms. To plot the results, we computed the latency for one query over different size of records and repeated it for π\uppiQLB, Splinter, and Waldo (with 2,4,2,4, and 88 predicates). Note that the number of predicates does not affect the latency of the π\uppiQLB and Splinter. However, the latency of Waldo varies depending on the number of predicates. As illustrated in Figure 15, π\uppiQLB has the lowest latency compared to the baselines. More specifically, when the number of records are small (below 2272^{27}), π\uppiQLB, Splinter, Waldo-2-predicates, and Waldo-4-predicates have almost the same latency, less than 22 seconds. However, when the number of records increases, the latency surges significantly for Splinter and Waldo, while π\uppiQLB experiences a moderate increase in latency. For example, when the number of records is increased to 2282^{28}, π\uppiQLB’s latency is still 11 seconds; however, Splinter’s and Waldo-4-predicates’s latency surges to around 55 seconds. For the maximum records of 2302^{30}, π\uppiQLB’s latency is around 55 seconds which is a factor of 22 and 44 times less than Waldo-2-predicates, and Splinter and Waldo-4-predicates, respectively. Waldo-8-predicates experiences the highest latency.

Refer to caption
Figure 15: Latency comparison

VII Related work

As previously elaborated, π\uppiQLB’s main goal is to provide query confidentiality and response integrity for performing search queries over a blockchain database. There has been a large body of studies conducted in a similar area (i.e., secure and private search over public data) and this section aims to describe such studies and put our work context.

Related work can be classified into three categories: (i)(i), those providing query confidentiality, (ii)(ii), those enabling queries over encrypted database to protect the stored data, (iii)(iii) those enabling queries over blockchain database. Compared to the work in (i)(i), π\uppiQLB provides integrity for the query responses, improving the security by a large factor. The second classification aims to solve a different problem compared to π\uppiQLB, as they protect private and sensitive data from a malicious server while providing a search function. And the final classification is very closely related to π\uppiQLB, as π\uppiQLB is proposed for blockchain databases.

VII-A Queries on database with query confidentiality

VII-A1 PIR Systems [13, 14, 15, 16, 17, 18]

In 19951995, Chor et al. [13] introduced private information retrieval (PIR) that allows a user to retrieve an element of a database in a setting where the server is untrusted (i.e., the client does not want to reveal to the server what information s/he is retrieving). In other word, the goal of PIR is to enable querying the ii-th item of a database with nn items, without revealing ii.

The introduction of PIR led to the generation of two new schemes: (i)(i) information theoretic PIR (IT-PIR), also known as multi-server PIR [13, 35, 36, 37]; and (ii)(ii) computational PIR (CPIR), also known as single-server PIR [38, 39, 40, 14, 41, 42]. The IT-PIR scheme comprises of more than one server and each server hosts a replica of the database. The client sends a query to all the servers and combines all the responses received from the servers. IT-PIR relies on a strong security assumption; i.e., the servers are honest and do not collude together. Meanwhile, a limitation of CPIR relates to the high computational cost of the cryptographic operation on each element of the database.

VII-A2 Splinter [34]

In 20172017, Wang et al. [34] applied FSS [30] cryptographic primitive to build a new query scheme, called Splinter, that facilitates private SQL-like queries over a database. The database is public, i.e., is accessible by everyone to read (but not to write) and it is based on a centralized setting (i.e., it is owned/controlled/modified by one organization/company). Splinter supports various types queries, such as SUM, COUNT, AVG, STDEV, MIN, MAX, and TOPK queries. Although it has achieved significantly better performance compared to previous private information retrieval systems, thanks to the use of FSS, it has some limitations: (i)(i) it does not support the integrity of the query response. That is, Splinter assumes that the service providers are honest (i.e., faithfully following the protocol); (ii)(ii) Splinter is designed for the centralized settings, i.e., the database is controlled by one organization. Hence, it cannot be applied in the context of the public blockchain.

Splinter is closely related to π\uppiQLB as they both have applied FSS cryptographic primitive to facilitate private search over databases. Nonetheless, π\uppiQLB additionally addresses the aforementioned limitations, making query processing applicable to the blockchain setting.

VII-A3 Waldo [43]

It is a private time-series encrypted database that supports multi-predicate filtering while hiding the query content and search patterns. Waldo used FSS to achieve their security guarantee. It relies on the majority honest parties, and has defined up to threethree parties. This means that they allow only one malicious party. They have achieved a better security compared to their related work based on time-series databases. We considered Waldo to be related to π\uppiQLB as both of them are based on FSS. Nonetheless, Waldo targets a private setting, where the user has a full control over the data on each service provider, which is not applicable to the public setting in blockchain.

VII-A4 Garbled circuit [19, 20, 21]

It is a cryptographic primitive that provides secure multiparty computations. Embark [44] applied this method to design a system that enables a cloud provider to support the outsourcing of network processing tasks (e.g., firewalling, NATs, web proxies, load balancers, and data exfiltration systems), similar the way that compute and storage are outsourced; while maintaining the client’s confidentiality. To achieve this, Embark first encrypts the traffic that reaches the cloud and then the cloud processes the encrypted traffic without decrypting it. Another work, namely BlindBox [45], also used Garbled circuits to perform deep-packet inspection directly on the encrypted traffic. BlindBox is suggested to be a good candidate for applications such as intrusion detection (IDS), exfiltration detection, and parental filtering. These works mainly have used garbeled circuits to perform private communication on a single untrusted server. The main issue with these proposals is the high computation and bandwidth costs for queries when the datasets are large as a new garbled circuit needs to be generated for each query.

VII-A5 ORAM

Oblivious RAM (ORAM) is a cryptographic primitive that protects the privacy of clients by hiding the memory access patterns seen by an untrusted storage server. It basically eliminates the information leakage in memory access traces. In an ORAM scheme, a client can accesses data blocks that are hosted on an untruseted server. However, for any two logical access sequences of the same length, the communications between a client and a server cannot be distinguished. Several work [46, 47, 48] have applied ORAM to develop a secure access model to stored data. However, ORAM is not applicable in the settings where a client is resourced constrained, i.e., it does not have enough memory to be able to load a large volume of data from the server. Additionally, ORAM allows clients to access/download the data that they have written on the server. Hence, it is not suitable for the case where the database is public where everyone can write/read to/from it.

VII-B Queries on encrypted databases

Several prior work [22, 23, 24, 25, 26, 27, 28, 29] proposed methods to run queries over an encrypted database. Data in such systems is encrypted before stored, and queries are executed over encrypted data. The aim here is to protect private data from an untrusted or a compromised server. Hence, the purpose of such systems is different from the one of π\uppiQLB, which aims to protect the confidentiality of queries over a public database, where servers could be malicious and the database is publicly accessible.

VII-C Queries on blockchain

Lastly, current search approaches for blockchain databases require users to maintain the entire blockchain to run queries (in order to support query integrity). This is not an economical approach as the maintenance cost of blockchain is significant and blockchain’s data size is considerably large. Many companies, including IBM [9], Oracle [10], FlureeDB[8], and BigchainDB [11] provided a search functionality for blockchain databases that unfortunately rely on a central trusted party that executes user’s queries. This obviously causes a single point of failure and confidentiality as well as integrity guarantees are not also supported.

In 20192019, Xu et al. [12] proposed vChain to address a limitation in the previous search schemes for blockchain. vChain’s main focus was to provide the integrity for the search without requiring nodes to download the entire blockchain database. It enabled verifiable Boolean range queries for blockchain databases, where clients are able to verify the results of their searches. Although vChain did achieve a good search performance and provided integrity, it did not consider the confidentiality of clients’ queries.

VQL [49] is another work that has been proposed for search improvement on blockchain databases. To provide both efficient and verifiable data queries, VQL adds a Verifiable Query Layer to the blockchain systems. This middleware layer extracts transactions stored in the underlying blockchain system and reorganizes them in databases to provide various query services for public users. It has also applied a cryptographic hash value for each constructed database to prevent data being modified. The main issue with this approach is clearly the extra layer added to the blockchain system. This directly impacts the blockchain size. Additionally, it degrades the overall performance of the blockchain systems as reorganizing and creating the extra layer incur an overhead on top of the tasks the blockchain systems need to perform. More importantly, VQL does not provide the query confidentiality.

VIII Conclusion

This paper described π\uppiQLB, a query language for blockchain systems that ensures both confidentiality of query inputs and integrity of query results. To achieve confidentiality property π\uppiQLB has applied the function secret sharing primitive (FSS). And, to support integrity, we extended the traditional FSS setting in such a way that integrity of FSS results can be efficiently verified. π\uppiQLB also introduces relational data semantics into existing blockchain databases and enables SQL-like queries over the data. We evaluated the performance of π\uppiQLB in terms of bandwidth, latency, and computation costs. The evaluation results showed that π\uppiQLB performs better when compared with the baselines.

In future work, we aim to enhance the search performance by applying the indexing to the stored data and enabling the FSS keys to search over the indexed data. Moreover, we plan to explore system integration and deployment aspects in greater depth, where we will analyze the real-world implementation of π\uppiQLB and its integration with current blockchain architectures.

References

  • Nakamoto [2008] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Decentralized Business Review, 2008.
  • Mohan [2017] C. Mohan, “Tutorial: Blockchains and databases,” Proceedings of the VLDB Endowment, vol. 10, no. 12, pp. 2000–2001, 2017.
  • Sohrabi and Tari [2020a] N. Sohrabi and Z. Tari, “Zyconchain: A scalable blockchain for general applications,” IEEE Access, vol. 8, pp. 158 893–158 910, 2020.
  • Dinh et al. [2017] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan, “Blockbench: A framework for analyzing private blockchains,” in ACM International Conference on Management of Data (SIGMOD), 2017, pp. 1085–1100.
  • Dinh and al. [2018] A. Dinh and al., “Untangling blockchain: A data processing view of blockchain systems,” vol. 30, no. 7, pp. 1366–1385, 2018.
  • Wang et al. [2018] S. Wang, T. T. A. Dinh, Q. Lin, Z. Xie, M. Zhang, Q. Cai, G. Chen, B. C. Ooi, and P. Ruan, “Forkbase: An efficient storage engine for blockchain and forkable applications,” Proceedings of the VLDB Endowment, vol. 11, no. 10, 2018.
  • Sohrabi and Tari [2020b] N. Sohrabi and Z. Tari, “On the scalability of blockchain systems,” in 2020 IEEE International Conference on Cloud Engineering (IC2E).   IEEE, 2020, pp. 124–133.
  • [8] “Flureedb.” [Online]. Available: https://flur.ee/
  • [9] “Ibm blockchain.” [Online]. Available: https://www.ibm.com/blockchain
  • [10] “Oracle blockchain.” [Online]. Available: https://www.oracle.com/blockchain/
  • [11] “Bigchaindb.” [Online]. Available: https://www.bigchaindb.com/
  • Xu et al. [2019] C. Xu, C. Zhang, and J. Xu, “vchain: Enabling verifiable boolean range queries over blockchain databases,” in ACM SIGMOD, 2019.
  • Chor et al. [1995] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” in IEEE FOCS, 1995.
  • Yi et al. [2013] X. Yi, R. Paulet, and E. Bertino, “Private information retrieval,” Synthesis Lectures on Information Security, Privacy, and Trust, vol. 4, no. 2, pp. 1–114, 2013.
  • Benny et al. [1998] C. Benny, K. Eyal, G. Oded, and S. Madhu, “Private information retrieval,” in Journal of the ACM (JACM), 1998, p. 965–981.
  • Ostrovsky and Skeith [2007] R. Ostrovsky and W. E. Skeith, “A survey of single-database private information retrieval: Techniques and applications,” in International Workshop on Public Key Cryptography, 2007.
  • Olumofin and Goldberg [2011] F. Olumofin and I. Goldberg, “Revisiting the computational practicality of private information retrieval,” in Financial Cryptography and Data Security, 2011.
  • Ongaro and Ousterhout [2014] D. Ongaro and J. Ousterhout, “In search of an understandable consensus algorithm,” in USENIX Annual Technical Conference (Usenix ATC), 2014.
  • Bellare et al. [2012] M. Bellare, T. Hiang, and P. Rogaway, “Foundations of garbled circuits,” 2012, pp. 784–796.
  • Goldwasser [1997] S. Goldwasser, “Multi party computations: past and present,” in 16th Annual ACM Symposium on Principles of Distributed Computing (SPDC), 1997, pp. 1–6.
  • Yakoubov [2017] S. Yakoubov, “A gentle introduction to yao’s garbled circuits,” preprint on webpage at https://web. mit. edu/sonka89/www/papers/2017ygc. pdf, 2017.
  • Popa et al. [2011] R. A. Popa, C. M. Redfield, N. Zeldovich, and H. Balakrishnan, “Cryptdb: protecting confidentiality with encrypted query processing,” in 23rd ACM Symposium on Operating Systems principles (SOSP), 2011, pp. 85–100.
  • Demertzis et al. [2020] I. Demertzis, D. Papadopoulos, C. Papamanthou, and S. Shintre, “{\{SEAL}\}: Attack mitigation for encrypted databases via adjustable leakage,” in 29th USENIX Security Symposium (USENIX Security), 2020, pp. 2433–2450.
  • Fuller et al. [2017] B. Fuller, M. Varia, A. Yerukhimovich, E. Shen, A. Hamlin, V. Gadepally, R. Shay, J. D. Mitchell, and R. K. Cunningham, “Sok: Cryptographically protected database search,” in 2017 IEEE Symposium on Security and Privacy (S&P).   IEEE, 2017, pp. 172–191.
  • Papadimitriou et al. [2016] A. Papadimitriou, R. Bhagwan, N. Chandran, R. Ramjee, A. Haeberlen, H. Singh, A. Modi, and S. Badrinarayanan, “Big data analytics over encrypted datasets with seabed,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 587–602.
  • Pappas et al. [2014] V. Pappas, F. Krell, B. Vo, V. Kolesnikov, T. Malkin, S. G. Choi, W. George, A. Keromytis, and S. Bellovin, “Blind seer: A scalable private dbms,” in 2014 IEEE Symposium on Security and Privacy (S&P).   IEEE, 2014, pp. 359–374.
  • Popa et al. [2014] R. A. Popa, E. Stark, S. Valdez, J. Helfer, N. Zeldovich, and H. Balakrishnan, “Building web applications on top of encrypted data using mylar,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014, pp. 157–172.
  • Mahajan et al. [2011] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish, “Depot: Cloud storage with minimal trust,” ACM Transactions on Computer Systems (TOCS), vol. 29, no. 4, pp. 1–38, 2011.
  • Li et al. [2004] J. Li, M. N. Krohn, D. Mazieres, and D. E. Shasha, “Secure untrusted data repository (sundr).” in Osdi, vol. 4, 2004, pp. 9–9.
  • Boyle et al. [2015] E. Boyle, N. Gilboa, and Y. Ishai, “Function secret sharing,” in Eurocrypt.   Springer, 2015.
  • Boyle et al. [2016] ——, “Function secret sharing: Improvements and extensions,” in ACM CCS, 2016.
  • de Castro and Polychroniadou [2022] L. de Castro and A. Polychroniadou, “Lightweight, maliciously secure verifiable function secret sharing,” in Eurocrypt.   Springer, 2022.
  • Boyle et al. [2021] E. Boyle, N. Chandran, N. Gilboa, D. Gupta, Y. Ishai, N. Kumar, and M. Rathee, “Function secret sharing for mixed-mode and fixed-point secure computation,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2021, pp. 871–900.
  • Wang et al. [2017] F. Wang, C. Yun, S. Goldwasser, V. Vaikuntanathan, and M. Zaharia, “Splinter: Practical private queries on public data,” in USENIX NSDI, 2017.
  • Demmler et al. [2014] D. Demmler, A. Herzberg, and T. Schneider, “Raid-pir: practical multi-server pir,” in 6th edition of the ACM Workshop on Cloud Computing Security, 2014, pp. 45–56.
  • Devet et al. [2012] C. Devet, I. Goldberg, and N. Heninger, “Optimally robust private information retrieval,” in 21st USENIX Security Symposium (USENIX Security), 2012, pp. 269–283.
  • Goldberg [2007] I. Goldberg, “Improving the robustness of private information retrieval,” in 2007 IEEE Symposium on Security and Privacy (S&P).   IEEE, 2007, pp. 131–148.
  • Brakerski and Vaikuntanathan [2014] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (standard) lwe,” SIAM Journal on Computing, vol. 43, no. 2, pp. 831–871, 2014.
  • Cachin et al. [1999] C. Cachin, S. Micali, and M. Stadler, “Computationally private information retrieval with polylogarithmic communication,” in International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 1999, pp. 402–414.
  • Kushilevitz and Ostrovsky [1997] E. Kushilevitz and R. Ostrovsky, “Replication is not needed: Single database, computationally-private information retrieval,” in 38th annual Symposium on Foundations of Computer Science.   IEEE, 1997, pp. 364–373.
  • Melchor et al. [2016] C. A. Melchor, J. Barrier, L. Fousse, and M.-O. Killijian, “Xpir: Private information retrieval for everyone,” Proceedings on Privacy Enhancing Technologies, pp. 155–174, 2016.
  • Angel et al. [2018] S. Angel, H. Chen, K. Laine, and S. Setty, “Pir with compressed queries and amortized query processing,” in 2018 IEEE symposium on security and privacy (S&P).   IEEE, 2018, pp. 962–979.
  • Dauterman et al. [2022] E. Dauterman, M. Rathee, R. A. Popa, and I. Stoica, “Waldo: A private time-series database from function secret sharing,” IEEE S&P, 2022.
  • Lan et al. [2016] C. Lan, J. Sherry, R. A. Popa, S. Ratnasamy, and Z. Liu, “Embark: Securely outsourcing middleboxes to the cloud,” in 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016, pp. 255–273.
  • Sherry et al. [2015] J. Sherry, C. Lan, R. A. Popa, and S. Ratnasamy, “Blindbox: Deep packet inspection over encrypted traffic,” in ACM Conference on Special Interest Group on Data Communication, 2015, pp. 213–226.
  • Lorch et al. [2013] J. R. Lorch, B. Parno, J. Mickens, M. Raykova, and J. Schiffman, “Shroud: ensuring private access to {\{large-scale}\} data in the data center,” in 11th USENIX Conference on File and Storage Technologies (FAST), 2013, pp. 199–213.
  • Stefanov et al. [2018] E. Stefanov, M. V. Dijk, E. Shi, T.-H. H. Chan, C. Fletcher, L. Ren, X. Yu, and S. Devadas, “Path oram: an extremely simple oblivious ram protocol,” Journal of the ACM (JACM), vol. 65, no. 4, pp. 1–26, 2018.
  • Ren et al. [2015] L. Ren, C. Fletcher, A. Kwon, E. Stefanov, E. Shi, M. Van Dijk, and S. Devadas, “Constants count: Practical improvements to oblivious {\{RAM}\},” in 24th USENIX Security Symposium (USENIX Security), 2015, pp. 415–430.
  • Wu et al. [2021] H. Wu, Z. Peng, S. Guo, Y. Yang, and B. Xiao, “Vql: efficient and verifiable cloud query services for blockchain systems,” IEEE TPDS, vol. 33, no. 6, pp. 1393–1406, 2021.