Communication-Efficient Search under Fully Homomorphic Encryption for Federated Machine Learning

Dongfang Zhao
University of Washington
[email protected]

Abstract

Homomorphic encryption (HE) has found extensive utilization in federated learning (FL) systems, capitalizing on its dual advantages: (i) ensuring the confidentiality of shared models contributed by participating entities, and (ii) enabling algebraic operations directly on ciphertexts representing encrypted models. Particularly, the approximate fully homomorphic encryption (FHE) scheme, known as CKKS, has emerged as the de facto encryption scheme, notably supporting decimal numbers. While recent research predominantly focuses on enhancing CKKS’s encryption rate and evaluation speed in the context of FL, the search operation has been relatively disregarded due to the tendency of some applications to discard intermediate encrypted models. Yet, emerging studies emphasize the importance of managing and searching intermediate models for specific applications like large-scale scientific computing, necessitating robust data provenance and auditing support. To address this, our paper introduces an innovative approach that efficiently searches for a target encrypted value, incurring only a logarithmic number of network interactions. The proposed method capitalizes on CKKS’s additive and multiplicative properties on encrypted models, propagating equality comparisons between values through a balanced binary tree structure to ultimately reach a single aggregate. A comprehensive analysis of the proposed algorithm underscores its potential to significantly broaden FL’s applicability and impact.

Introduction

Homomorphic encryption (HE) has been widely employed in federated learning (FL) (Aharoni et al. 2023), offering a groundbreaking solution to the critical challenge of preserving data privacy while enabling collaborative machine learning. By allowing computations to be performed on encrypted data without the need for decryption, HE maintains the confidentiality of individual data contributions throughout the federated learning process. This cryptographic technique has gained prominence due to its ability to facilitate the secure aggregation of model updates from decentralized devices or servers, thereby ensuring that sensitive information remains protected. As federated learning continues to emerge as a powerful paradigm for training machine learning models across distributed environments, the integration of HE not only safeguards the privacy of participants but also paves the way for efficient and privacy-preserving collaborative AI advancements.

Due to the nature of Federated Learning (FL) applications, the data types primarily consist of decimal values. This prevalence of decimal data is a result of the diverse range of domains that utilize FL, spanning areas such as healthcare, finance, and IoT. The use of decimal values is often driven by the need to capture and process nuanced information with high precision, as is the case with medical measurements, financial transactions, and sensor readings. In FL, where machine learning models are collaboratively trained across decentralized devices or servers while keeping data localized, the representation of decimal values becomes essential for maintaining the integrity and accuracy of the information being shared. Consequently, addressing the challenges and intricacies associated with handling decimal data within the FL framework becomes a crucial endeavor, influencing the design of secure and efficient protocols for privacy-preserving federated learning.

Because CKKS (Cheon et al. 2017) (Cheon-Kim-Kim-Song) remains the only Homomorphic Encryption (HE) scheme supporting decimal values, recent research efforts have been concentrated on optimizing CKKS for various applications. Specifically, these endeavors revolve around enhancing two pivotal aspects: the encryption of model updates and the aggregation of local models. CKKS’s unique capability to operate on real numbers directly makes it a vital candidate for preserving the precision and accuracy required by decimal-based datasets, commonly encountered in domains like finance, scientific computing, and cryptography. As a result, refining the encryption techniques for model updates ensures that the privacy of sensitive data is upheld during collaborative training in Federated Learning (FL). Additionally, focusing on the aggregation of local models under the constraints of CKKS aims to guarantee efficient and accurate fusion of contributions from various devices or servers in the FL paradigm. This concerted research in CKKS optimization not only advances the capabilities of HE for real-world applications but also contributes to the continued evolution of privacy-preserving technologies in the era of decentralized and collaborative computing.

Although encryption and evaluation operations have been extensively studied in the realm of Homomorphic Encryption (HE), there remains a notable gap in research pertaining to other equally crucial operations, particularly the search operation on CKKS ciphertexts. While much attention has been directed toward enabling computations on encrypted data, the unique challenges and intricacies associated with searching encrypted data have received relatively limited exploration. This discrepancy in focus is particularly significant considering the pivotal role that search operations play in various applications, such as scientific computing, where data provenance and auditing are of paramount importance. In fields like scientific research, where maintaining the integrity and origin of data is critical, the ability to search encrypted data in a privacy-preserving manner offers a compelling avenue for advancing security and accountability. As CKKS is a promising HE scheme known for its support of real number arithmetic and precision, exploring search operations within this framework has the potential to reshape the landscape of secure data management and utilization, leading to novel solutions that align with the demands of modern decentralized and privacy-conscious computing paradigms.

This paper proposes a worst-case optimal algorithm for searching in the CKKS ciphertexts. Specifically, the proposed algorithm incurs only a logarithmic number of interactions between the client and the server. Our algorithm does not require the expensive preprocessing, e.g., sorting, over the ciphertext on the server, which takes $\mathcal{O}(n\log n)$ ; rather, the server only applies a linear number of algebraic operations over the ciphertexts. The key idea is to leverage the additive and multiplicative homomorphism in CKKS and construct a balanced binary tree where each intermediate node stores some compaction information in a fashion similar to a Merkle tree (Merkle 1988).

We start with some preliminaries and related work.

Preliminaries and Related Work

Federated Learning

The term federated learning (FL) was initially coined in the work by McMahan et al. (McMahan et al. 2017), aiming to facilitate distributed machine learning where clients can contribute without compromising their private data. While FL has gained substantial preference over conventional machine learning approaches, there remain several areas for improvement that researchers consistently strive to address. For instance, Cui et al. (Cui et al. 2021) introduced Fair and Consistent Federated Learning (FCFL), a method designed to mitigate disparities and performance inconsistencies in FL. In a similar vein, Gong et al. (Gong et al. 2022) proposed the application of ensemble cross-domain knowledge distillation to safeguard privacy within the FL framework. Hamer et al. (Hamer, Mohri, and Suresh 2020) leveraged ensemble algorithms to alleviate communication costs in both server-to-client and client-to-server interactions in FL. Furthermore, Reisizadeh et al. (Reisizadeh et al. 2020) presented FedPAQ, a method addressing communication and scalability challenges within the FL paradigm. As FL remains a dynamic field, these innovative approaches collectively contribute to its ongoing refinement and optimization.

Attacks on Federated Learning

The threat of a poisoning attack looms large as one of the primary risks in the realm of machine learning models. For instance, when adversaries or malicious clients deliberately inject compromised samples into the model’s training dataset (Alfeld, Zhu, and Barford 2016), this form of attack is referred to as a data poisoning attack. In parallel, a malicious client could also upload a tampered model to the aggregator, constituting a model poisoning attack (also known as an adversarial attack) (Li et al. 2020). This category encompasses numerous well-documented attacks, including the likes of backdoor attacks (Yang et al. 2019; Bhagoji et al. 2019; Bagdasaryan et al. 2020; Xie et al. 2020). To counteract these threats, various defense strategies have emerged within the realm of Federated Learning (FL), broadly categorized as either server-based or client-based approaches. Sun et al. (Sun et al. 2021) introduced a client-based defense coined as White Blood Cell for Federated Learning (FL-WBC), aimed at mitigating model poisoning attacks that have infiltrated the global model. In a server-based defense, Yin et al. (Yin et al. 2018) proposed robust aggregation to bolster FL’s resilience against model poisoning attacks, while Sun et al. (Sun et al. 2019) employed local update clipping for similar purposes. Krum (Blanchard et al. 2017) harnessed the similarity among benign clients’ local updates to fend off poisoning attacks in FL. Additionally, Shejwalkar and Houmansadr (Shejwalkar and Houmansadr 2021) devised a defense strategy named divide-and-conquer to counter FL poisoning threats effectively. These pioneering endeavors collectively contribute to the development of a fortified FL landscape, with strategies aimed at safeguarding against the ever-evolving spectrum of poisoning attacks.

Homomorphic Encryption

Homomorphic encryption, a groundbreaking cryptographic technique, empowers computations on encrypted data while upholding the privacy of the underlying information. An encryption function $f()$ exhibits the trait of being additive homomorphic when it satisfies the equation $f^{-1}\left(f(a)\oplus f(b)\right)=a+b$ , with $f^{-1}()$ signifying the decryption function and $\oplus$ representing the binary operation within the scope of $f()$ . The concept of fully homomorphic encryption (FHE) broadens this attribute to encompass both additive and multiplicative operations. Although diverse FHE schemes, including BFV (Fan and Vercauteren 2012) and CKKS (Cheon et al. 2017), have been conceived, their integration into Homomorphic Encryption for Federated Learning remains limited due to their pronounced computational overhead. Instead, HE schemes that exclusively support additive or multiplicative homomorphism exhibit greater efficiency and have thus found their way into the fold of Privacy-Preserving Machine Learning (PPML) systems (Savvides, Khandelwal, and Eugster 2020). A pertinent example is the Paillier (Paillier 1999) encryption scheme, which has been harnessed in endeavors like (Hardy et al. 2017; Zhang et al. 2020) to ensure privacy preservation within the context of federated learning.

CKKS Scheme

The CKKS (Cheon et al. 2017) homomorphic encryption scheme represents a transformative advancement in the field of cryptography, offering a potent solution for secure computation on encrypted data while accommodating real-number arithmetic. At its heart, CKKS employs a unique representation of ciphertext as a pair of polynomials, strategically leveraging mathematical operations on these polynomials to facilitate computations on encrypted values. This distinctive approach not only preserves the precision of real-number data but also ensures the confidentiality of sensitive information. CKKS’s dual capability to perform both additive and multiplicative operations on encrypted data renders it highly adept for applications requiring precise computations on private data. However, the realization of these benefits comes with certain challenges. The arithmetic operations in CKKS tend to enlarge ciphertext and introduce noise, necessitating countermeasures such as relinearization to manage the growth in ciphertext size. This process optimizes the performance of multiplication operations while maintaining manageable ciphertext dimensions. Additionally, the increased noise due to homomorphic operations necessitates ”modulo switching” to manage noise accumulation. This procedure strategically refreshes the ciphertext noise levels, allowing secure and accurate computations while safeguarding data privacy.

Security Proof

The security of cryptographic schemes, including homomorphic encryption, is often assessed through formal proofs based on established security notions. The Indistinguishability under Chosen Plaintext Attack (IND-CPA) security is a fundamental concept that assures the confidentiality of encrypted data even when adversaries have access to chosen plaintext-ciphertext pairs. The proof of IND-CPA involves the application of reduction techniques, analysis of probabilities, and the concept of negligible functions.

Reduction Techniques

The proof template for IND-CPA employs reduction, where the security of the cryptographic scheme is reduced to the assumed infeasibility of a particular computational problem. This is usually the hardness of breaking the underlying cryptographic primitive. In the context of homomorphic encryption, this reduction demonstrates that if an adversary can distinguish between two ciphertexts encrypted from distinct plaintexts, then the adversary could also solve the assumed computational problem, which is considered hard.

Probabilistic Analysis

IND-CPA security hinges on the concept that an adversary should not be able to distinguish between two ciphertexts that correspond to different plaintexts. The proof template involves a probabilistic analysis, where the probability that an adversary can distinguish between these ciphertexts is shown to be negligibly close to 0. This probabilistic approach ensures that, even with an unbounded adversary, the probability of successful distinguishing remains negligible.

Negligible Functions

Central to the proof template are negligible functions, which are functions that decrease faster than the reciprocal of any polynomial. In the context of security proofs, if the advantage of an adversary (the probability that they can distinguish between ciphertexts) is bounded by a negligible function, then the scheme is considered IND-CPA secure. This notion encapsulates the concept that as the input size grows, the negligible function diminishes more rapidly than any polynomial increase.

Search over CKKS Ciphertexts

Notations

Fully Homomorphic Encryption (FHE) introduces a set of notations that are commonly used to describe its operations and properties. Let $Enc$ denote the encryption function that takes a plaintext $m$ and transforms it into a ciphertext $c$ using the public key, i.e., $c=Enc(m)$ . The decryption function, $Dec$ , inversely converts the ciphertext back to the plaintext using the secret key, yielding $m=Dec(c)$ . For additive homomorphic encryption, the notation $\oplus$ signifies the addition operation performed on ciphertexts and $\ominus$ represents subtraction, allowing computations on encrypted data without decryption. Similarly, $\odot$ represents the multiplication operation used in multiplicative homomorphic encryption. The notation $Enc_{pk}(m)$ denotes encryption under the public key $pk$ , and $Enc_{sk}(m)$ denotes encryption under the secret key $sk$ . Fully homomorphic encryption supports both additive and multiplicative operations on ciphertexts, enabling a wide range of computations while maintaining data privacy.

To work with vectors, $\mathbf{v}$ usually denotes a vector of plaintext values, and $\mathbf{c}$ represents the corresponding vector of ciphertexts obtained through encryption, i.e., $\mathbf{c}=Enc(\mathbf{v})$ . The subscript notation $c_{i}$ and $v_{i}$ refer to the $i$ -th components of ciphertext and plaintext vectors, respectively. Additionally, $\mathbf{0}$ stands for the vector of all zeros, and $\mathbf{1}$ represents the vector of all ones.

Threat Model

Incorporating the chosen-plaintext attack (IND-CPA) security model within the framework of Fully Homomorphic Encryption (FHE) for Federated Learning (FL) establishes a robust foundation for privacy preservation and security assessment.

IND-CPA security is a fundamental cryptographic property that ensures the confidentiality of encrypted data, even when adversaries possess the ability to influence the plaintexts being encrypted. Specifically, in the context of FHE for FL, IND-CPA security implies that an attacker, despite being able to choose plaintexts for encryption and receiving the corresponding ciphertexts, cannot gain any meaningful insight into the plaintexts’ content. This property is vital for protecting sensitive information during the federated learning process.

In the FL setting, where client devices collaboratively contribute encrypted data and model updates to a central aggregator for training, IND-CPA security guarantees that an adversary cannot exploit the encrypted information to deduce the clients’ individual contributions. This resilience is critical for maintaining the privacy of participants’ data.

By embracing the IND-CPA security model, FHE-enabled FL systems can effectively thwart various adversarial strategies, including those attempting to infer sensitive information from encrypted data or exploit patterns within the encrypted model updates. This model forms the cornerstone for designing cryptographic protocols and encryption schemes that maintain data privacy while enabling collaborative and secure federated learning across distributed environments.

Security Assumptions

The security of a Federated Learning (FL) system augmented by Fully Homomorphic Encryption (FHE) hinges on a set of foundational security assumptions that underpin the confidentiality and integrity of the data being processed. These assumptions form the bedrock upon which cryptographic protocols and techniques are designed to safeguard sensitive information within a distributed learning paradigm. The integration of FHE into FL introduces several key security considerations:

FHE Scheme Security

The security of FHE-based FL heavily relies on the chosen FHE scheme’s cryptographic robustness. It is assumed that the underlying FHE scheme exhibits the prescribed security properties, such as IND-CPA or IND-CCA security, ensuring that encrypted data remains confidential against various types of adversaries.

Key Management

A critical assumption is that the secret keys used for encryption and decryption are securely managed. Adversarial access to secret keys can compromise the confidentiality of data and undermine the integrity of the federated learning process.

Secure Communication

It is assumed that the communication channels between clients and the central aggregator are secure, preventing eavesdropping and tampering. This includes safeguarding against potential man-in-the-middle attacks and ensuring that encrypted data remains confidential during transmission.

Malicious Clients

Security assumptions encompass the behavior of clients in the FL system. While benign clients are assumed to follow the protocol and contribute valid data, malicious clients may deviate from the prescribed behavior, attempting to inject poisoned data or manipulate the learning process. FHE and FL protocols should be designed to detect and mitigate the impact of such adversarial behavior.

Aggregator Integrity

The central aggregator, responsible for aggregating model updates and facilitating federated learning, is assumed to be honest and immune to attacks. Ensuring the integrity of the aggregator’s operations is crucial for preventing attacks that could compromise the accuracy and privacy of the aggregated model.

Cryptanalysis and Exploits

It is assumed that the FHE scheme used in FL is resistant to known cryptanalytic attacks and vulnerabilities. This assumption acknowledges that adversaries may attempt to exploit weaknesses in cryptographic primitives to break the security guarantees of the FHE-based FL system.

Implementation and Side Channels

Security assumptions extend to the correct implementation of FHE and FL protocols, accounting for potential side-channel attacks that exploit implementation-specific vulnerabilities, such as timing information, power consumption, or electromagnetic radiation.

By explicitly addressing these security assumptions, FHE-based FL systems can be designed with a heightened awareness of potential vulnerabilities and mitigation strategies. These assumptions serve as a guide for formulating security requirements, designing cryptographic protocols, and implementing countermeasures that collectively contribute to robust and privacy-preserving federated learning.

Server Processing

Description

The Server Processing algorithm 1 is devised to facilitate the search for a target encrypted value within a collection of encrypted ciphertexts, utilizing the capabilities of a homomorphic encryption scheme. The algorithm employs a binary tree structure, implemented as a vector, to streamline the search process efficiently. Each step is carefully designed to maintain the confidentiality of the data while enabling meaningful computations.

Input: A set of

n

ciphertexts

c_{i}

encrypted by a homomorphic encryption scheme; A target (encrypted) value

c_{t}

to be searched in

c_{i}

; We use

\ominus

and

\odot

to denote homomorphic subtraction and homomorphic multiplication, respectively;

Output: A binary tree t, implemented as a vector, where an intermediate node with a value of zero implies that the descendant subtree has a zero-leaf node.

3 Function PairwiseMul(c):

\textbf{v}\coloneqq[\;]

m\coloneqq|\textbf{c}|

6 for $i\coloneqq 0;i<m;i\coloneqq i+2$ do

7 if $i\geq m$ then

8 break

10 end if

11 if $i+1=m$ then

\textbf{v}.append(c_{i})

14 else

c\coloneqq v_{i}\odot v_{i+1}

\textbf{v}.append(c)

18 end if

20 end for

21 return v

24End Function

26for $i\coloneqq 0;i<n;i\coloneqq i+1$ do

r_{i}\coloneqq c_{i}\ominus c_{t}

29 end for

\textbf{t}[2^{\log n-1},2^{\log n-1}+n]\coloneqq\textbf{r}

\textbf{c}\coloneqq\textbf{r}

32 for $l\coloneqq\log|\textbf{c}|-1;l>=0;l\coloneqq l-1$ do

\textbf{t}[2^{l},2^{l}+n]\coloneqq\texttt{PairwiseMul}(\textbf{c})

\textbf{c}\coloneqq\textbf{t}[2^{l},2^{l}+n]

36 end for

37return t

Algorithm 1 Server Processing

The algorithm commences with a function called PairwiseMul, which operates on an array of ciphertexts. This function performs pairwise multiplication of consecutive ciphertexts, optimizing the process for computational efficiency. The main loop then iterates through the ciphertexts, calculating the differences between each ciphertext and the target ciphertext using homomorphic subtraction. These differences are recorded in an array. The subsequent steps focus on constructing the binary tree, which serves as the foundation for the search operation. The tree initialization phase assigns the calculated differences to a specific interval within the tree.

As the binary tree is constructed, the algorithm capitalizes on the PairwiseMul function to generate intermediate nodes. Starting from the highest level of the tree and working downwards, the algorithm computes pairwise multiplications of ciphertexts within specified intervals, filling in the tree with computed values. This process iterates for each level of the tree, ultimately resulting in a fully constructed binary tree.

Once the binary tree construction concludes, the algorithm returns the complete tree structure. This tree is now equipped for subsequent search operations, enabling secure and efficient querying of the encrypted values. Overall, the Server Processing algorithm showcases the orchestration of homomorphic encryption techniques and binary tree structures to enable privacy-preserving and secure search operations within a federated learning context.

Correctness

The correctness of the Alg. 1 is grounded in its meticulous design and the mathematical properties of homomorphic encryption. The algorithm meticulously constructs a binary tree where each node encapsulates relevant information derived from the encrypted ciphertexts. The PairwiseMul function ensures that the multiplicative operations on ciphertexts are executed accurately, maintaining the integrity of the computations.

The algorithm’s fidelity is bolstered by the fundamental property of homomorphic encryption, which guarantees that operations on encrypted data yield results that correspond to those obtained from the same operations on the underlying plaintexts. As the algorithm progresses, it appropriately applies homomorphic subtraction and multiplication operations to derive the differences between ciphertexts and the target ciphertext. These operations, executed within the homomorphic encryption framework, maintain the confidentiality of the data and uphold the mathematical consistency between ciphertexts and plaintexts.

Additionally, the binary tree structure is adeptly utilized to aggregate the intermediate results of these operations. The algorithm’s approach of gradually constructing the tree, starting from individual ciphertext differences and culminating in a fully formed binary tree, adheres to sound principles of encryption and secure computation. By leveraging the power of homomorphic encryption and employing a systematic tree-building process, the algorithm ensures the accuracy and privacy-preserving nature of its operations, thus justifying its correctness within the context of secure and efficient federated learning scenarios.

Complexity

The time complexity of Alg. 1 is driven by the operations performed within its iterative loops and function calls. The initial loop, traversing through the set of ciphertexts, incurs a linear time complexity of $\mathcal{O}(n)$ , where $n$ represents the number of ciphertexts. Within this loop, the homomorphic subtraction operation and subsequent operations involving the PairwiseMul function contribute constant time overhead. As a result, the time complexity of this segment remains linear.

The subsequent steps of initializing and constructing the binary tree encompass additional iterations and function calls. The binary tree construction loop iterates logarithmically, with the number of iterations determined by the logarithm of the length of the array containing ciphertext differences. This results in a time complexity of $\mathcal{O}(\log n)$ . The PairwiseMul function’s execution also adds a constant factor to the time complexity within this loop. Overall, due to its dependence on the logarithmic and linear components, Alg. 1 exhibits a time complexity of $\mathcal{O}(n+\log n)=\mathcal{O}(n)$ , aptly capturing its efficient execution in federated learning scenarios.

Regarding space complexity, Alg. 1 primary memory usage is allocated to data structures storing ciphertexts, ciphertext differences, and the binary tree vector. These structures collectively occupy $\mathcal{O}(n)$ space, accommodating the input ciphertexts and their differences. The binary tree, implemented as a vector, consumes additional space proportional to the number of nodes in the tree, which is related to the number of ciphertext differences. This results in a space complexity of $\mathcal{O}(n)$ , ensuring the algorithm’s memory consumption remains linear in relation to the size of the input. The algorithm’s space complexity aligns well with the federated learning context, where secure and efficient storage of cryptographic data is pivotal for maintaining privacy and scalability.

Security Proof

Assume, by contradiction, that there exists an efficient adversary $\mathcal{A}$ that can distinguish between the target ciphertext $c_{t}$ and other ciphertexts in Alg. 1’s execution with non-negligible advantage $\epsilon$ . We will construct an algorithm $\mathcal{B}$ that breaks the IND-CPA security of the homomorphic encryption scheme $\Pi$ by simulating the adversary $\mathcal{A}$ .

Algorithm $\mathcal{B}$ :

1.

$\mathcal{B}$ receives the public key $pk$ as input.
2.

$\mathcal{B}$ simulates the encryption oracle by generating random plaintexts and obtaining their corresponding ciphertexts from the encryption algorithm using $pk$ .
3.

When $\mathcal{A}$ queries the decryption oracle on a ciphertext $c$ , $\mathcal{B}$ forwards the query to its decryption oracle, obtaining the decrypted plaintext $m$ .
4.

$\mathcal{B}$ selects a random plaintext $m^{\prime}$ and encrypts it using the encryption algorithm to obtain $c^{\prime}$ . It then outputs $c^{\prime}$ as the decryption oracle’s response.
5.

$\mathcal{B}$ runs Alg. 1 with the input ciphertexts as chosen by $\mathcal{A}$ . It follows the same steps as the real algorithm but replaces $c_{t}$ with a random ciphertext.
6.

$\mathcal{B}$ returns the output of $\mathcal{A}$ as its own output.

By construction, $\mathcal{B}$ simulates the interactions with $\mathcal{A}$ and behaves exactly like the real execution of Alg. 1. The advantage of $\mathcal{B}$ in distinguishing between the target ciphertext $c_{t}$ and other ciphertexts is the same as $\mathcal{A}$ ’s advantage, i.e., $\epsilon$ . Since $\epsilon$ is non-negligible, $\mathcal{B}$ has a non-negligible advantage in breaking the IND-CPA security of the homomorphic encryption scheme $\Pi$ .

This contradiction implies that the assumption of an efficient adversary $\mathcal{A}$ distinguishing the ciphertexts with non-negligible advantage $\epsilon$ is false. Therefore, the ”Server Processing” algorithm, executed on ciphertexts encrypted using $\Pi$ , satisfies the IND-CPA security property.

Interaction Protocol

Description

Alg. 2 defines a protocol for communication between a client and a server to efficiently search for a specific target encrypted value using homomorphic encryption. The protocol employs a binary tree structure, where the server holds encrypted intermediate values, and the client interacts with the server to navigate through the tree, ultimately determining the index of the target value.

At the outset, the server initializes the protocol by generating an intermediate binary tree t. This is achieved by performing pairwise multiplication on the target ciphertext $c_{t}$ . The server then sends the value of the root node, denoted as $t_{1}$ , to the client. This initial interaction serves as the foundation for subsequent steps.

Upon receiving $t_{1}$ from the server, the client’s involvement begins. The client’s initial task is to decrypt $t_{1}$ and determine whether the decrypted value is non-zero. If the decryption yields a non-zero result, the client deduces that the target value must be present within the binary tree. As a response, the client sends a message to the server with a predefined value of $pivot=1$ . Conversely, if the decryption of $t_{1}$ results in zero, the client concludes that the target value is not present and returns -1 as an indication.

The core of the algorithm lies in the subsequent traversal of the binary tree. This traversal takes place iteratively for each level of the binary tree, as determined by the logarithm of the total number of ciphertexts. During each iteration, the client interacts with the server by sending the current $pivot$ value. The server, in turn, calculates the indices of the left child ( $lchild$ ) and right child ( $rchild$ ) nodes and communicates the corresponding values $t_{lchild}$ and $t_{rchild}$ back to the client.

The client processes the values received from the server, decrypting $c_{0}$ and utilizing its result to determine the next $pivot$ value. Depending on the decryption outcome, the $pivot$ value is either doubled (in the case of the left child) or incremented by one (for the right child). If the updated $pivot$ value surpasses the maximum index of the binary tree, the client concludes its operation and directly returns the $pivot$ value, indicating the index of the target value. If the maximum index is not reached, the client continues the interaction by sending the updated $pivot$ value back to the server for the subsequent iteration.

Input: A target (encrypted) value

c_{t}

to be searched on the server;

Output: The index

pivot\in[0,n)

of the target value on the server; -1 if not found

// On the server:

\textbf{t}\coloneqq\text{PairwiseMul}(c_{t})

5 Server sends

t_{1}=\textbf{t}[1]

to client

// On the client:

8if $0\not=Dec(t_{1})$ then

9 return -1

11 else

12 Sends message

pivot=1

to the server

14 end if

17for $lvl\coloneqq 1;lvl\leq\log n;lvl\coloneqq lvl+1$ do

// On the server:

20 Receives message

pivot

lchild\coloneqq 2\cdot pivot

rchild\coloneqq lchild+1

23 Sends pair

(t_{lchild},t_{rchild})

to the client

// On the client:

26 Receives pair

(c_{0},c_{1})

from the server

27 if $0=Dec(c_{0})$ then

pivot\coloneqq 2\cdot pivot

30 else

pivot\coloneqq 2\cdot pivot+1

33 end if

34 if $pivot\geq 2^{\log n}$ then

35 return

pivot

37 else

38 Sends

pivot

to the server

40 end if

42 end for

Algorithm 2 Interaction Protocol

Correctness

The correctness of Alg. 2 stems from its careful orchestration of cryptographic operations, secure communication, and systematic traversal of the binary tree. The algorithm’s adherence to the principles of homomorphic encryption ensures the preservation of data privacy throughout its execution.

At the commencement of the protocol, the server initiates the process by generating the intermediate binary tree t through pairwise multiplication of the target ciphertext $c_{t}$ . This operation respects the properties of homomorphic encryption, ensuring that computations on encrypted data accurately correspond to operations on plaintext data. The subsequent transmission of the root node value $t_{1}$ to the client establishes the foundation for secure interaction.

As the client enters the interaction phase, it decrypts $t_{1}$ to ascertain whether the decrypted value is non-zero. This step maintains the confidentiality of the underlying data, as the client cannot deduce information about the plaintext values from the encrypted data. Based on the decryption result, the client’s actions align with the protocol’s objectives, either indicating the absence of the target value or triggering further interaction.

The binary tree traversal, executed iteratively for each level of the tree, further underscores the algorithm’s correctness. The client and server exchange messages and values, ensuring that the interactions adhere to the security properties of homomorphic encryption. The encryption and decryption operations, when applied to the values exchanged, respect the principles of the encryption scheme, preserving the confidentiality of the data while enabling meaningful computation.

Furthermore, the client’s decision-making process during each iteration is based solely on the decrypted values and follows a deterministic logic that aligns with the encrypted data. The use of $pivot$ values to navigate the binary tree, alongside the systematic left and right child index calculations, guarantees consistent and accurate traversal.

Complexity

In the initialization phase, the server initiates communication by sending a single message to the client. This message contains the value of the root node ( $t_{1}$ ) of the intermediate binary tree t. This initial exchange establishes the foundation for the subsequent interaction between the client and server.

The client interaction phase encompasses the iterative nature of the protocol. For each level of the binary tree (determined by the logarithm of the total number of ciphertexts), a set of messages are exchanged between the client and server. The client begins by sending a message to the server, conveying the current $pivot$ value. In response, the server sends a pair of messages back to the client. Each message in the pair contains the values of the left and right child nodes ( $t_{lchild}$ and $t_{rchild}$ ).

Considering the iterative nature of the protocol, the number of iterations equals the depth of the binary tree, which is $\log n$ for $n$ ciphertexts. Thus, the communication complexity of Alg. 2 can be expressed as the sum of messages exchanged during both the initialization phase and the client interaction phase. This yields a total of $1+2\log n$ or $\mathcal{O}(\log n)$ messages exchanged between the client and server.

Discussions

Multi-Party Secure Computation

One possible extension involves extending the algorithm to support multi-party secure computation. In the current protocol, interaction is limited to a client-server relationship. However, in scenarios where multiple parties collaborate in federated learning, enabling secure communication among all participants becomes crucial. The extension could involve enhancing the protocol to accommodate secure interactions between multiple clients and a server, ensuring privacy-preserving collaboration among a larger set of participants. This extension would necessitate the development of cryptographic techniques to enable secure communication and computation among multiple parties.

Dynamic Tree Structure

Another potential extension involves adapting the algorithm to handle a dynamic binary tree structure. The current algorithm assumes a static binary tree, but in real-world scenarios, the structure of the tree might change due to various factors. This extension could involve devising mechanisms to handle insertions and deletions of ciphertexts in the binary tree while maintaining the integrity of the search protocol. Additionally, the algorithm might need to account for changes in tree structure during the interaction phase, requiring careful consideration of encryption and decryption operations.

Batch Processing for Efficiency

To enhance efficiency and reduce communication overhead, an extension could introduce batch processing capabilities to the algorithm. Instead of exchanging messages for individual ciphertexts, participants could process multiple ciphertexts in a single batch, reducing the number of communication rounds. This extension would involve modifying the algorithm to handle batch processing, optimizing the use of encryption and decryption operations, and ensuring that the privacy-preserving properties are maintained. Batch processing could significantly improve the algorithm’s efficiency in scenarios involving a large number of ciphertexts.

Conclusion

In conclusion, the proposed Server Processing and Interaction Protocol algorithms collectively underscore the evolving landscape of privacy-preserving machine learning within federated learning scenarios. The server algorithm’s design demonstrates the fusion of homomorphic encryption and systematic aggregation, enabling the secure integration of clients’ encrypted updates while mitigating data exposure. Its incorporation of cryptographic operations to facilitate aggregation and model update processes signifies a significant stride toward maintaining data privacy in collaborative learning environments.

Complementing this, the interaction protocol introduces an innovative approach to securely search for target encrypted values through client-server interactions. By meticulously orchestrating decryption, encryption, and traversal steps within a binary tree structure, the protocol guarantees data privacy while enabling efficient data retrieval. The communication complexity analysis underscores its resource-efficient nature, crucial in federated learning settings where minimizing communication overhead is paramount.

References

Aharoni et al. (2023) Aharoni, E.; Adir, A.; Baruch, M.; Drucker, N.; Ezov, G.; Farkash, A.; Greenberg, L.; Masalha, R.; Moshkowich, G.; Murik, D.; Shaul, H.; and Soceanu, O. 2023. HeLayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data. Proc. Priv. Enhancing Technol., 2023(1): 325–342.
Alfeld, Zhu, and Barford (2016) Alfeld, S.; Zhu, X.; and Barford, P. 2016. Data poisoning attacks against autoregressive models. In Proceedings of the AAAI Conference on Artificial Intelligence.
Bagdasaryan et al. (2020) Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; and Shmatikov, V. 2020. How To Backdoor Federated Learning. In Chiappa, S.; and Calandra, R., eds., The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, 2938–2948. PMLR.
Bhagoji et al. (2019) Bhagoji, A. N.; Chakraborty, S.; Mittal, P.; and Calo, S. 2019. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning, 634–643. PMLR.
Blanchard et al. (2017) Blanchard, P.; El Mhamdi, E. M.; Guerraoui, R.; and Stainer, J. 2017. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, 118–128. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781510860964.
Cheon et al. (2017) Cheon, J. H.; Kim, A.; Kim, M.; and Song, Y. S. 2017. Homomorphic Encryption for Arithmetic of Approximate Numbers. In Takagi, T.; and Peyrin, T., eds., Advances in Cryptology - ASIACRYPT 2017 - 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, December 3-7, 2017, Proceedings, Part I, volume 10624 of Lecture Notes in Computer Science, 409–437. Springer.
Cui et al. (2021) Cui, S.; Pan, W.; Jian Liang, C. Z.; and Wang, F. 2021. Addressing algorithmic disparity and performance inconsistency in federated learning. Advances in Neural Information Processing Systems.
Fan and Vercauteren (2012) Fan, J.; and Vercauteren, F. 2012. Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive, Paper 2012/144. https://eprint.iacr.org/2012/144.
Gong et al. (2022) Gong, X.; Sharma, A.; Karanam, S.; Wu, Z.; Chen, T.; Doermann, D.; and Innanje, A. 2022. Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation. In Proceedings of the AAAI Conference on Artificial Intelligence.
Hamer, Mohri, and Suresh (2020) Hamer, J.; Mohri, M.; and Suresh, A. T. 2020. Fedboost: A communication-efficient algorithm for federated learning. In International Conference on Machine Learning.
Hardy et al. (2017) Hardy, S.; Henecka, W.; Ivey-Law, H.; Nock, R.; Patrini, G.; Smith, G.; and Thorne, B. 2017. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR, abs/1711.10677.
Li et al. (2020) Li, L.; Fan, Y.; Tse, M.; and Lin, K.-Y. 2020. A review of applications in federated learning. Computers & Industrial Engineering, 149: 106854.
McMahan et al. (2017) McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and y Arcas, B. A. 2017. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Singh, A.; and Zhu, X. J., eds., Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTAT, volume 54 of Proceedings of Machine Learning Research, 1273–1282. PMLR.
Merkle (1988) Merkle, R. C. 1988. A Digital Signature Based on a Conventional Encryption Function. In Pomerance, C., ed., Advances in Cryptology — CRYPTO ’87, 369–378. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-540-48184-3.
Paillier (1999) Paillier, P. 1999. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques, EUROCRYPT’99, 223–238. Berlin, Heidelberg: Springer-Verlag. ISBN 3540658890.
Reisizadeh et al. (2020) Reisizadeh, A.; Mokhtari, A.; Hassani, H.; Jadbabaie, A.; and Pedarsani, R. 2020. Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization. In International Conference on Artificial Intelligence and Statistics.
Savvides, Khandelwal, and Eugster (2020) Savvides, S.; Khandelwal, D.; and Eugster, P. 2020. Efficient Confidentiality-Preserving Data Analytics over Symmetrically Encrypted Datasets. Proc. VLDB Endow., 13(8): 1290–1303.
Shejwalkar and Houmansadr (2021) Shejwalkar, V.; and Houmansadr, A. 2021. Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning. In Network and Distributed Systems Security (NDSS) Symposium 2021.
Sun et al. (2021) Sun, J.; Li, A.; DiValentin, L.; Hassanzadeh, A.; Chen, Y.; and Li, H. 2021. Fl-wbc: Enhancing robustness against model poisoning attacks in federated learning from a client perspective. Advances in Neural Information Processing Systems.
Sun et al. (2019) Sun, Z.; Kairouz, P.; Suresh, A. T.; and McMahan, H. B. 2019. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963.
Xie et al. (2020) Xie, C.; Huang, K.; Chen, P.-Y.; and Li, B. 2020. DBA: Distributed Backdoor Attacks against Federated Learning. In International Conference on Learning Representations.
Yang et al. (2019) Yang, Q.; Liu, Y.; Chen, T.; and Tong, Y. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol., 10(2).
Yin et al. (2018) Yin, D.; Chen, Y.; Kannan, R.; and Bartlett, P. 2018. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning.
Zhang et al. (2020) Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; and Liu, Y. 2020. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), 493–506. USENIX Association. ISBN 978-1-939133-14-4.