
Quantum Federated Learning for Distributed
Quantum Networks

Kai Yu, Fei Gao, and Song Lin

• Kai Yu is with the College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350117, China. E-mail: [email protected].
• Fei Gao, the corresponding author, is with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China. E-mail: [email protected].
• Song Lin, the corresponding author, is with the College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350117, China. E-mail: [email protected].
Abstract

Federated learning is a framework for learning from distributed networks. It attempts to build a global model based on virtual fusion data without sharing the actual data. Nevertheless, the traditional federated learning process encounters two challenges: high computational cost and message transmission security. To address this, we propose a quantum federated learning framework for distributed quantum networks that utilizes quantum characteristics. First, we give two methods to encode data information into quantum states, which cope with different acquisition frequencies of the data. Next, a quantum gradient descent algorithm is provided to help clients in the distributed quantum networks train local models in parallel. Compared with the classical counterpart, the proposed algorithm achieves exponential acceleration in the dataset scale and a quadratic speedup in the data dimensionality. Moreover, a quantum secure multi-party computation protocol based on the Chinese remainder theorem is designed, which avoids the errors and overflow problems that may occur in large-number operations. Security analysis shows that the protocol can resist common external and internal attacks. Finally, to demonstrate the effectiveness of the proposed framework, we use it to train a federated linear regression model and simulate the essential computation steps on the IBM Qiskit simulator.

Index Terms:
Quantum algorithm, federated learning, distributed networks, quantum gradient descent, quantum secure multi-party computation.

I Introduction

With the development of information networks, more and more data are generated and stored in distributed network systems [1]. Integrating data from distributed networks enables the extraction of valuable information. Nevertheless, a significant proportion of the data contains sensitive and private information, making data owners hesitant to share it [2]. This situation has led to the emergence of federated learning (FL), a distributed machine learning (ML) method [3]. FL improves data privacy by keeping data local and training without sharing the raw data. This not only makes effective use of distributed data resources, but also facilitates the development of information network technology. However, the volume of locally trained data can be huge, and the computing power of traditional computers then faces great challenges. Furthermore, the transmission of training results poses a threat to user privacy, as it can give attackers an opportunity to infer sensitive information. While classical cryptographic schemes have been used to safeguard communication security, advances in hardware pose a persistent threat to their security.

Quantum information processing (QIP) is an emerging field that explores the interaction between quantum mechanics and information technology. It continues to show its appeal, attracting the attention of scholars. In 1984, Bennett and Brassard proposed the famous BB84 protocol [4], which perfectly achieves a key distribution task between two remote parties. Subsequently, scholars utilized quantum information processing to ensure information security and proposed a series of quantum cryptography protocols. In contrast to classical cryptography protocols, whose security is based on assumptions of computational complexity, the security of these protocols relies on physical properties such as the Heisenberg uncertainty principle, which makes them unconditionally secure in theory. Quantum cryptography has developed into a significant application of quantum information processing, including quantum key distribution [5, 6, 7], quantum secret sharing [8, 9, 10], quantum secure direct communication [11, 12], and so on. Another exciting application of quantum information processing is quantum computing, which provides quantum speedup for certain classes of problems that are intractable on classical computers. For example, the factorization of large numbers via Shor's algorithm [13] achieves an exponential speedup. Furthermore, quantum computing has also made advances in machine learning, such as quantum linear system solvers [14, 15, 16], quantum regression [17, 18], quantum neural networks [19, 20], variational quantum algorithms (VQA) [21, 22, 23], and so on.

Motivated by the advantages shown by quantum cryptography and quantum computing in improving transmission security and computing speed, respectively, scholars have attempted to utilize QIP to address the challenges faced by FL. In 2021, Li et al. focused on the security issue of FL [24]. They proposed a private single-party delegated training protocol based on blind quantum computing for a variational quantum classifier, and then extended the protocol to quantum FL combined with differential privacy. This protocol can exploit the computing advantage of remote quantum servers while preserving the privacy of sensitive data. In 2024, Ren et al. proposed a quantum FL scheme to solve the privacy-preservation issue in dynamic security assessment for smart cyber-physical grids [25]. Moreover, Chen and Yoo proposed a quantum FL scheme with a hybrid quantum-classical machine learning model, focusing on improving the efficiency of local training [26]. In their scheme, a classical convolutional network extracts data features and compresses them into vectors, which are input into variational quantum circuits for training. Compared with the classical process, this method can achieve the same level of accuracy more quickly. In 2022, Huang et al. utilized a variational quantum algorithm to estimate the gradient of the local model, avoiding the high cost of computing the gradient directly [27]. Since variational quantum algorithms approximate the target results using circuits with tunable parameters, they differ from quantum algorithms that calculate the target results through the evolution of quantum gates. Therefore, we further explore the realization of FL with quantum resources.

In this paper, we focus on quantum algorithms running on ordinary quantum computers and present a quantum federated learning framework based on gradient descent (QFLGD). It aims to provide a unified, secure, and effective gradient estimation scheme for distributed quantum networks. In QFLGD, we propose two data preparation methods by analyzing the different acquisition frequencies of static data (the local training data) and dynamic data (the parameters that need to be updated during iteration). This reduces the requirements of QFLGD on the performance of quantum random access memory. At the same time, two main processes of FL are implemented in QFLGD by exploiting quantum properties. The first is a quantum gradient descent (QGD) algorithm, which accelerates the training of the gradient for the client. QGD provides the client with a classical gradient at each iteration, which can be directly used to learn classical model parameters. Compared with the classical counterpart, this quantum process achieves exponential acceleration in the data scale and a quadratic speedup in the data dimensionality. The other is a quantum secure multi-party computation (QSMC) protocol, which allows the aggregation of gradients to be done securely over quantum communication networks. That is, the server is able to calculate the federated gradients without the clients sharing their local gradients. Furthermore, the application of the Chinese remainder theorem in QSMC makes it possible to avoid the errors and overflow problems that may occur in calculations with large numbers. The proposed quantum federated learning framework can improve the local computing efficiency and data privacy of FL. We also apply QFLGD to train a federated linear regression (FLR) model and present a numerical experiment to verify its correctness.

The remainder of this paper is organized as follows. The classical FL is reviewed in Sec. II. In Sec. III, we propose the framework for QFLGD. In Sec. IV, we analyze the time complexity and the security of QFLGD. Furthermore, an application to train the FLR and the numerical experiment are shown in Sec. V. In Sec. VI, we give the conclusion of our work.

II Review of classical FL

To clarify the framework of QFLGD in distributed quantum networks, this section offers an overview of the fundamental ideas and processes of traditional FL. FL is a collaborative ML approach in which multiple clients train a shared model without exchanging raw data. A popular learning framework is FL based on gradient descent [3], which is depicted in Fig. 1. It mainly includes the following parts.

Figure 1: Schematic diagram of federated learning based on gradient descent. $\mathbf{w}^{j}(n)$ denotes the $j$th element of the parameter vector $\mathbf{w}(n)$. $\bm{G}^{j}(\mathbf{w}(n))$ is the $j$th component of the global gradient. $\beta_{k}$ is the aggregation weight. $\bm{g}^{j}_{k}(\mathbf{w}(n))$ is the $j$th element of the local gradient of the client ${\rm Bob}_{k}$. $\alpha$ is the learning rate.

A) Data preparation and model initialization. In the FL framework, data are derived from various clients in a distributed network, such as hospital medical information, preference options in business surveys, and other sensitive data [28]. We consider general federated learning with $K$ clients participating in the model training. The server (Alice) initializes a global model with trainable parameters $\mathbf{w}=(\omega_{0},\omega_{1},\cdots,\omega_{D-1})$ and distributes it to the clients. The client ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) collects and preprocesses $M_{k}$ data samples $(\mathbf{x}_{0},y_{0}),(\mathbf{x}_{1},y_{1}),\cdots,(\mathbf{x}_{M_{k}-1},y_{M_{k}-1})$, where $\mathbf{x}_{i}\in\mathbb{R}^{D}$ and $y_{i}$ is the corresponding label.

B) Local training. To train the model, clients use standard ML algorithms without sharing raw data. The model evaluation task is expressed as minimizing a cost function, such as the mean square error (MSE) loss

\min_{\mathbf{w}}E=\frac{1}{2M_{k}}\sum_{i=0}^{M_{k}-1}\left[f\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)-y_{i}\right]^{2}, \qquad (1)

where $f$ is the activation function. This cost function measures the difference between the model output and the expected output. In this case, model optimization amounts to finding the gradient of $E$ with respect to $\mathbf{w}$ to adjust the model parameters. The client ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) can obtain

\bm{g}_{k}^{j}\left(\mathbf{w}\right)=\frac{1}{M_{k}}\sum_{i=0}^{M_{k}-1}F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)\mathbf{x}_{i}^{j},\quad j=0,1,\cdots,D-1, \qquad (2)

with his data. Here, $F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)$ denotes $\frac{\partial f(\mathbf{x}_{i}\cdot\mathbf{w})}{\partial(\mathbf{x}_{i}\cdot\mathbf{w})}\left[f\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)-y_{i}\right]$, $\bm{g}_{k}^{j}\left(\mathbf{w}\right)$ is the $j$th element of the local gradient $\bm{g}_{k}\left(\mathbf{w}\right)$, and $\mathbf{x}_{i}^{j}$ is the $j$th element of the sample $\mathbf{x}_{i}$.

C) Model aggregation and update. The server (Alice) collects the gradients trained by all clients, and calculates the federated gradient

\bm{G}^{j}\left(\mathbf{w}\right)=\sum_{k=1}^{K}\beta_{k}\,\bm{g}_{k}^{j}\left(\mathbf{w}\right), \qquad (3)

where $\beta_{k}=M_{k}/(\sum_{k=1}^{K}M_{k})$ and $\bm{G}^{j}(\mathbf{w})$ is the $j$th element of the federated gradient $\bm{G}(\mathbf{w})$. Then, Alice updates the global model parameters. Specifically, she adjusts the parameters to

\mathbf{w}^{j}\left(n+1\right)=\mathbf{w}^{j}(n)-\alpha\times\bm{G}^{j}\left(\mathbf{w}(n)\right), \qquad (4)

for $j=0,1,\cdots,D-1$, in the $n$th iteration. In Eq. (4), $\alpha$ is the learning rate [29].

D) Model evaluation and distribution. The server (Alice) evaluates the performance of the global model and sends the global model parameters to the clients for further local training if the model has not yet converged (i.e., $\sum_{j=0}^{D-1}[\bm{G}^{j}(\mathbf{w}(n))]^{2}>\varepsilon$, where $\varepsilon$ is a threshold on the gradient). Once the convergence condition is satisfied, Alice announces that the training stops and distributes the model.

It is notable that the time consumption of FL is dominated by the calculation of the local gradient. For a client, computing one gradient element takes $O(D)$ time to estimate the inner product $\mathbf{x}_{i}\cdot\mathbf{w}$ and $O(M)$ time to sum over the samples. In general, $D$ repetitions are needed to estimate all elements of the local gradient. Therefore, ${\rm Bob}_{k}$ takes $O(MD^{2})$ time to calculate all the elements of the local gradient on a classical computer. In the era of big data, this is surely a very expensive calculation. Moreover, the security of federated learning may be compromised during local gradient aggregation. Traditional encryption methods can improve the security of this process, but the development of quantum technology threatens their security.
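To make the classical baseline concrete, the following is a minimal NumPy sketch of Eqs. (1)-(4) for the identity activation $f(z)=z$, so that $F(\mathbf{x}_{i}\cdot\mathbf{w})=\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}$; the client datasets, learning rate, and threshold are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative local datasets for K = 2 clients (placeholders)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)),
           (rng.normal(size=(80, 3)), rng.normal(size=80))]

def local_gradient(X, y, w):
    """Eq. (2) with f(z) = z, so F(x_i . w) = x_i . w - y_i."""
    F = X @ w - y
    return F @ X / len(y)

w = np.zeros(3)
alpha, eps = 0.1, 1e-8
M_total = sum(len(y) for _, y in clients)
for n in range(1000):
    # Eq. (3): aggregate with weights beta_k = M_k / sum(M_k)
    G = sum(len(y) / M_total * local_gradient(X, y, w) for X, y in clients)
    if G @ G <= eps:          # convergence test of part D
        break
    w = w - alpha * G         # Eq. (4)
print(w)
```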

III Quantum Federated Learning based on Gradient Descent Algorithm

In this section, we present QFLGD, which focuses on parallel and private computing architectures for data in distributed quantum networks. Such a distributed quantum network typically consists of a server and several clients with quantum computing capabilities. We first give methods to encode the data information into quantum states. Subsequently, we propose a QGD algorithm that clients use to estimate the gradient locally. A QSMC protocol is designed to perform a private calculation of the global gradient when the server aggregates the training results of the clients. Finally, the server updates the global parameters and shares the results with the clients. The schematic diagram of the QFLGD framework is presented in Fig. 2.

Figure 2: Schematic illustration of QFLGD.

III-A Quantum data preparation and model initialization

Similar to classical FL, a dataset $\mathbf{X}=[\mathbf{x}_{0},\mathbf{x}_{1},\cdots,\mathbf{x}_{M-1}]$ is chosen by a client in quantum FL, where $\mathbf{x}_{i}\in\mathbb{R}^{D}$ and $\mathbf{y}=(y_{0},y_{1},\cdots,y_{M-1})$ collects the corresponding label of each sample of $\mathbf{X}$. For convenience, it is assumed that $D=2^{L}$ for some integer $L$; otherwise, zeros are padded into each vector. Furthermore, the server initializes an ML model whose learnable parameters are represented by a vector $\mathbf{w}\in\mathbb{R}^{D}$ and can be optimized using gradient descent. The ability of quantum computers to effectively solve practical problems depends on encoding this information into quantum states as input to a quantum algorithm. Here, we give methods to extract the data and parameter information into quantum states.

Suppose that the quantum oracles

O_{X}:|i\rangle|j\rangle|0\rangle\longrightarrow|i\rangle|j\rangle|\mathbf{x}_{i}^{j}\rangle, \qquad (5)

and

O_{y}:|i\rangle|0\rangle\longrightarrow|i\rangle|y_{i}\rangle, \qquad (6)

are provided, where $\mathbf{x}_{i}^{j}$ represents the $j$th element of the $i$th vector of the dataset $\mathbf{X}$. These two oracles can access the entries of $\mathbf{x}_{i}$ and $\mathbf{y}$ in time $O(\mathrm{polylog}(MD))$ and $O(\mathrm{polylog}(M))$, respectively [17, 30], when the data are stored in quantum random access memory (QRAM) [31] with an appropriate data structure [32]. In addition, the operation

U_{nf}:|i\rangle|0\rangle\longrightarrow|i\rangle|\|\mathbf{x}_{i}\|\rangle \qquad (7)

is required, which gives access to the $2$-norm of the vector $\mathbf{x}_{i}$. Inspired by Ref. [33], $U_{nf}$ can be implemented in time $O(\mathrm{polylog}(D)/\epsilon_{m})$ employing controlled rotation [34] and quantum phase estimation (QPE) [35]. The details are shown in appendix A. Under these assumptions, the quantum data preparation proceeds as follows.

(A1) In this step, the data information is extracted into the state $|\phi(\mathbf{x}_{i})\rangle$. Firstly, three quantum registers are prepared in the state $|i\rangle_{1}|0^{\otimes\log D}\rangle_{2}|0^{\otimes q}\rangle_{3}$, where the subscripts denote the different registers and $q$ is a number of qubits sufficient to store the data elements, i.e., $2^{q}-1>\max_{i,j}|\mathbf{x}_{i}^{j}|$. After that, $H^{\otimes\log D}$ is applied on the second register to generate the state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|0^{\otimes q}\rangle_{3}. \qquad (8)

Secondly, the quantum oracle $O_{X}$ is performed on the three registers, leaving them in the state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}. \qquad (9)

Subsequently, a qubit in the state $|0\rangle$ is added and rotated to $\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$ controlled on $|\mathbf{x}_{i}^{j}\rangle$, where $c_{1}=1/\max_{i,j}|\mathbf{x}_{i}^{j}|$. The system becomes

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}. \qquad (10)

Finally, the inverse $O_{X}$ operation is applied on the third register. The quantum state

|i\rangle|\phi(\mathbf{x}_{i})\rangle=|i\rangle_{1}\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|j\rangle_{2}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, \qquad (11)

is obtained by discarding the third register. The whole process is denoted as $U_{\mathbf{x}_{i}}$, which generates the state $|\phi(\mathbf{x}_{i})\rangle$ in time $O(\mathrm{polylog}(D)+q)$.
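As a sanity check on the form of Eq. (11), the following NumPy sketch builds the amplitudes of $|\phi(\mathbf{x}_{i})\rangle$ directly for a small illustrative vector (the values are placeholders): post-selecting the ancilla on $|1\rangle$ leaves amplitudes proportional to $\mathbf{x}_{i}$.

```python
import numpy as np

x = np.array([1.0, -0.5, 2.0, 0.25])        # illustrative x_i with D = 4
D = len(x)
c1 = 1.0 / np.max(np.abs(x))

# amplitudes of |phi(x_i)>: index j on the data register, last bit is the ancilla
state = np.zeros((D, 2))
state[:, 0] = np.sqrt(1 - (c1 * x) ** 2) / np.sqrt(D)   # ancilla |0> branch
state[:, 1] = c1 * x / np.sqrt(D)                       # ancilla |1> branch

assert np.isclose(np.sum(state ** 2), 1.0)              # the state is normalized
branch = state[:, 1]
print(branch / np.linalg.norm(branch))                  # proportional to x / ||x||
```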

(A2) In order to train the gradient, the parameter $\mathbf{w}(n)$ should be introduced in the $n$th iteration. Thus, it is necessary to generate a quantum state that carries the information of $\mathbf{w}(n)$. Since the parameter changes in each iteration, there are two methods to prepare this quantum state.

One way is based on the assumption that QRAM allows frequent reading and writing. If the entries $\mathbf{w}^{j}(n)$ ($j=0,1,\cdots,D-1$) are written into QRAM in a timely manner, the quantum state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|j\rangle\left[\sqrt{1-(c_{2}\mathbf{w}^{j}(n))^{2}}|0\rangle+c_{2}\mathbf{w}^{j}(n)|1\rangle\right] \qquad (12)

can be produced by a process similar to step (A1) with the help of the oracle $O_{\mathbf{w}}$ ($O_{\mathbf{w}}|j\rangle|0\rangle\longrightarrow|j\rangle|\mathbf{w}^{j}(n)\rangle$), where $c_{2}=1/\|\mathbf{w}(n)\|$ and $\mathbf{w}^{j}(n)$ denotes the $j$th element of the parameter vector in the $n$th iteration. This method can be implemented in time $O(\mathrm{polylog}(D)+q)$.

For the other method, the parameter is encoded into the quantum state based on the operation $R(\vartheta)=\cos(\vartheta)|0\rangle\langle 0|-\sin(\vartheta)|0\rangle\langle 1|+\sin(\vartheta)|1\rangle\langle 0|+\cos(\vartheta)|1\rangle\langle 1|$, which is inspired by Ref. [36]. In this way, $\mathbf{w}(n)$ is not required to be written into QRAM. The process is described as follows.

Assume that it is easy to obtain the $2^{L}-1$ ($L=\log D$) angle parameters $\bm{\vartheta}_{t}=(\vartheta_{t}^{0},\cdots,\vartheta_{t}^{2^{t-1}-1})$ ($t=1,2,\cdots,L$) from the updated $\mathbf{w}(n)$ after the last iteration. The angle $\vartheta_{t}^{j}$ satisfies

\cos(\vartheta^{j}_{t})=\frac{h_{t}^{2j}}{h^{j}_{t-1}},\quad \sin(\vartheta^{j}_{t})=\frac{h^{2j+1}_{t}}{h^{j}_{t-1}}, \qquad (13)

for $t=1,\cdots,L$, where $h^{j}_{t-1}=\sqrt{(h^{2j}_{t})^{2}+(h^{2j+1}_{t})^{2}}$ and $j=0,\cdots,2^{t-1}-1$. In particular, $h^{j}_{L}=\mathbf{w}^{j}(n)$ for $j=0,1,\cdots,D-1$. We then define

U(\bm{\vartheta}_{t})=\begin{cases}\sum_{j=0}^{2^{t-1}-1}|j\rangle\langle j|\otimes R(\vartheta^{j}_{t})\otimes I\{L-t\},&t=2,\cdots,L\\ R(\vartheta^{0}_{1})\otimes I\{L-1\},&t=1\end{cases} \qquad (14)

where $I\{L-t\}$ denotes the identity gate $I$ applied on $(L-t)$ qubits.

After that, a quantum state

U(\bm{\vartheta}_{L})\cdots U(\bm{\vartheta}_{2})U(\bm{\vartheta}_{1})|0^{\otimes\log D}\rangle=\sum_{j=0}^{D-1}c_{2}\mathbf{w}^{j}(n)|j\rangle, \qquad (15)

is generated in time $O(D)$ by applying the operation $U(\bm{\vartheta}_{t})$ for $t=1,2,\cdots,L$. Furthermore, a register in the state $|1\rangle$ is appended. The overall system is in the state

|\phi(\mathbf{w}(n))\rangle=\sum_{j=0}^{D-1}c_{2}\mathbf{w}^{j}(n)|j\rangle|1\rangle. \qquad (16)

To further interpret this method, an example is given in appendix B.

According to Eq. (16), the state in Eq. (12) can be rewritten as

\frac{1}{\sqrt{D}}\left[\sum_{j=0}^{D-1}|j\rangle\sqrt{1-(c_{2}\mathbf{w}^{j}(n))^{2}}|0\rangle+|\phi(\mathbf{w}(n))\rangle\right]. \qquad (17)

It means that the above two methods both allow us to extract the parameter $\mathbf{w}(n)$ information into the quantum state $|\phi(\mathbf{w}(n))\rangle$. Based on current quantum technology, we choose the second, more feasible method and denote the process as $U_{\mathbf{w}}$.
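To illustrate the second method, the following NumPy sketch computes the tree values $h_{t}^{j}$ and the angles $\vartheta_{t}^{j}$ of Eq. (13) for an illustrative parameter vector, and checks that multiplying the cosines and sines along each root-to-leaf path reproduces the target amplitudes $c_{2}\mathbf{w}^{j}(n)$ of Eq. (15).

```python
import numpy as np

w = np.array([0.4, 0.2, 0.8, 0.4])           # illustrative parameter vector, D = 4
c2 = 1.0 / np.linalg.norm(w)
target = c2 * w                               # amplitudes of Eq. (15)

# build the binary tree h_t^j bottom-up: leaves h_L^j = w^j(n), Eq. (13)
L = int(np.log2(len(w)))
h = [target]
for _ in range(L):
    prev = h[0]
    h.insert(0, np.sqrt(prev[0::2] ** 2 + prev[1::2] ** 2))

# recover amplitudes top-down by multiplying cos/sin of each angle
amps = np.array([1.0])
for t in range(1, L + 1):
    # arctan2(h_t^{2j+1}, h_t^{2j}) gives the angle of Eq. (13)
    theta = np.arctan2(h[t][1::2], h[t][0::2])
    amps = np.ravel(np.column_stack((amps * np.cos(theta), amps * np.sin(theta))))
print(np.allclose(amps, target))              # True
```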

III-B Local training by quantum parallel computing (QGD algorithm)

Now, we propose a QGD algorithm. It enables clients to estimate the gradient of the model in parallel based on their respective local data. According to Eq. (2), with the help of the two data-preparation operations $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$, the process of the QGD algorithm is described as follows.

(B1) Generate an intermediate quantum state.

The task of computing the gradient involves an inner product computation, which takes $O(MD)$ time on classical computers. In the era of big data, this cost is high. Here, we generate an intermediate state that contains the information of $\mathbf{x}_{i}\cdot\mathbf{w}$. This state facilitates subsequent parallel estimation.

(B1.1) A quantum state is initialized as

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|0^{\otimes\log D}\rangle_{2}|0^{\otimes q}\rangle_{3}|0\rangle_{4}|0\rangle_{5}. \qquad (18)

(B1.2) The Hadamard gate is performed on the fifth register. Then, a controlled operation $|i\rangle\langle i|\otimes U_{\mathbf{x}_{i}}\otimes|0\rangle\langle 0|+I\otimes U_{\mathbf{w}}\otimes|1\rangle\langle 1|$ is applied to produce the state

\frac{1}{\sqrt{2M}}\sum_{i=0}^{M-1}|i\rangle_{1}\left[|\phi(\mathbf{x}_{i})\rangle_{24}|0\rangle_{5}+|\phi(\mathbf{w}(n))\rangle_{24}|1\rangle_{5}\right]. \qquad (19)

(B1.3) Subsequently, the Hadamard gate is applied to the fifth register to get

|\psi_{1}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}, \qquad (20)

where

|\Psi_{i,n}\rangle=\frac{1}{2}\left[(|\phi(\mathbf{x}_{i})\rangle+|\phi(\mathbf{w}(n))\rangle)|0\rangle+(|\phi(\mathbf{x}_{i})\rangle-|\phi(\mathbf{w}(n))\rangle)|1\rangle\right]. \qquad (21)

The state $|\Psi_{i,n}\rangle$ can be rewritten as

|\Psi_{i,n}\rangle=\cos\theta_{i}|\psi_{i}^{0}\rangle+\sin\theta_{i}|\psi_{i}^{1}\rangle=(\cdots)_{245}^{\bot}+\frac{1}{2\sqrt{D}}\Big(\sum_{j=0}^{D-1}c_{1}\mathbf{x}_{i}^{j}|j\rangle-\sum_{j^{\prime}=0}^{D-1}c_{2}^{\prime}\mathbf{w}^{j^{\prime}}(n)|j^{\prime}\rangle\Big)_{2}|11\rangle_{45}, \qquad (22)

where $\theta_{i}\in[0,\frac{\pi}{2}]$. It is easy to verify that

\sin^{2}\theta_{i}=\frac{c_{1}^{2}\|\mathbf{x}_{i}\|^{2}+c_{2}^{\prime 2}\|\mathbf{w}\|^{2}-2c_{1}c_{2}^{\prime}(\mathbf{x}_{i}\cdot\mathbf{w})}{4D} \qquad (23)

with $c_{2}^{\prime}=\sqrt{D}c_{2}$. By inspecting Eqs. (22) and (23), it can be seen that the essential information is carried by the component whose fourth and fifth registers are both in the state $|1\rangle$. Hence the superposition with $|0\rangle$ does not affect the extraction of the required information if the state of Eq. (17) is chosen; the first method (in step (A2)) is therefore also suitable for our algorithm, with $c_{2}^{\prime}=c_{2}$.
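The relation in Eq. (23) can be checked numerically: the squared norm of the $|j\rangle|11\rangle_{45}$ branch of Eq. (22) equals the right-hand side of Eq. (23), from which $\mathbf{x}_{i}\cdot\mathbf{w}$ can be recovered. A small NumPy sketch with illustrative vectors:

```python
import numpy as np

x = np.array([1.0, -0.5, 2.0, 0.25])          # illustrative x_i
w = np.array([0.4, 0.2, 0.8, 0.4])            # illustrative w(n)
D = len(x)
c1 = 1.0 / np.max(np.abs(x))
c2p = np.sqrt(D) / np.linalg.norm(w)          # c_2' = sqrt(D) * c_2

# amplitude of the |j>|11> branch in Eq. (22)
amp11 = (c1 * x - c2p * w) / (2 * np.sqrt(D))
lhs = np.sum(amp11 ** 2)                      # sin^2(theta_i)

rhs = (c1**2 * x @ x + c2p**2 * w @ w - 2 * c1 * c2p * (x @ w)) / (4 * D)
print(np.isclose(lhs, rhs))                   # Eq. (23) holds

# invert Eq. (23) to recover the inner product from sin^2(theta_i)
xw_est = (c1**2 * x @ x + c2p**2 * w @ w - 4 * D * lhs) / (2 * c1 * c2p)
print(np.isclose(xw_est, x @ w))              # True: x . w recovered
```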

(B2) Calculate $F(\mathbf{x}_{i}\cdot\mathbf{w})$ in parallel.

An approximation of $F(\mathbf{x}_{i}\cdot\mathbf{w})$ should be estimated and stored in a quantum state. To achieve this goal, $\theta_{i}$ is estimated via quantum phase estimation, in which the unitary operation is defined as

Q_{i}=-\mathcal{A}_{i}S_{00}\mathcal{A}_{i}^{\dagger}S_{11}, \qquad (24)

where $\mathcal{A}_{i}|0^{\otimes[\log(D)+2]}\rangle=|\Psi_{i,n}\rangle$, $S_{00}=I^{\otimes[\log(D)+2]}-2|0^{\otimes[\log(D)+2]}\rangle\langle 0^{\otimes[\log(D)+2]}|$, and $S_{11}=I^{\otimes\log(D)}\otimes(I^{\otimes 2}-2|1^{\otimes 2}\rangle\langle 1^{\otimes 2}|)$. Mathematically, the eigenvalues of $Q_{i}$ are $e^{\pm 2\mathbf{i}\theta_{i}}$ ($\mathbf{i}=\sqrt{-1}$) with corresponding eigenvectors $|\Psi_{i,n}^{\pm}\rangle=\frac{1}{\sqrt{2}}(|\psi_{i}^{0}\rangle\pm\mathbf{i}|\psi_{i}^{1}\rangle)$. In this eigenbasis, $|\Psi_{i,n}\rangle$ can be rewritten as $|\Psi_{i,n}\rangle=-\frac{\mathbf{i}}{\sqrt{2}}(e^{\mathbf{i}\theta_{i}}|\Psi_{i,n}^{+}\rangle-e^{-\mathbf{i}\theta_{i}}|\Psi_{i,n}^{-}\rangle)$. The procedure for estimating $F(\mathbf{x}_{i}\cdot\mathbf{w})$ is as follows.

(B2.1) Performing QPE on $Q_{i}$ with the state $|\psi_{1}\rangle|0^{\otimes l}\rangle$ for some $l=O(\log\epsilon_{\theta}^{-1})$, an approximate state

|\psi_{2}\rangle=\frac{-\mathbf{i}}{\sqrt{2M}}\sum_{i=0}^{M-1}|i\rangle_{1}\left(e^{\mathbf{i}\theta_{i}}|\Psi_{i,n}^{+}\rangle|\widetilde{\theta}_{i}\rangle-e^{-\mathbf{i}\theta_{i}}|\Psi_{i,n}^{-}\rangle|-\widetilde{\theta}_{i}\rangle\right)_{2456} \qquad (25)

is obtained, where $\widetilde{\theta}_{i}\in\mathbb{Z}_{2^{l}}$ satisfies $|\theta_{i}-\widetilde{\theta}_{i}\pi/2^{l}|\leq\epsilon_{\theta}$. Then, the quantum state

|\psi_{3}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|\sin^{2}(\widetilde{\theta}_{i})\rangle_{6} \qquad (26)

is generated by using the sine gate. This holds because $\sin^{2}(\widetilde{\theta}_{i})=\sin^{2}(-\widetilde{\theta}_{i})$.

(B2.2) According to Eq. (23), access to $\|\mathbf{x}_{i}\|$ is needed to compute $\mathbf{x}_{i}\cdot\mathbf{w}$. Combining the operation $U_{nf}$ with the quantum arithmetic operations [37], we can get

|\psi_{4}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}. \qquad (27)

(B2.3) An oracle $O_{F}$ is assumed to implement any function that has a convergent Taylor series [34]. Combined with $O_{y}$, the function $F(\cdot)$ can be implemented (a simple example is described in Sec. V). The state becomes

|\psi_{5}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|F(\mathbf{x}_{i}\cdot\mathbf{w})\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}. \qquad (28)

(B2.4) Next, a register in the state $|0\rangle$ is appended as the last register and rotated to $|\phi_{i}\rangle=c_{3}F(\mathbf{x}_{i}\cdot\mathbf{w})|0\rangle+\sqrt{1-(c_{3}F(\mathbf{x}_{i}\cdot\mathbf{w}))^{2}}|1\rangle$ in a controlled manner, where $c_{3}=1/\max_{i}|F(\mathbf{x}_{i}\cdot\mathbf{w})|$. This results in the overall state

|\psi_{6}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|F(\mathbf{x}_{i}\cdot\mathbf{w})\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}|\phi_{i}\rangle_{9}. \qquad (29)

(B2.5) The inverse operations of steps (B1.2)-(B2.3) are performed on $|\psi_{6}\rangle$. Afterwards, a register in the state $|1\rangle$ is added to obtain

|\psi\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\phi_{i}\rangle_{9}|1\rangle_{10}. \qquad (30)

For convenience, $\mathcal{A}_{\psi}$ denotes the operations that achieve $\mathcal{A}_{\psi}|00\cdots 0\rangle=|\psi\rangle$. Its schematic quantum circuit is given in Fig. 3.

Figure 3: Quantum circuit diagram of $\mathcal{A}_{\psi}$. Here, $q$ is the number of qubits required to adequately store the data information and $\epsilon_{\theta}$ is the tolerated error for estimating $\theta_{i}$. Furthermore, $QAO$ denotes the quantum arithmetic operations, $CR_{3}$ represents the controlled rotation operation about $|\phi_{i}\rangle$, and $U^{\dagger}$ labels the inverse operations in step (B2.5).

(B3) Estimate the gradient $\bm{g}^{j}(\mathbf{w}(n))$ with the swap test.

(B3.1) Three registers in the state $\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|0\rangle_{3}$ are prepared. Performing $O_{X}$ on them generates the state

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}. \qquad (31)

(B3.2) The controlled rotation operation ($|0\rangle\rightarrow\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$) is implemented to get

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}. \qquad (32)

(B3.3) The inverse operation of $O_{X}$ is performed. After that, we can obtain the state

|\chi^{j}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|0\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, \qquad (33)

via uncomputing the register $|j\rangle$.

(B3.4) In order to obtain the gradient, the swap test technique [38] is utilized. Combining the processes of generating the states $|\psi\rangle$ and $|\chi^{j}\rangle$, the quantum state $\frac{1}{\sqrt{2}}(|0\rangle|\psi\rangle+|1\rangle|\chi^{j}\rangle)$ can be constructed. Then, the first register is measured to see whether it is in the state $|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$. The measurement succeeds with probability

P=\frac{1}{2}+\frac{1}{2}\langle\psi|\chi^{j}\rangle. \qquad (34)

According to Eq. (2), $\bm{g}^{j}(\mathbf{w}(n))=(2P-1)/(c_{1}c_{3})$ can then be calculated. Hence, ${\rm Bob}_{k}$ can obtain the local gradient $\bm{g}_{k}(\mathbf{w})=(\bm{g}_{k}^{0}(\mathbf{w}(n)),\bm{g}_{k}^{1}(\mathbf{w}(n)),\cdots,\bm{g}_{k}^{D-1}(\mathbf{w}(n)))^{T}$ by repeating the above steps with his data.
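The overlap-estimation primitive behind step (B3.4) can be tried directly on the Qiskit simulator used in Sec. V. The sketch below uses the standard controlled-SWAP form of the swap test [38] on two single-qubit stand-ins for $|\psi\rangle$ and $|\chi^{j}\rangle$, estimating $|\langle\psi|\chi\rangle|^{2}$ from the ancilla statistics; note that the protocol above instead measures $|+\rangle$ on a superposed preparation to obtain $\langle\psi|\chi^{j}\rangle$ itself. The state angles are illustrative.

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# two single-qubit stand-ins |psi> = Ry(a)|0>, |chi> = Ry(b)|0>
a, b = 0.7, 1.9
qc = QuantumCircuit(3, 1)
qc.ry(a, 1)
qc.ry(b, 2)
qc.h(0)                  # ancilla
qc.cswap(0, 1, 2)        # controlled-SWAP between the two state registers
qc.h(0)
qc.measure(0, 0)

shots = 100_000
backend = AerSimulator()
counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()
p0 = counts.get("0", 0) / shots          # P(0) = 1/2 + |<psi|chi>|^2 / 2
print("estimated |<psi|chi>|^2:", 2 * p0 - 1)
print("exact     |<psi|chi>|^2:", np.cos((a - b) / 2) ** 2)
```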

III-C Model aggregation with QSMC protocol and update

In this section, we design a protocol to securely compute the federated gradient $\bm{G}(\mathbf{w})=\sum_{k=1}^{K}\beta_{k}\bm{g}_{k}(\mathbf{w})$, i.e., to calculate $\bm{G}(\mathbf{w})$ without revealing the local gradients $\bm{g}_{k}(\mathbf{w})$. To this end, the server Alice is assumed to be semi-honest: she may misbehave on her own but cannot conspire with others. Moreover, the federated gradients are required to be accurate to $\gamma^{-1}$; the scaled values $\gamma\beta_{k}\bm{g}_{k}^{j}(\mathbf{w})\geq 0$ are treated as nonnegative integers and denoted by $\mu_{k}^{j}$ for brevity. It is further supposed that $\sum_{k=1}^{K}\mu_{k}^{j}<S$. The details are as follows.

(C1) Preparation for multi-party quantum communication.

Alice announces $\gamma$ and the global dataset scale $\sum_{k=1}^{K}M_{k}$. At the same time, the participants (server and clients) choose $m$ pairwise coprime numbers $d_{i}$ ($i=1,2,\cdots,m$) satisfying $d_{1}\times d_{2}\times\cdots\times d_{m}=S$. Subsequently, ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) calculates his secret

s_{k,i}^{j}=\mu_{k}^{j}\ \mathrm{mod}\ d_{i}. \qquad (35)

Alice produces a $d_{i}$-level $(K+1)$-particle GHZ state

|\Psi\rangle=\frac{1}{\sqrt{d_{i}}}\sum_{q=0}^{d_{i}-1}|q\rangle^{\otimes(K+1)}, \qquad (36)

and marks the $(K+1)$ particles as $Q_{0},Q_{1},\cdots,Q_{K}$.

(C2) Distribution of quantum particles.

To check for the presence of eavesdroppers, Alice prepares $K$ sets of $\delta$ decoy states, where each decoy photon is randomly in one of the states from the set $V_{1}=\{|p\rangle\}_{p=0}^{d_{i}-1}$ or $V_{2}=\{QFT|p\rangle\}_{p=0}^{d_{i}-1}$, with $QFT$ denoting the quantum Fourier transform [39]. These sets are denoted as $E_{1},E_{2},\cdots,E_{K}$, respectively. Then Alice inserts $Q_{k}$ into $E_{k}$ at a random position and sends them to ${\rm Bob}_{k}$ for $k=1,2,\cdots,K$.

(C3) Security checking of the quantum channel.

After receiving the $\delta+1$ particles, ${\rm Bob}_{k}$ sends an acknowledgement to Alice. Subsequently, the positions and bases of the decoy photons are announced to ${\rm Bob}_{k}$ by Alice. ${\rm Bob}_{k}$ measures the decoy photons and returns the measurement results to Alice, who then calculates the error rate by comparing the measurement results with the initial states. If the error rate is higher than the threshold determined by the channel noise, Alice aborts this run of the protocol and restarts it. Otherwise, the protocol continues.

(C4) Measurement of particles and encoding of the transmitted information.

${\rm Bob}_{k}$ extracts all the decoy photons and discards them. Then, the server and the clients each perform the measurement $\{QFT|p\rangle\}_{p=0}^{d_{i}-1}$ on their remaining particles. The measurement results are recorded as $o_{s,i},o_{1,i},\cdots,o_{K,i}$ and satisfy $o_{s,i}+o_{1,i}+\cdots+o_{K,i}=0\ \mathrm{mod}\ d_{i}$. Subsequently, ${\rm Bob}_{k}$ encodes his data as $s_{k,i}^{\prime j}=s_{k,i}^{j}+o_{k,i}$ and sends it to Alice.

(C5) Computation of the federated gradient by the server.

At this stage, Alice accumulates all the results $s_{k,i}^{\prime j}$ to compute

\left(o_{s,i}+o_{1,i}+s_{1,i}^{j}+o_{2,i}+s_{2,i}^{j}+\cdots+o_{K,i}+s_{K,i}^{j}\right)\ \mathrm{mod}\ d_{i}=\left(\mu_{1}^{j}+\mu_{2}^{j}+\cdots+\mu_{K}^{j}\right)\ \mathrm{mod}\ d_{i}. \qquad (37)

For $i=1,2,\cdots,m$, Alice obtains $m$ equations of the form of Eq. (37). According to the Chinese remainder theorem, Alice computes the summation

\left(\sum_{k=1}^{K}\mu_{k}^{j}\right)\ \mathrm{mod}\ S=\sum_{k=1}^{K}\mu_{k}^{j}. \qquad (38)

It is then easy to get the federated gradient

\bm{G}^{j}\left(\mathbf{w}\right)=\frac{1}{\gamma}\sum_{k=1}^{K}\mu_{k}^{j}=\sum_{k=1}^{K}\beta_{k}\bm{g}_{k}^{j}(\mathbf{w}). \qquad (39)

After similar processes, the full federated gradient $(\bm{G}^{0}(\mathbf{w}),\bm{G}^{1}(\mathbf{w}),\cdots,\bm{G}^{D-1}(\mathbf{w}))$ can be obtained by Alice, and she updates the global model parameters as $\mathbf{w}(n+1)=\mathbf{w}(n)-\alpha\times\bm{G}(\mathbf{w}(n))$. To exhibit the process of model aggregation more clearly, a concrete example is presented in appendix C, and a classical sketch of the aggregation arithmetic is given below.
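The classical arithmetic Alice performs in steps (C4)-(C5) can be sketched in a few lines of Python. The scaled gradients and moduli below are illustrative, and the random offsets play the role of the GHZ measurement outcomes $o_{s,i}$ and $o_{k,i}$:

```python
import random
from functools import reduce

def crt(residues, moduli):
    """Chinese remainder theorem reconstruction of the sum, Eq. (38)."""
    S = reduce(lambda a, b: a * b, moduli)
    total = 0
    for r, d in zip(residues, moduli):
        p = S // d
        total += r * p * pow(p, -1, d)   # modular inverse (Python 3.8+)
    return total % S

mu = [7, 11, 5]                # toy scaled local gradients mu_k^j, K = 3 clients
moduli = [3, 5, 7, 11]         # pairwise coprime, S = 1155 > sum(mu)

residues = []
for d in moduli:               # one GHZ round per modulus d_i
    offsets = [random.randrange(d) for _ in range(len(mu))]
    o_server = (-sum(offsets)) % d                # all outcomes sum to 0 mod d_i
    shares = [(m % d + o) % d for m, o in zip(mu, offsets)]   # s'_{k,i}
    residues.append((o_server + sum(shares)) % d)             # Eq. (37)

assert crt(residues, moduli) == sum(mu)           # only the sum is revealed
print("aggregated sum:", crt(residues, moduli))
```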

III-D Model evaluation and distribution via classical communication networks

The server (Alice) needs to evaluate whether the model should be further optimized after each round of training in QFLGD. Similar to classical FL, Alice uses the magnitude of the gradient to evaluate the model's convergence. Specifically, the server sends a termination signal and announces the global parameters when $\sum_{j=0}^{D-1}[\bm{G}^{j}(\mathbf{w}(n))]^{2}\leq\varepsilon$. Otherwise, she distributes the updated model parameters to the clients for a new round of training.

IV Analysis

In this section, we provide a brief analysis of the proposed framework. As discussed previously, the QGD algorithm (Sec. III-B) enables clients to accelerate gradient training on a local quantum computer, and the QSMC protocol (Sec. III-C) gives a method to update the federated parameters securely, protecting the privacy of the clients' data. Therefore, two main aspects are considered in the analysis: the time complexity of local training (the QGD algorithm) and the security of model aggregation (the QSMC protocol).

IV-A Time complexity of local training (the QGD algorithm)

In the QFLGD framework, assume that $M_{1}\leq M_{2}\leq\cdots\leq M_{K}\leq M$; namely, the dataset scale is at most $M$. All clients need to finish their gradient training before the federated gradient is calculated, so the waiting time of the distributed gradient training is the time consumed to train a dataset of scale $M$. In the following, the time complexity of the QGD algorithm is analyzed for an $M$-scale dataset.

In the data preparation period (Sec. III-A), the time consumption is caused by the processes $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$, which generate the states $|\phi(\mathbf{x}_{i})\rangle$ and $|\phi(\mathbf{w}(n))\rangle$ carrying the data information. They can be implemented in time $O(\mathrm{polylog}(MD)+D+q)$ with the help of $O_{X}$, $U(\bm{\vartheta}_{t})$, and the controlled rotation operation [14, 30], where $q$ is the number of qubits that store the data information. Afterwards, $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$ are applied to produce the state $|\psi_{1}\rangle$ in step (B1) of local training (Sec. III-B). Hence, step (B1) can be implemented in time $O(\mathrm{polylog}(MD)+D+q)$.

In step (B2), we first consider the complexity of the unitary operation $\mathcal{A}_{i}$. It contains $H$, $U_{\mathbf{x}_{i}}$, and $U_{\mathbf{w}}$, which take time $O(\mathrm{polylog}(MD)+D+q)$. Then, the QPE block needs $O(1/\epsilon_{\theta})$ applications of $Q_{i}$ to estimate $\theta_{i}$ within error $\epsilon_{\theta}$ [35]. Therefore, the time complexity of step (B2.1) is $O[(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$. The runtime $O(\log(1/\epsilon_{\theta}))$ [37] of the sine gate can be ignored, since it is much smaller than that of the QPE.

Next, the time complexities of steps (B2.2) and (B2.3) are discussed. The main operations of these two steps include $U_{nf}$, $O_{y}$, and the quantum arithmetic operations, which are performed to calculate $|F(\mathbf{x}\cdot\mathbf{w})\rangle$ in time $O[\mathrm{polylog}(D)/\epsilon_{m}+\mathrm{polylog}(M)+q]$. In step (B2.4), the time complexity of the controlled rotation is $O(q)$. Step (B2.5) takes time $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$ to implement the inverse operations of steps (B1.2)-(B2.3). Putting all the steps together, the time complexity of step (B2) is $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$.

In step (B3), the process of generating $|\chi^{j}\rangle$ (described in steps (B3.1)-(B3.3)) is accomplished in time $O(\mathrm{polylog}(MD)+q)$. According to step (B2), a copy of the quantum state $|\psi\rangle$ is produced in time $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$. The swap test is applied $O(P(1-P)/\epsilon_{P}^{2})=O(1/\epsilon_{P}^{2})$ times to obtain the result $P$ within error $\epsilon_{P}$ in step (B3.4) [40], and each swap test requires a fresh copy of $|\chi^{j}\rangle$ and $|\psi\rangle$. Therefore, the runtime of step (B3) is $O\{[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]\epsilon_{P}^{-2}\}$, which is the complexity of obtaining the desired result.

For convenience, we assume that $\mathbf{w}^{j},\mathbf{x}_{i}^{j}=O(1)$; then $\|\mathbf{w}\|,\max_{i}\|\mathbf{x}_{i}\|=O(\sqrt{D})$. Therefore, $q=\mathrm{polylog}(D)$ qubits suffice to store the data information. In addition, taking $\epsilon_{m}$, $\epsilon_{\theta}$, and $\epsilon_{P}$ all equal to $\epsilon$, the complexity of the entire quantum algorithm to get $\bm{g}^{j}(\mathbf{w})$ ($j=0,1,\cdots,D-1$) in each iteration simplifies to

O\left\{D\left[\left(\mathrm{polylog}\left(MD\right)+D\right)/\epsilon^{3}\right]\right\}. \qquad (40)

This means that the time complexity of gradient training is $O(D^{2}\,\mathrm{polylog}(MD))$ when $\epsilon^{-1}=\log(MD)$, achieving exponential acceleration in the number of data samples. Furthermore, the elements of $\mathbf{w}$ can also be accessed in time $O(\mathrm{polylog}(D))$ if they are written into QRAM in a timely manner. In this case, the proposed algorithm has exponential acceleration in the number $M$ and a quadratic speedup in the dimensionality $D$, compared with the classical algorithm whose runtime is $O(MD^{2})$.

IV-B Security analysis of model aggregation (the QSMC protocol)

In this section, the security of model aggregation (the QSMC protocol) is analyzed. For secure multi-party computation, attacks from outsiders and from the participants are the challenges that have to be dealt with. In the following, we show that these attacks are invalid against our protocol.

Firstly, the outside attacks are discussed. In this protocol, decoy photons are used to prevent eavesdropping. This idea is derived from the BB84 protocol [4], which has been proved unconditionally secure. Here, we take the intercept-resend attack as an example. If an outside eavesdropper, Eve, attempts to intercept the particles sent from Alice and replaces them with her own fake particles, she will introduce an extra error rate of $1-\left(\frac{d_{i}+1}{2d_{i}}\right)^{\delta}$. Therefore, Eve will be detected in step (C3) through the security check.
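As a quick numeric illustration of this detection probability (the values of $d_{i}$ and $\delta$ below are illustrative), a modest number of decoy photons already makes eavesdropping very likely to be caught:

```python
# detection probability 1 - ((d_i + 1) / (2 d_i))^delta for intercept-resend
for d in (2, 3, 7):
    for delta in (10, 20, 50):
        p = 1 - ((d + 1) / (2 * d)) ** delta
        print(f"d_i = {d}, delta = {delta}: detection probability = {p:.6f}")
```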

Secondly, the participant attacks are analyzed. In the proposed protocol, the participants include the server (Alice) and clients (Bobk{\rm Bob}_{k}, k=1,2,,Kk=1,2,\cdots,K) who can access more information. Therefore, the participant attacks from dishonest clients or server should be considered.

For the participant attack from dishonest clients, only the extreme case of $K-1$ clients ${\rm Bob}_{1},\cdots,{\rm Bob}_{k-1},{\rm Bob}_{k+1},\cdots,{\rm Bob}_{K}$ colluding to steal the secret of ${\rm Bob}_{k}$ is considered here, because $K-1$ colluding clients have the most power. In this case, even if the dishonest clients share their information, they cannot deduce $o_{k,i}$ without the help of Alice. This means they cannot obtain the secret $s_{k,i}^{j}$ from $s_{k,i}^{\prime j}=s_{k,i}^{j}+o_{k,i}$. Thus, our algorithm can resist the collusion attack of dishonest clients.

For the attack from Alice, the semi-honest Alice may try to steal the private information of ${\rm Bob}_{k}$ without conspiring with anyone. In step (C4), Alice collects $s_{k,i}^{\prime j}$ for $k=1,2,\cdots,K$. However, she still cannot learn $s_{k,i}^{j}$, since she lacks knowledge of the $o_{k,i}$ held by the clients.

V Application: Training the Federated Linear Regression Model

V-A Quantum federated linear regression algorithm

Linear regression (LR) is an important supervised learning algorithm, which establishes a model of the relationship between the variable $\mathbf{x}_{i}$ and the observation $y_{i}$. It is widely applied in scientific fields such as biology and finance [41]. LR models are also usually fitted by minimizing the function in Eq. (1), choosing $f(\mathbf{x}_{i}\cdot\mathbf{w})=\mathbf{x}_{i}\cdot\mathbf{w}+b$ ($b$ is a migration parameter).

In this section, we apply the QFLGD framework to train the LR model. In the training process, we need to implement the function

F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)=\mathbf{x}_{i}\cdot\mathbf{w}+b-y_{i}. \qquad (41)

The state $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle$ encoding $\mathbf{x}_{i}\cdot\mathbf{w}$ can be generated according to the QGD algorithm. Then, the state $|\mathbf{x}_{i}\cdot\mathbf{w}+b-y_{i}\rangle$ is produced by the following steps.

(S1) The oracle $O_{y}$ is applied on the state $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}|0\rangle_{B}^{\otimes q}$ to get

|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}|y_{i}\rangle_{B}, \qquad (42)

in time $O(\log(M))$.

(S2) After obtaining $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle=|e_{q},e_{q-1},\cdots,e_{1}\rangle$ and $|y_{i}\rangle=|t_{q},t_{q-1},\cdots,t_{1}\rangle$, we implement the $QFT$ on $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}$, which results in

\left[|\phi_{1}(e)\rangle\otimes|\phi_{2}(e)\rangle\otimes\cdots\otimes|\phi_{q}(e)\rangle\right]_{A}|y_{i}\rangle_{B}, \qquad (43)

where $|\phi_{j}(e)\rangle=\frac{1}{\sqrt{2}}(|0\rangle+e^{2\pi\mathbf{i}\,0.e_{j}e_{j-1}\cdots e_{1}}|1\rangle)$ for $j=1,2,\cdots,q$.

(S3) Subsequently, the controlled rotation operations $I\otimes|0\rangle\langle 0|+R_{j^{\prime}}\otimes|1\rangle\langle 1|$ ($j^{\prime}=1,\cdots,j$) are performed on $|\phi_{j}(e)\rangle$ and $|t_{q-j+1}\rangle$ ($j=1,2,\cdots,q$), giving

\left[|\phi_{1}(e-t)\rangle\otimes|\phi_{2}(e-t)\rangle\otimes\cdots\otimes|\phi_{q}(e-t)\rangle\right]_{A}|y_{i}\rangle_{B}, \qquad (44)

where $R_{j^{\prime}}$ is defined as $|0\rangle\langle 0|+e^{-2\pi\mathbf{i}/2^{j^{\prime}}}|1\rangle\langle 1|$ and $|\phi_{j}(e-t)\rangle=\frac{1}{\sqrt{2}}(|0\rangle+e^{2\pi\mathbf{i}(0.e_{j}\cdots e_{1}-0.t_{j}\cdots t_{1})}|1\rangle)$.

(S4) The inverse $QFT$ is applied on register $A$, and the state becomes

\left[|e_{q}-t_{q}\rangle\otimes|e_{q-1}-t_{q-1}\rangle\otimes\cdots\otimes|e_{1}-t_{1}\rangle\right]_{A}|y_{i}\rangle_{B}. \qquad (45)

Thus, the state $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}\rangle$ is obtained in register $A$. Similarly, addition can be implemented by changing the operation of step (S3) to $I\otimes|0\rangle\langle 0|+R_{j^{\prime}}^{\dagger}\otimes|1\rangle\langle 1|$. Thus, the state $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}+b\rangle$ can be obtained; its quantum circuit is presented in Fig. 4, and a simplified code sketch of this QFT-based arithmetic follows Fig. 4. The operations of these processes are labeled $U_{s}$ and implemented in time $O(\log(M)+q^{2})$. Combining them with the QFLGD framework, the quantum federated linear regression (QFLR) model can be constructed by Algorithm 1.

Input: The variate $\mathbf{X}\in\mathbb{R}^{M\times D}$, the observed vector $\mathbf{y}\in\mathbb{R}^{M}$, the initial parameter $\mathbf{w}(0)\in\mathbb{R}^{D}$, the migration parameter $b$, the learning rate $\alpha$, and the preset values $c_{i}$ ($i=1,2,3$);
Output: The parameter $\mathbf{w}$ and the model $y=\mathbf{w}^{T}\mathbf{x}_{i}+b$;
for $\sum_{j=0}^{D-1}(\bm{G}^{j}(\mathbf{w}(n-1)))^{2}>\varepsilon$ do
         Step 1: The $K$ clients prepare quantum states with their sensitive data according to the methods in Sec. III-A;
         Step 2: The clients apply the QGD algorithm (Sec. III-B) and $U_{s}$ to locally train $\bm{g}_{k}(\mathbf{w}(n))$ ($k=1,2,\cdots,K$), respectively;
         Step 3: The clients and the server use the QSMC protocol (Sec. III-C) to securely calculate the federated gradient $\bm{G}(\mathbf{w}(n))=(\bm{G}^{0}(\mathbf{w}(n)),\bm{G}^{1}(\mathbf{w}(n)),\cdots,\bm{G}^{D-1}(\mathbf{w}(n)))$, and the server updates the global model parameter $\mathbf{w}(n+1)=\mathbf{w}(n)-\alpha\times\bm{G}(\mathbf{w}(n))$;
         Step 4: The server evaluates the model performance and sends the updated model parameter to the $K$ clients;
end for
Algorithm 1 Quantum federated LR algorithm
Figure 4: Quantum circuit for computing $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}+b\rangle$. The left side of the dotted line is the subtraction circuit, and the right side is the addition circuit. Note that $QFT^{\dagger}QFT=I$, so these canceling operations are omitted on either side of the dotted line.
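To complement Fig. 4, here is a minimal Qiskit sketch of the QFT-based arithmetic of steps (S2)-(S4), simplified to subtracting a classically known constant $t$ in the style of a Draper adder; in the scheme above, the rotations are instead controlled on the qubits of $|y_{i}\rangle$. The register size, the example values, and the helper name `subtract_const` are illustrative assumptions.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit.library import QFT
from qiskit.quantum_info import Statevector

def subtract_const(q: int, t: int) -> QuantumCircuit:
    """Map |e> to |e - t mod 2^q>: QFT, diagonal phases, inverse QFT (steps S2-S4)."""
    qc = QuantumCircuit(q)
    qc.append(QFT(q), range(q))
    for k in range(q):
        # multiply each basis state |x> by exp(-2*pi*i*t*x/2^q), bit by bit
        qc.p(-2 * np.pi * t * 2**k / 2**q, k)
    qc.append(QFT(q).inverse(), range(q))
    return qc

q, e, t = 4, 9, 3                      # illustrative: compute 9 - 3 = 6
qc = QuantumCircuit(q)
for k in range(q):                     # load |e> in little-endian binary
    if (e >> k) & 1:
        qc.x(k)
qc.compose(subtract_const(q, t), inplace=True)
print(Statevector(qc).probabilities_dict())   # {'0110': 1.0}, i.e., |6>
```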

V-B Numerical Simulation

In this section, the numerical simulation of the QFLR algorithm is presented. In our simulation, two clients (${\rm Bob}_{1}$, ${\rm Bob}_{2}$) train the QFLR model with a server (Alice). The experiment is implemented on the IBM Qiskit simulator. The initial weight $\mathbf{w}(0)=(0.866,0.5)$ and the migration parameter $b=0$ are selected by Alice. ${\rm Bob}_{1}$ chooses an input vector $\mathbf{x}=(2,3.464)$ with corresponding observation $y=2.464$. The other client, ${\rm Bob}_{2}$, selects an input vector $\mathbf{x}^{\prime}=(2.5,4.33)$ with corresponding observation $y^{\prime}=2.33$.

In the process of training the federated linear regression model, the main task is to compute $F(\mathbf{x}_{i}\cdot\mathbf{w})$ accurately. That is, the quantum-computed values of $F(\mathbf{x}_{i}\cdot\mathbf{w})$ must be storable in quantum registers with small error. An experiment on this step is presented with the data of ${\rm Bob}_{1}$. For convenience, we set $c_{1}=1/4$, $c_{2}=1$, and the error of quantum phase estimation $\epsilon_{\theta}=0.0001$. Substituting these into Eq. (41) yields the result $F(\mathbf{x}\cdot\mathbf{w})=1$. It can also be computed as

F\left(\mathbf{x}\cdot\mathbf{w}\right)=4-16\sin^{2}\left(\widetilde{\theta}\pi/2^{4}\right)+0-2.464, \qquad (46)

according to Eq. (23).

Using the fact that $\sin^{2}(\widetilde{\theta}\pi/2^{4})\approx\widetilde{\theta}^{2}/26$ and the most probable result $|0001\rangle$ (see Fig. 6(a)) from the QPE, Eq. (46) can be rewritten as

6.5\times F\left(\mathbf{x}\cdot\mathbf{w}\right)\approx 26-4\widetilde{\theta}^{2}-16. \qquad (47)

The corresponding circuit is designed and encoded via Qiskit (see Fig. 5). In Fig. 5(e), the matrix form of $U(\gamma,\phi,\lambda)$ is

U\left(\gamma,\phi,\lambda\right)=\begin{bmatrix}\cos\left(\gamma/2\right)&-e^{\mathbf{i}\lambda}\sin\left(\gamma/2\right)\\ e^{\mathbf{i}\phi}\sin\left(\gamma/2\right)&e^{\mathbf{i}(\phi+\lambda)}\cos\left(\gamma/2\right)\end{bmatrix}. \qquad (48)

With the help of IBM's simulator (aer_simulator), the measurement results shown in Fig. 6(b) can be obtained. In Fig. 6(b), two values ($|00110\rangle$ and $|01110\rangle$) stand out, with a much higher measurement probability than the rest. Based on the analysis of the phase estimation results, the result $|00110\rangle$ is selected, with a high probability of $0.510$. It means $F(\mathbf{x}\cdot\mathbf{w})\approx 0.923$. Compared with the theoretical result (shown in Eq. (46)), the experimental result has a tolerable error of $0.077$. Subsequently, ${\rm Bob}_{1}$ can estimate $\bm{g}_{1}^{1}\approx 1.846$ and $\bm{g}_{1}^{2}\approx 3.197$ by performing the swap test. At the same time, ${\rm Bob}_{2}$ estimates $F(\mathbf{x}^{\prime}\cdot\mathbf{w})\approx 2.115$, $\bm{g}_{2}^{1}\approx 5.288$, and $\bm{g}_{2}^{2}\approx 9.157$ for his data via a similar experiment.

Following the analogous procedure of the example shown in Appendix C, Alice calculates the federated gradient $G=(3.57,6.18)$ via the QSMC protocol. Theoretical analysis shows that the error is within $2\%$ of the actual solution $(3.5,6.06)$ obtained in that example. Thus, the training algorithm is successful.

Figure 5: The construction of the gates for solving $\mathbf{x}_{i}\cdot\mathbf{w}$. (a) The optimized circuit for solving Eq. (46). (b) The circuit of the operation $Q$. (c) The construction of $2|000\rangle\langle 000|-I$. (d) The circuit of the operation $I-2|11\rangle\langle 11|$. (e) The construction of $\mathcal{A}$.
Figure 6: Experimental results. (a) The measurement results of the eigenvalue register in the QPE; the two significant values correspond to $e^{\mathbf{i}\theta}$ and $e^{-\mathbf{i}\theta}$. (b) The proportional results of the estimation of $\mathbf{x}\cdot\mathbf{w}$.

VI Conclusions

This work focuses on the design of QFLGD for distributed quantum networks, which can securely implement FL over an exponentially large dataset. We first gave two methods of quantum data preparation, which extract static data information and dynamic parameter information into logarithmic qubits. Then, we put forth the QGD algorithm to allow the time-consuming gradient calculation to be done on a quantum computer. In this way, the clients can estimate the needed results of gradient training in parallel based on quantum superposition. The time complexity analysis shows that our algorithm is exponentially faster than its classical counterpart in the number of data samples when the error satisfies $1/\epsilon=\log(MD)$. Furthermore, the QGD algorithm also achieves a quadratic speedup in the dimensionality of the data sample if the parameters $\mathbf{w}$ are stored in QRAM in a timely manner. In addition, we proposed a QSMC protocol to calculate the federated gradient securely. It was demonstrated that the proposed protocol can resist common outside and participant attacks, such as the intercept-resend attack. Finally, we indicated how to apply the framework to train a federated linear regression model and simulated the essential steps with the help of the IBM Qiskit simulator. The results also showed the effectiveness of QFLGD. In summary, the presented framework demonstrates the intriguing potential of achieving large-scale private distributed learning with quantum technologies and provides a valuable guide for exploring quantum advantages in real-life machine learning applications from the security perspective.

We hope the proposed framework can be realized on a quantum platform as quantum technology matures. For example, how to implement the whole QFLGD process on noisy intermediate-scale quantum (NISQ) devices is worth further exploration, and we will devote more effort to it.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants No. 62171131, 61976053, and 61772134), the Fujian Province Natural Science Foundation (Grants No. 2022J01186 and 2023J01533), and the Innovation Program for Quantum Science and Technology (Grant No. 2021ZD0302901).

Appendix A Implementation of the Unitary Operation $U_{nf}$

In this appendix, we describe the implementation of the unitary operation $U_{nf}$, which generates a state encoding the $2$-norm of $\mathbf{x}_{i}$. Its steps are as follows.

(1) A quantum state is initialized as

\left|\varphi_{1}\right\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}\left|i\right\rangle_{1}\left|j\right\rangle_{2}\left|0\right\rangle_{3}. (49)

(2) The oracle $O_{X}$ is performed to obtain

\left|\varphi_{2}\right\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}\left|i\right\rangle_{1}\left|j\right\rangle_{2}\left|\mathbf{x}_{i}^{j}\right\rangle_{3}. (50)

(3) A register in the state $|0\rangle$ is appended as the last register and rotated to $\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$. After that, the system becomes

|\varphi_{3}\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, (51)

where $c_{1}=1/\max_{i,j}|\mathbf{x}_{i}^{j}|$. We can observe the ancilla register in the state $|1\rangle$ with probability $P_{1}=c_{1}^{2}\left\|\mathbf{x}_{i}\right\|^{2}/D$. The state $|\varphi_{3}\rangle$ can be rewritten as

\left|i\right\rangle_{1}\otimes\left(\sqrt{1-P_{1}}\left|g\right\rangle\left|0\right\rangle+\sqrt{P_{1}}\left|a\right\rangle\left|1\right\rangle\right)_{234}, (52)

where

\left|g\right\rangle=\sum_{j=0}^{D-1}\sqrt{\frac{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}{D-c_{1}^{2}\left\|\mathbf{x}_{i}\right\|^{2}}}\left|j\right\rangle\left|\mathbf{x}_{i}^{j}\right\rangle (53)

and

\left|a\right\rangle=\frac{1}{\left\|\mathbf{x}_{i}\right\|}\sum_{j=0}^{D-1}\mathbf{x}_{i}^{j}\left|j\right\rangle\left|\mathbf{x}_{i}^{j}\right\rangle. (54)

(4) A register in the state $|0\rangle^{\otimes\log(\epsilon_{m}^{-1})}$ is appended. Then, the quantum phase estimation of $-U(\varphi_{3})S_{0}U^{\dagger}(\varphi_{3})S_{1}$ is performed to obtain

\left|\varphi_{4}\right\rangle=\left|i\right\rangle_{1}\otimes\left(\sqrt{1-P_{1}}\left|g\right\rangle\left|0\right\rangle+\sqrt{P_{1}}\left|a\right\rangle\left|1\right\rangle\right)_{234}\otimes\left|\left\|\mathbf{x}_{i}\right\|\right\rangle_{5}, (55)

with the help of the square root circuit [42]. Here, $\epsilon_{m}$ denotes the tolerance error of the QPE, $U(\varphi_{3})|0\rangle_{1234}=|\varphi_{3}\rangle$, $S_{0}=I_{1234}-2|0\rangle_{1234}\langle 0|_{1234}$, and $S_{1}=I_{123}\otimes(I-2|0\rangle\langle 0|)_{4}$.

(5) The inverse operations of steps (2)-(3) are applied to generate the state

\left|\varphi_{5}\right\rangle=\left|i\right\rangle\left|\left\|\mathbf{x}_{i}\right\|\right\rangle. (56)

Therefore, $U_{nf}:\left|i\right\rangle\left|0\right\rangle\rightarrow\left|i\right\rangle\left|\left\|\mathbf{x}_{i}\right\|\right\rangle$ can be implemented by the above steps. Its running time is dominated by the quantum phase estimation in step (4), which takes time $O(\text{polylog}(D)/\epsilon_{m})$. Moreover, $\left\|\mathbf{w}\right\|$ can be estimated similarly.
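As a quick numerical sanity check of step (3), the following sketch (our own illustration, using NumPy on a toy data vector rather than a quantum backend) builds the amplitudes of the ancilla-$|1\rangle$ branch of $|\varphi_{3}\rangle$ and confirms that the ancilla is observed in $|1\rangle$ with probability $P_{1}=c_{1}^{2}\|\mathbf{x}_{i}\|^{2}/D$.

```python
# A minimal classical sketch of step (3), assuming a hypothetical data vector x_i.
import numpy as np

x = np.array([2.0, 3.464, 1.0, 0.5])   # hypothetical x_i with D = 4
D = len(x)
c1 = 1.0 / np.abs(x).max()             # c1 = 1 / max_j |x_i^j|

# Amplitude of each branch |j>|x_i^j>|1> is c1 * x_i^j / sqrt(D), per Eq. (51).
amp_one = c1 * x / np.sqrt(D)
P1 = np.sum(amp_one ** 2)              # probability of ancilla outcome |1>

print(P1, (c1 ** 2) * np.linalg.norm(x) ** 2 / D)  # the two values coincide
```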

Appendix B An example of extracting the parameter $\mathbf{w}(n)$ information

In Sec. III-A, a way to prepare a quantum state of $\mathbf{w}$ without the help of QRAM is shown in step (A2). To further demonstrate it, an example is given in this appendix.

For convenience, suppose that $\mathbf{w}(n)=\mathbf{w}\in\mathbb{R}^{4}$. Then, we can obtain angle parameters $\vartheta^{0}_{1}$, $\vartheta^{0}_{2}$, and $\vartheta^{1}_{2}$, which satisfy

\begin{split}&\cos(\vartheta^{0}_{1})=\frac{h^{0}_{1}}{h^{0}_{0}},\quad\sin(\vartheta^{0}_{1})=\frac{h^{1}_{1}}{h^{0}_{0}},\\ &\cos(\vartheta^{0}_{2})=\frac{h^{0}_{2}}{h^{0}_{1}},\quad\sin(\vartheta^{0}_{2})=\frac{h^{1}_{2}}{h^{0}_{1}},\\ &\cos(\vartheta^{1}_{2})=\frac{h^{2}_{2}}{h^{1}_{1}},\quad\sin(\vartheta^{1}_{2})=\frac{h^{3}_{2}}{h^{1}_{1}},\end{split} (57)

according to Eq. (13). The values of $h^{j}_{t}$ ($t=0,1,2$) are shown in Fig. 7; for example, $h^{j}_{2}=\mathbf{w}^{j}$ for $j=0,1,2,3$.

Figure 7: The data structure. $h^{j}_{t-1}=\sqrt{(h^{2j}_{t})^{2}+(h^{2j+1}_{t})^{2}}$ for $t=1,2$ and $j=0,\cdots,2^{t-1}-1$. Moreover, $h^{j}_{2}=\mathbf{w}^{j}$ for $j=0,1,2,3$.

Then, the operations are defined as

\begin{split}&U(\bm{\vartheta}_{1})=R(\vartheta^{0}_{1})\otimes I,\\ &U(\bm{\vartheta}_{2})=|0\rangle\langle 0|\otimes R(\vartheta^{0}_{2})+|1\rangle\langle 1|\otimes R(\vartheta^{1}_{2}),\end{split} (58)

based on $R(\vartheta)=\begin{bmatrix}\cos(\vartheta)&-\sin(\vartheta)\\ \sin(\vartheta)&\cos(\vartheta)\end{bmatrix}$. It is easy to verify that

U(\bm{\vartheta}_{2})U(\bm{\vartheta}_{1})|00\rangle=\frac{1}{\left\|\mathbf{w}\right\|}(\mathbf{w}^{0}|00\rangle+\mathbf{w}^{1}|01\rangle+\mathbf{w}^{2}|10\rangle+\mathbf{w}^{3}|11\rangle). (59)

Thus, the quantum state of $\mathbf{w}$ can be obtained.
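The following NumPy sketch (our own illustration, for a hypothetical vector with nonnegative entries) carries out this construction: it builds the tree values $h^{j}_{t}$ of Fig. 7, extracts the angles of Eq. (57), assembles $U(\bm{\vartheta}_{1})$ and $U(\bm{\vartheta}_{2})$ as in Eq. (58), and checks Eq. (59).

```python
# A minimal sketch of the QRAM-free preparation of |w>, for a sample w in R^4.
import numpy as np

w = np.array([1.0, 2.0, 2.0, 4.0])              # hypothetical nonnegative weights
h2 = np.abs(w)                                   # leaf level: h_2^j = w^j
h1 = np.sqrt(h2[0::2] ** 2 + h2[1::2] ** 2)     # internal level, per Fig. 7
h0 = np.sqrt(h1[0] ** 2 + h1[1] ** 2)           # root: h_0^0 = ||w||

# Angles of Eq. (57): cosine = left child / parent, sine = right child / parent.
t10 = np.arctan2(h1[1], h1[0])
t20 = np.arctan2(h2[1], h2[0])
t21 = np.arctan2(h2[3], h2[2])

def R(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

P0 = np.diag([1.0, 0.0])                         # |0><0|
P1 = np.diag([0.0, 1.0])                         # |1><1|
U1 = np.kron(R(t10), np.eye(2))                  # U(theta_1) of Eq. (58)
U2 = np.kron(P0, R(t20)) + np.kron(P1, R(t21))   # U(theta_2) of Eq. (58)

state = U2 @ U1 @ np.array([1.0, 0.0, 0.0, 0.0])  # apply to |00>
print(state, w / np.linalg.norm(w))               # both equal w / ||w||, Eq. (59)
```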

Appendix C An example of the model aggregation

In this appendix, an example is presented to exhibit the model aggregation. Consider a model trained by two clients ($\rm Bob_1$, $\rm Bob_2$), each holding a dataset of scale $1$, with the help of a server (Alice). The gradients $g_{1}^{1}=2$, $g_{1}^{2}=3.46$, $g_{2}^{1}=5$, and $g_{2}^{2}=8.66$ are assumed to have been obtained in the QGD algorithm. For simplicity, the eavesdropping check phase is ignored.

Firstly, Alice announces that the accuracy of the parameters is $\gamma^{-1}=1/100$ and the global dataset scale is $M_{1}+M_{2}=2$. She agrees on $d_{1}=23$ and $d_{2}=29$ with the clients. After that, $\rm Bob_1$ calculates his secret

\begin{split}s_{1,1}^{1}&=\mu_{1}^{1}\mod 23\\ &=8,\end{split} (60)

where $\mu_{1}^{1}=\gamma g_{1}^{1}/(M_{1}+M_{2})=100$. Similarly, $s_{1,2}^{1}=13$, $s_{1,1}^{2}=12$, and $s_{1,2}^{2}=28$. At the same time, $\rm Bob_2$ obtains $s_{2,1}^{1}=20$, $s_{2,2}^{1}=18$, $s_{2,1}^{2}=19$, and $s_{2,2}^{2}=27$.

Secondly, Alice prepares a $23$-level $3$-particle GHZ state $\left|\Psi\right\rangle=\frac{1}{\sqrt{23}}\sum_{q=0}^{22}|q\rangle|q\rangle|q\rangle$ for $d_{1}=23$ and distributes one particle to each client. Then these participants perform the measurement to get $o_{s,1}=7$, $o_{1,1}=6$, and $o_{2,1}=10$. $\rm Bob_1$ ($\rm Bob_2$) encodes his secret by using $o_{1,1}$ ($o_{2,1}$) and sends it to Alice. The result

\begin{split}&(7+8+6+20+10)\mod 23\\ =&(\mu_{1}^{1}+\mu_{2}^{1})\mod 23=5,\end{split} (61)

could be computed by Alice.

Finally, the equations

\begin{split}&(\mu_{1}^{1}+\mu_{2}^{1})\mod 23=5,\\ &(\mu_{1}^{1}+\mu_{2}^{1})\mod 29=2,\end{split} (62)

and

\begin{split}&(\mu_{1}^{2}+\mu_{2}^{2})\mod 23=8,\\ &(\mu_{1}^{2}+\mu_{2}^{2})\mod 29=26,\end{split} (63)

could be obtained through a similar procedure. According to the Chinese remainder theorem, the federated gradient $(3.5,6.06)$ is readily obtained.
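As a classical cross-check of this example, the sketch below (our own illustration; the protocol itself obtains the modular sums via GHZ states, which we emulate arithmetically here) reproduces the residues of Eqs. (62)-(63) and recovers the federated gradient by the Chinese remainder theorem.

```python
# A minimal classical sketch of the CRT-based aggregation in this example.
from math import prod

gamma, M = 100, 2
d = (23, 29)                                  # coprime moduli chosen by Alice
grads = {1: (2.0, 3.46), 2: (5.0, 8.66)}      # g_k^j from the QGD algorithm

def crt(residues, moduli):
    """Recover x mod prod(moduli) from its residues via the Chinese remainder theorem."""
    N = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Ni = N // m
        x += r * Ni * pow(Ni, -1, m)          # pow(Ni, -1, m): modular inverse
    return x % N

federated = []
for j in range(2):                             # gradient component index
    mus = [round(gamma * grads[k][j] / M) for k in (1, 2)]
    residues = [sum(mus) % m for m in d]       # what Alice learns, Eqs. (62)-(63)
    federated.append(crt(residues, d) / gamma)

print(federated)                               # [3.5, 6.06]
```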

References

  • [1] Q. Jia, L. Guo, Y. Fang, and G. Wang, “Efficient privacy-preserving machine learning in hierarchical distributed system,” IEEE Transactions on Network Science and Engineering, vol. 6, pp. 599–612, 2019.
  • [2] B. Gu, A. Xu, Z. Huo, C. Deng, and H. Huang, “Privacy-preserving asynchronous vertical federated learning algorithms for multiparty collaborative learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, pp. 6103–6115, 2022.
  • [3] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.   Proceedings of Machine Learning Research, 2017, pp. 1273–1282.
  • [4] C. H. Bennett and G. Brassard, “Quantum cryptography: Public key distribution and coin tossing,” in Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing.   IEEE New York, 1984, pp. 175–179.
  • [5] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, “Quantum cryptography,” Reviews of Modern Physics, vol. 74, pp. 145–195, 2002.
• [6] V. Scarani, H. Bechmann-Pasquinucci, N. J. Cerf, M. Dušek, N. Lütkenhaus, and M. Peev, “The security of practical quantum key distribution,” Reviews of Modern Physics, vol. 81, p. 1301, 2009.
• [7] R. Schwonnek, K. T. Goh, I. W. Primaatmaja, E. Y.-Z. Tan, R. Wolf, V. Scarani, and C. C.-W. Lim, “Device-independent quantum key distribution with random key basis,” Nature Communications, vol. 12, pp. 1–8, 2021.
  • [8] M. Hillery, V. Bužek, and A. Berthiaume, “Quantum secret sharing,” Physical Review A, vol. 59, pp. 1829–1834, 1999.
  • [9] A. Karlsson, M. Koashi, and N. Imoto, “Quantum entanglement for secret sharing and secret splitting,” Physical Review A, vol. 59, pp. 162–168, 1999.
  • [10] R. Cleve, D. Gottesman, and H.-K. Lo, “How to share a quantum secret,” Physical Review Letters, vol. 83, pp. 648–651, 1999.
  • [11] K. Boström and T. Felbinger, “Deterministic secure direct communication using entanglement,” Physical Review Letters, vol. 89, p. 187902, 2002.
• [12] F. G. Deng, G. L. Long, and X. S. Liu, “Two-step quantum direct communication protocol using the Einstein-Podolsky-Rosen pair block,” Physical Review A, vol. 68, p. 042317, 2003.
  • [13] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Review, vol. 41, pp. 303–332, 1999.
  • [14] A. W. Harrow, A. Hassidim, and S. Lloyd, “Quantum algorithm for linear systems of equations,” Physical Review Letters, vol. 103, p. 150502, 2009.
  • [15] L.-C. Wan, C.-H. Yu, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “Block-encoding-based quantum algorithm for linear systems with displacement structures,” Physical Review A, vol. 104, p. 062414, 2021.
• [16] H.-L. Liu, L.-C. Wan, C.-H. Yu, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “A quantum algorithm for solving eigenproblem of the Laplacian matrix of a fully connected weighted graph,” Advanced Quantum Technologies, vol. 6, p. 2300031, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300031
  • [17] C.-H. Yu, F. Gao, and Q.-Y. Wen, “An improved quantum algorithm for ridge regression,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, pp. 858–866, 2021.
• [18] M.-H. Chen, C.-H. Yu, J.-L. Gao, K. Yu, S. Lin, G.-D. Guo, and J. Li, “Quantum algorithm for Gaussian process regression,” Physical Review A, vol. 106, p. 012406, 2022.
  • [19] F. Scala, A. Ceschini, M. Panella, and D. Gerace, “A general approach to dropout in quantum neural networks,” Advanced Quantum Technologies, p. 2300220, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300220
• [20] Y.-D. Wu, Y. Zhu, G. Bai, Y. Wang, and G. Chiribella, “Quantum similarity testing with convolutional neural networks,” Physical Review Letters, vol. 130, p. 210601, 2023.
  • [21] R. LaRose, A. Tikku, É. O’Neel-Judy, L. Cincio, and P. J. Coles, “Variational quantum state diagonalization,” npj Quantum Information, vol. 5, p. 57, 2019.
• [22] H.-L. Liu, Y.-S. Wu, L.-C. Wan, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “Variational quantum algorithm for the Poisson equation,” Physical Review A, vol. 104, p. 022418, 2021.
  • [23] S.-X. Zhang, Z.-Q. Wan, C.-Y. Hsieh, H. Yao, and S. Zhang, “Variational quantum-neural hybrid error mitigation,” Advanced Quantum Technologies, vol. 6, p. 2300147, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300147
• [24] W. Li, S. Lu, and D. L. Deng, “Quantum federated learning through blind quantum computing,” Science China Physics, Mechanics & Astronomy, vol. 64, pp. 1–8, 2021.
  • [25] C. Ren, R. Yan, M. Xu, H. Yu, Y. Xu, D. Niyato, and Z. Y. Dong, “Qfdsa: A quantum-secured federated learning system for smart grid dynamic security assessment,” IEEE Internet of Things Journal, vol. 11, pp. 8414–8426, 2024.
  • [26] S. Y.-C. Chen and S. Yoo, “Federated quantum machine learning,” Entropy, vol. 23, p. 460, 2021.
  • [27] R. Huang, X. Tan, and Q. Xu, “Quantum federated learning with decentralized data,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 28, pp. 1–10, 2022.
  • [28] S. Wang, L. Huang, Y. Nie, X. Zhang, P. Wang, H. Xu, and W. Yang, “Local differential private data aggregation for discrete distribution estimation,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, pp. 2046–2059, 2019.
  • [29] R. Xue, K. Xue, B. Zhu, X. Luo, T. Zhang, Q. Sun, and J. Lu, “Differentially private federated learning with an adaptive noise mechanism,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 74–87, 2024.
  • [30] L. Wossnig, Z. Zhao, and A. Prakash, “Quantum linear system algorithm for dense matrices,” Physical Review Letters, vol. 120, p. 050502, 2018.
  • [31] V. Giovannetti, S. Lloyd, and L. Maccone, “Quantum random access memory,” Physical Review Letters, vol. 100, p. 160501, 2008.
  • [32] I. Kerenidis and A. Prakash, “Quantum recommendation systems,” arXiv preprint arXiv:1603.08675, 2016.
  • [33] K. Mitarai, M. Kitagawa, and K. Fujii, “Quantum analog-digital conversion,” Physical Review A, vol. 99, p. 012301, 2019.
  • [34] I. Cong and L. Duan, “Quantum discriminant analysis for dimensionality reduction and classification,” New Journal of Physics, vol. 18, p. 073011, 2016.
  • [35] G. Brassard, P. Høyer, M. Mosca, and A. Tapp, “Quantum amplitude amplification and estimation,” Contemporary Mathematics, vol. 305, pp. 53–74, 2002.
  • [36] C. P. Shao, “Fast variational quantum algorithms for training neural networks and solving convex optimizations,” Physical Review A, vol. 99, p. 042325, 2019.
• [37] S. S. Zhou, T. Loke, J. A. Izaac, and J. Wang, “Quantum Fourier transform in computational basis,” Quantum Information Processing, vol. 16, pp. 1–19, 2017.
  • [38] H. Buhrman, R. Cleve, J. Watrous, and R. De Wolf, “Quantum fingerprinting,” Physical Review Letters, vol. 87, p. 167902, 2001.
  • [39] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information.   Cambridge University Press, 2010.
  • [40] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical Review Letters, vol. 113, p. 130503, 2014.
  • [41] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow.   O’Reilly Media Inc, 2022.
  • [42] M. K. Bhaskar, S. Hadfield, A. Papageorgiou, and I. Petras, “Quantum algorithms and circuits for scientific computing,” arXiv preprint arXiv:1511.08253, 2015.