
Quantum Federated Learning for Distributed
Quantum Networks

Kai Yu, Fei Gao, and Song Lin

• Kai Yu is with the College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350117, China. E-mail: [email protected].
• Fei Gao, the corresponding author, is with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China. E-mail: [email protected].
• Song Lin, the corresponding author, is with the College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350117, China. E-mail: [email protected].
Abstract

Federated learning is a framework for learning from distributed networks. It attempts to build a global model based on virtual fusion data without sharing the actual data. Nevertheless, the traditional federated learning process encounters two challenges: high computational cost and message transmission security. To address this, we propose a quantum federated learning framework for distributed quantum networks that utilizes quantum characteristics. First, we give two methods to encode data information into quantum states, which cope with different acquisition frequencies of the data. Next, a quantum gradient descent algorithm is provided to help clients in the distributed quantum networks train local models in parallel. Compared with the classical counterpart, the proposed algorithm achieves exponential acceleration in the dataset scale and a quadratic speedup in the data dimensionality. Moreover, a quantum secure multi-party computation protocol based on the Chinese remainder theorem is designed, which avoids the errors and overflow problems that may occur in large-number operations. Security analysis shows that the protocol can resist common external and internal attacks. Finally, to demonstrate the effectiveness of the proposed framework, we use it to train a federated linear regression model and simulate the essential computation steps on the IBM Qiskit simulator.

Index Terms:
Quantum algorithm, federated learning, distributed networks, quantum gradient descent, quantum secure multi-party computation.

I Introduction

With the development of information networks, more and more data are generated and stored in distributed network systems [1]. Integrating data from distributed networks enables the extraction of valuable information. Nevertheless, a significant proportion of the data contains sensitive and private information, making data owners hesitant to share it [2]. This situation has led to the emergence of federated learning (FL), a distributed machine learning (ML) method [3]. FL improves data privacy by keeping data local and training without sharing the raw data. This not only makes effective use of distributed data resources, but also facilitates the development of information network technology. However, the volume of locally trained data can be huge, and the computing power of traditional computers then faces great challenges. Furthermore, the transmission of training results poses a threat to user privacy, as it can give attackers an opportunity to infer sensitive information. While classical cryptographic schemes have been used to safeguard communication security, advances in hardware pose a persistent threat to their security.

Quantum information processing (QIP) is an emerging field that explores the interaction between quantum mechanics and information technology. It continues to show its appeal, attracting the attention of scholars. In 1984, Bennett and Brassard proposed the famous BB84 protocol [4], which perfectly achieves a key distribution task between two remote parties. Subsequently, scholars utilized quantum information processing to ensure information security and proposed a series of quantum cryptography protocols. In contrast to classical cryptography protocols, whose security is based on assumptions of computational complexity, the security of these protocols relies on physical properties such as the Heisenberg uncertainty principle, which makes them unconditionally secure in theory. Quantum cryptography has developed into a significant application of quantum information processing, including quantum key distribution [5, 6, 7], quantum secret sharing [8, 9, 10], quantum secure direct communication [11, 12], and so on. Another exciting application of quantum information processing is quantum computing, which provides quantum speedup for certain classes of problems that are intractable on classical computers. For example, the factorization of large numbers via Shor's algorithm [13] achieves an exponential speedup. Furthermore, quantum computing has also made advances in machine learning, such as quantum linear system solvers [14, 15, 16], quantum regression [17, 18], quantum neural networks [19, 20], variational quantum algorithms (VQA) [21, 22, 23], and so on.

Motivated by the advantages shown by quantum cryptography and quantum computing in improving transmission security and computing speed, respectively, scholars have attempted to utilize QIP to address the challenges faced by FL. In 2021, Li et al. focused on the security issue of FL [24]. They proposed a private single-party delegated training protocol based on blind quantum computing for a variational quantum classifier, and then extended the protocol to quantum FL combined with differential privacy. This protocol can exploit the computing advantage of remote quantum servers while preserving the privacy of sensitive data. In 2024, Ren et al. proposed a quantum FL scheme to solve the privacy-preservation issue in dynamic security assessment for smart cyber-physical grids [25]. Moreover, Chen and Yoo proposed a quantum FL scheme with a hybrid quantum-classical machine learning model, focusing on improving the efficiency of local training [26]. In their scheme, a classical convolutional network extracts data features and compresses them into vectors, which are input into variational quantum circuits for training. Compared with the classical process, this method can achieve the same level of accuracy more quickly. In 2022, Huang et al. utilized a variational quantum algorithm to estimate the gradient of the local model, avoiding the high cost of computing the gradient directly [27]. Since variational quantum algorithms approximate the target results using circuits with tunable parameters, they differ from quantum algorithms that calculate the target results through the evolution of quantum gates. Therefore, we further explore the realization of FL with quantum resources.

In this paper, we focus on quantum algorithms running on ordinary quantum computers and present a quantum federated learning framework based on gradient descent (QFLGD). It aims to provide a unified, secure, and effective gradient estimation scheme for distributed quantum networks. In QFLGD, we propose two data preparation methods by analyzing the different acquisition frequencies of static data (the local training data) and dynamic data (the parameters that need to be updated during iteration). This reduces the requirements of QFLGD on the performance of quantum random access memory. At the same time, two main processes of FL are implemented in QFLGD by exploiting quantum properties. The first is a quantum gradient descent (QGD) algorithm, which accelerates the training of the gradient for the client. QGD provides the client with a classical gradient at each iteration, which can be directly used to learn classical model parameters. Compared with the classical counterpart, this quantum process achieves exponential acceleration in the data scale and a quadratic speedup in the data dimensionality. The other is a quantum secure multi-party computation (QSMC) protocol, which allows the aggregation of gradients to be done securely over quantum communication networks. That is, the server is able to calculate the federated gradients without the clients sharing their local gradients. Furthermore, the application of the Chinese remainder theorem in QSMC makes it possible to avoid the errors and overflow problems that may occur in calculations with large numbers. The proposed quantum federated learning framework can improve the local computing efficiency and data privacy of FL. We also apply QFLGD to train a federated linear regression (FLR) model and present a numerical experiment to verify its correctness.

The remainder of this paper is organized as follows. The classical FL is reviewed in Sec. II. In Sec. III, we propose the framework for QFLGD. In Sec. IV, we analyze the time complexity and the security of QFLGD. Furthermore, an application to train the FLR and the numerical experiment are shown in Sec. V. In Sec. VI, we give the conclusion of our work.

II Review of classical FL

To clarify the framework of QFLGD in distributed quantum networks, this section offers an overview of the fundamental ideas and processes of traditional FL. FL is a collaborative ML approach in which multiple clients train a shared model without exchanging raw data. A popular learning framework is FL based on gradient descent [3], which is depicted in Fig. 1. It mainly includes the following parts.

Figure 1: Schematic diagram of federated learning based on gradient descent. $\mathbf{w}^{j}(n)$ denotes the $j$th element of the parameter vector $\mathbf{w}(n)$. $\bm{G}^{j}(\mathbf{w}(n))$ is the $j$th component of the global gradient. $\beta_{k}$ is the aggregation weight. $\bm{g}^{j}_{k}(\mathbf{w}(n))$ is the $j$th element of the local gradient of the client ${\rm Bob}_{k}$. $\alpha$ is the learning rate.

A) Data preparation and model initialization. In the FL framework, data are derived from various clients in a distributed network, such as hospital medical information, preference options in business surveys, and other sensitive data [28]. We consider general federated learning with $K$ clients participating in the model training. The server (Alice) initializes a global model with trainable parameters $\mathbf{w}=(\omega_{0},\omega_{1},\cdots,\omega_{D-1})$ and distributes it to the clients. The client ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) collects and preprocesses $M_{k}$ data samples $(\mathbf{x}_{0},y_{0}),(\mathbf{x}_{1},y_{1}),\cdots,(\mathbf{x}_{M_{k}-1},y_{M_{k}-1})$, where $\mathbf{x}_{i}\in\mathbb{R}^{D}$ and $y_{i}$ is the corresponding label.

B) Local training. To train the model, clients use standard ML algorithms without sharing raw data. The model evaluation task is expressed as minimizing a cost function, such as the mean square error (MSE) loss

\min_{\mathbf{w}}E=\frac{1}{2M_{k}}\sum_{i=0}^{M_{k}-1}\left[f\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)-y_{i}\right]^{2}, \qquad (1)

where $f$ is the activation function. This cost function measures the difference between the model output and the expected output. In this case, model optimization amounts to finding the gradient of $E$ with respect to $\mathbf{w}$ to adjust the model parameters. The client ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) can obtain

\bm{g}_{k}^{j}\left(\mathbf{w}\right)=\frac{1}{M_{k}}\sum_{i=0}^{M_{k}-1}F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)\mathbf{x}_{i}^{j},\quad j=0,1,\cdots,D-1, \qquad (2)

with his data. Here, $F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)$ denotes $\frac{\partial f(\mathbf{x}_{i}\cdot\mathbf{w})}{\partial(\mathbf{x}_{i}\cdot\mathbf{w})}\left[f\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)-y_{i}\right]$, $\bm{g}_{k}^{j}\left(\mathbf{w}\right)$ is the $j$th element of the local gradient $\bm{g}_{k}\left(\mathbf{w}\right)$, and $\mathbf{x}_{i}^{j}$ is the $j$th element of the sample $\mathbf{x}_{i}$.

C) Model aggregation and update. The server (Alice) collects the gradients trained by all clients, and calculates the federated gradient

\bm{G}^{j}\left(\mathbf{w}\right)=\sum_{k=1}^{K}\beta_{k}\,\bm{g}_{k}^{j}\left(\mathbf{w}\right), \qquad (3)

where $\beta_{k}=M_{k}/(\sum_{k=1}^{K}M_{k})$ and $\bm{G}^{j}(\mathbf{w})$ is the $j$th element of the federated gradient $\bm{G}(\mathbf{w})$. Then, Alice updates the global model parameters. Specifically, she adjusts the parameters to

\mathbf{w}^{j}\left(n+1\right)=\mathbf{w}^{j}(n)-\alpha\times\bm{G}^{j}\left(\mathbf{w}(n)\right), \qquad (4)

for $j=0,1,\cdots,D-1$, in the $n$th iteration. In Eq. (4), $\alpha$ is the learning rate [29].

D) Model evaluation and distribution. The server (Alice) evaluates the performance of the global model and sends the global model parameters to the clients for further local training if the model has not yet converged (i.e., $\sum_{j=0}^{D-1}[\bm{G}^{j}(\mathbf{w}(n))]^{2}>\varepsilon$, where $\varepsilon$ is a threshold on the gradient). Once the convergence condition is satisfied, Alice announces that the training stops and distributes the model.

It is notable that the time consumption of FL is dominated by the calculation of the local gradient. For a client, computing one gradient element takes $O(D)$ time to estimate the inner product $\mathbf{x}_{i}\cdot\mathbf{w}$ and $O(M)$ time to sum over the samples. In general, $D$ repetitions are needed to estimate all elements of the local gradient. Therefore, ${\rm Bob}_{k}$ takes $O(MD^{2})$ time to calculate all the elements of the local gradient on a classical computer. In the era of big data, this is surely a very expensive calculation. Moreover, the security of federated learning may be compromised during local gradient aggregation. Traditional encryption methods can improve the security of this process, but the development of quantum technology threatens their security.
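To make the classical baseline concrete, the following is a minimal NumPy sketch of Eqs. (1)-(4) for the identity activation $f(z)=z$, so that $F(\mathbf{x}_{i}\cdot\mathbf{w})=\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}$; the client datasets, learning rate, and threshold are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative local datasets for K = 2 clients (placeholders)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)),
           (rng.normal(size=(80, 3)), rng.normal(size=80))]

def local_gradient(X, y, w):
    """Eq. (2) with f(z) = z, so F(x_i . w) = x_i . w - y_i."""
    F = X @ w - y
    return F @ X / len(y)

w = np.zeros(3)
alpha, eps = 0.1, 1e-8
M_total = sum(len(y) for _, y in clients)
for n in range(1000):
    # Eq. (3): aggregate with weights beta_k = M_k / sum(M_k)
    G = sum(len(y) / M_total * local_gradient(X, y, w) for X, y in clients)
    if G @ G <= eps:          # convergence test of part D
        break
    w = w - alpha * G         # Eq. (4)
print(w)
```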

III Quantum Federated Learning based on Gradient Descent Algorithm

In this section, we present QFLGD, which focuses on parallel and private computing architectures for data in distributed quantum networks. Such a distributed quantum network typically consists of a server and several clients with quantum computing capabilities. We first give methods to encode the data information into quantum states. Subsequently, we propose a QGD algorithm that clients use to estimate the gradient locally. A QSMC protocol is designed to perform a private calculation of the global gradient when the server aggregates the training results of the clients. Finally, the server updates the global parameters and shares the results with the clients. The schematic diagram of the QFLGD framework is presented in Fig. 2.

Figure 2: Schematic illustration of QFLGD.

III-A Quantum data preparation and model initialization

Similar to classical FL, a dataset $\mathbf{X}=[\mathbf{x}_{0},\mathbf{x}_{1},\cdots,\mathbf{x}_{M-1}]$ is chosen by a client in quantum FL, where $\mathbf{x}_{i}\in\mathbb{R}^{D}$ and $\mathbf{y}=(y_{0},y_{1},\cdots,y_{M-1})$ collects the corresponding label of each sample of $\mathbf{X}$. For convenience, it is assumed that $D=2^{L}$ for some integer $L$; otherwise, zeros are padded into each vector. Furthermore, the server initializes an ML model whose learnable parameters are represented by a vector $\mathbf{w}\in\mathbb{R}^{D}$ and can be optimized using gradient descent. The ability of quantum computers to effectively solve practical problems depends on encoding this information into quantum states as input to a quantum algorithm. Here, we give methods to extract the data and parameter information into quantum states.

Suppose that the quantum oracles

O_{X}:|i\rangle|j\rangle|0\rangle\longrightarrow|i\rangle|j\rangle|\mathbf{x}_{i}^{j}\rangle, \qquad (5)

and

O_{y}:|i\rangle|0\rangle\longrightarrow|i\rangle|y_{i}\rangle, \qquad (6)

are provided, where $\mathbf{x}_{i}^{j}$ represents the $j$th element of the $i$th vector of the dataset $\mathbf{X}$. These two oracles can access the entries of $\mathbf{x}_{i}$ and $\mathbf{y}$ in time $O(\mathrm{polylog}(MD))$ and $O(\mathrm{polylog}(M))$, respectively [17, 30], when the data are stored in quantum random access memory (QRAM) [31] with an appropriate data structure [32]. In addition, the operation

U_{nf}:|i\rangle|0\rangle\longrightarrow|i\rangle|\|\mathbf{x}_{i}\|\rangle \qquad (7)

is required, which gives access to the $2$-norm of the vector $\mathbf{x}_{i}$. Inspired by Ref. [33], $U_{nf}$ can be implemented in time $O(\mathrm{polylog}(D)/\epsilon_{m})$ employing controlled rotation [34] and quantum phase estimation (QPE) [35]. The details are shown in appendix A. Under these assumptions, the quantum data preparation proceeds as follows.

(A1) In this step, the data information is extracted into the state $|\phi(\mathbf{x}_{i})\rangle$. Firstly, three quantum registers are prepared in the state $|i\rangle_{1}|0^{\otimes\log D}\rangle_{2}|0^{\otimes q}\rangle_{3}$, where the subscripts denote the different registers and $q$ is a number of qubits sufficient to store the data elements, i.e., $2^{q}-1>\max_{i,j}|\mathbf{x}_{i}^{j}|$. After that, $H^{\otimes\log D}$ is applied on the second register to generate the state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|0^{\otimes q}\rangle_{3}. \qquad (8)

Secondly, the quantum oracle $O_{X}$ is performed on the three registers, leaving them in the state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}. \qquad (9)

Subsequently, a qubit in the state $|0\rangle$ is added and rotated to $\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$ controlled on $|\mathbf{x}_{i}^{j}\rangle$, where $c_{1}=1/\max_{i,j}|\mathbf{x}_{i}^{j}|$. The system becomes

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}. \qquad (10)

Finally, the inverse $O_{X}$ operation is applied on the third register. The quantum state

|i\rangle|\phi(\mathbf{x}_{i})\rangle=|i\rangle_{1}\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|j\rangle_{2}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, \qquad (11)

is obtained by discarding the third register. The whole process is denoted as $U_{\mathbf{x}_{i}}$, which generates the state $|\phi(\mathbf{x}_{i})\rangle$ in time $O(\mathrm{polylog}(D)+q)$.
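As a sanity check on the form of Eq. (11), the following NumPy sketch builds the amplitudes of $|\phi(\mathbf{x}_{i})\rangle$ directly for a small illustrative vector (the values are placeholders): post-selecting the ancilla on $|1\rangle$ leaves amplitudes proportional to $\mathbf{x}_{i}$.

```python
import numpy as np

x = np.array([1.0, -0.5, 2.0, 0.25])        # illustrative x_i with D = 4
D = len(x)
c1 = 1.0 / np.max(np.abs(x))

# amplitudes of |phi(x_i)>: index j on the data register, last bit is the ancilla
state = np.zeros((D, 2))
state[:, 0] = np.sqrt(1 - (c1 * x) ** 2) / np.sqrt(D)   # ancilla |0> branch
state[:, 1] = c1 * x / np.sqrt(D)                       # ancilla |1> branch

assert np.isclose(np.sum(state ** 2), 1.0)              # the state is normalized
branch = state[:, 1]
print(branch / np.linalg.norm(branch))                  # proportional to x / ||x||
```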

(A2) In order to train the gradient, the parameter $\mathbf{w}(n)$ should be introduced in the $n$th iteration. Thus, it is necessary to generate a quantum state that carries the information of $\mathbf{w}(n)$. Since the parameter changes in each iteration, there are two methods to prepare this quantum state.

One way is based on the assumption that QRAM allows frequent reading and writing. If the entries $\mathbf{w}^{j}(n)$ ($j=0,1,\cdots,D-1$) are written into QRAM in a timely manner, the quantum state

\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|j\rangle\left[\sqrt{1-(c_{2}\mathbf{w}^{j}(n))^{2}}|0\rangle+c_{2}\mathbf{w}^{j}(n)|1\rangle\right] \qquad (12)

can be produced by a process similar to step (A1) with the help of the oracle $O_{\mathbf{w}}$ ($O_{\mathbf{w}}|j\rangle|0\rangle\longrightarrow|j\rangle|\mathbf{w}^{j}(n)\rangle$), where $c_{2}=1/\|\mathbf{w}(n)\|$ and $\mathbf{w}^{j}(n)$ denotes the $j$th element of the parameter vector in the $n$th iteration. This method can be implemented in time $O(\mathrm{polylog}(D)+q)$.

For the other method, the parameter is encoded into the quantum state based on the operation $R(\vartheta)=\cos(\vartheta)|0\rangle\langle 0|-\sin(\vartheta)|0\rangle\langle 1|+\sin(\vartheta)|1\rangle\langle 0|+\cos(\vartheta)|1\rangle\langle 1|$, which is inspired by Ref. [36]. In this way, $\mathbf{w}(n)$ is not required to be written into QRAM. The process is described as follows.

Assume that it is easy to obtain the $2^{L}-1$ ($L=\log D$) angle parameters $\bm{\vartheta}_{t}=(\vartheta_{t}^{0},\cdots,\vartheta_{t}^{2^{t-1}-1})$ ($t=1,2,\cdots,L$) from the updated $\mathbf{w}(n)$ after the last iteration. The angle $\vartheta_{t}^{j}$ satisfies

\cos(\vartheta^{j}_{t})=\frac{h_{t}^{2j}}{h^{j}_{t-1}},\quad \sin(\vartheta^{j}_{t})=\frac{h^{2j+1}_{t}}{h^{j}_{t-1}}, \qquad (13)

for $t=1,\cdots,L$, where $h^{j}_{t-1}=\sqrt{(h^{2j}_{t})^{2}+(h^{2j+1}_{t})^{2}}$ and $j=0,\cdots,2^{t-1}-1$. In particular, $h^{j}_{L}=\mathbf{w}^{j}(n)$ for $j=0,1,\cdots,D-1$. We then define

U(\bm{\vartheta}_{t})=\begin{cases}\sum_{j=0}^{2^{t-1}-1}|j\rangle\langle j|\otimes R(\vartheta^{j}_{t})\otimes I\{L-t\},&t=2,\cdots,L\\ R(\vartheta^{0}_{1})\otimes I\{L-1\},&t=1\end{cases} \qquad (14)

where $I\{L-t\}$ denotes the identity gate $I$ applied on $(L-t)$ qubits.

After that, a quantum state

U(\bm{\vartheta}_{L})\cdots U(\bm{\vartheta}_{2})U(\bm{\vartheta}_{1})|0^{\otimes\log D}\rangle=\sum_{j=0}^{D-1}c_{2}\mathbf{w}^{j}(n)|j\rangle, \qquad (15)

is generated in time $O(D)$ by applying the operation $U(\bm{\vartheta}_{t})$ for $t=1,2,\cdots,L$. Furthermore, a register in the state $|1\rangle$ is appended. The overall system is in the state

|\phi(\mathbf{w}(n))\rangle=\sum_{j=0}^{D-1}c_{2}\mathbf{w}^{j}(n)|j\rangle|1\rangle. \qquad (16)

To further interpret this method, an example is given in appendix B.

According to Eq. (16), the state in Eq. (12) can be rewritten as

\frac{1}{\sqrt{D}}\left[\sum_{j=0}^{D-1}|j\rangle\sqrt{1-(c_{2}\mathbf{w}^{j}(n))^{2}}|0\rangle+|\phi(\mathbf{w}(n))\rangle\right]. \qquad (17)

It means that the above two methods both allow us to extract the parameter $\mathbf{w}(n)$ information into the quantum state $|\phi(\mathbf{w}(n))\rangle$. Based on current quantum technology, we choose the second, more feasible method and denote the process as $U_{\mathbf{w}}$.
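To illustrate the second method, the following NumPy sketch computes the tree values $h_{t}^{j}$ and the angles $\vartheta_{t}^{j}$ of Eq. (13) for an illustrative parameter vector, and checks that multiplying the cosines and sines along each root-to-leaf path reproduces the target amplitudes $c_{2}\mathbf{w}^{j}(n)$ of Eq. (15).

```python
import numpy as np

w = np.array([0.4, 0.2, 0.8, 0.4])           # illustrative parameter vector, D = 4
c2 = 1.0 / np.linalg.norm(w)
target = c2 * w                               # amplitudes of Eq. (15)

# build the binary tree h_t^j bottom-up: leaves h_L^j = w^j(n), Eq. (13)
L = int(np.log2(len(w)))
h = [target]
for _ in range(L):
    prev = h[0]
    h.insert(0, np.sqrt(prev[0::2] ** 2 + prev[1::2] ** 2))

# recover amplitudes top-down by multiplying cos/sin of each angle
amps = np.array([1.0])
for t in range(1, L + 1):
    # arctan2(h_t^{2j+1}, h_t^{2j}) gives the angle of Eq. (13)
    theta = np.arctan2(h[t][1::2], h[t][0::2])
    amps = np.ravel(np.column_stack((amps * np.cos(theta), amps * np.sin(theta))))
print(np.allclose(amps, target))              # True
```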

III-B Local training by quantum parallel computing (QGD algorithm)

Now, we propose a QGD algorithm. It enables clients to estimate the gradient of the model in parallel based on their respective local data. According to Eq. (2), with the help of the two data-preparation operations $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$, the process of the QGD algorithm is described as follows.

(B1) Generate an intermediate quantum state.

The task of computing the gradient involves an inner product computation, which takes $O(MD)$ time on classical computers. In the era of big data, this cost is high. Here, we generate an intermediate state that contains the information of $\mathbf{x}_{i}\cdot\mathbf{w}$. This state facilitates subsequent parallel estimation.

(B1.1) A quantum state is initialized as

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|0^{\otimes\log D}\rangle_{2}|0^{\otimes q}\rangle_{3}|0\rangle_{4}|0\rangle_{5}. \qquad (18)

(B1.2) The Hadamard gate is performed on the fifth register. Then, a controlled operation $|i\rangle\langle i|\otimes U_{\mathbf{x}_{i}}\otimes|0\rangle\langle 0|+I\otimes U_{\mathbf{w}}\otimes|1\rangle\langle 1|$ is applied to produce the state

\frac{1}{\sqrt{2M}}\sum_{i=0}^{M-1}|i\rangle_{1}\left[|\phi(\mathbf{x}_{i})\rangle_{24}|0\rangle_{5}+|\phi(\mathbf{w}(n))\rangle_{24}|1\rangle_{5}\right]. \qquad (19)

(B1.3) Subsequently, the Hadamard gate is applied to the fifth register to get

|\psi_{1}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}, \qquad (20)

where

|\Psi_{i,n}\rangle=\frac{1}{2}\left[(|\phi(\mathbf{x}_{i})\rangle+|\phi(\mathbf{w}(n))\rangle)|0\rangle+(|\phi(\mathbf{x}_{i})\rangle-|\phi(\mathbf{w}(n))\rangle)|1\rangle\right]. \qquad (21)

The state $|\Psi_{i,n}\rangle$ can be rewritten as

|\Psi_{i,n}\rangle=\cos\theta_{i}|\psi_{i}^{0}\rangle+\sin\theta_{i}|\psi_{i}^{1}\rangle=(\cdots)_{245}^{\bot}+\frac{1}{2\sqrt{D}}\Big(\sum_{j=0}^{D-1}c_{1}\mathbf{x}_{i}^{j}|j\rangle-\sum_{j^{\prime}=0}^{D-1}c_{2}^{\prime}\mathbf{w}^{j^{\prime}}(n)|j^{\prime}\rangle\Big)_{2}|11\rangle_{45}, \qquad (22)

where $\theta_{i}\in[0,\frac{\pi}{2}]$. It is easy to verify that

\sin^{2}\theta_{i}=\frac{c_{1}^{2}\|\mathbf{x}_{i}\|^{2}+c_{2}^{\prime 2}\|\mathbf{w}\|^{2}-2c_{1}c_{2}^{\prime}(\mathbf{x}_{i}\cdot\mathbf{w})}{4D} \qquad (23)

with $c_{2}^{\prime}=\sqrt{D}c_{2}$. By inspecting Eqs. (22) and (23), it can be seen that the essential information is carried by the component whose fourth and fifth registers are both in the state $|1\rangle$. Hence the superposition with $|0\rangle$ does not affect the extraction of the required information if the state of Eq. (17) is chosen; the first method (in step (A2)) is therefore also suitable for our algorithm, with $c_{2}^{\prime}=c_{2}$.
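The relation in Eq. (23) can be checked numerically: the squared norm of the $|j\rangle|11\rangle_{45}$ branch of Eq. (22) equals the right-hand side of Eq. (23), from which $\mathbf{x}_{i}\cdot\mathbf{w}$ can be recovered. A small NumPy sketch with illustrative vectors:

```python
import numpy as np

x = np.array([1.0, -0.5, 2.0, 0.25])          # illustrative x_i
w = np.array([0.4, 0.2, 0.8, 0.4])            # illustrative w(n)
D = len(x)
c1 = 1.0 / np.max(np.abs(x))
c2p = np.sqrt(D) / np.linalg.norm(w)          # c_2' = sqrt(D) * c_2

# amplitude of the |j>|11> branch in Eq. (22)
amp11 = (c1 * x - c2p * w) / (2 * np.sqrt(D))
lhs = np.sum(amp11 ** 2)                      # sin^2(theta_i)

rhs = (c1**2 * x @ x + c2p**2 * w @ w - 2 * c1 * c2p * (x @ w)) / (4 * D)
print(np.isclose(lhs, rhs))                   # Eq. (23) holds

# invert Eq. (23) to recover the inner product from sin^2(theta_i)
xw_est = (c1**2 * x @ x + c2p**2 * w @ w - 4 * D * lhs) / (2 * c1 * c2p)
print(np.isclose(xw_est, x @ w))              # True: x . w recovered
```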

(B2) Calculate $F(\mathbf{x}_{i}\cdot\mathbf{w})$ in parallel.

An approximation of $F(\mathbf{x}_{i}\cdot\mathbf{w})$ should be estimated and stored in a quantum state. To achieve this goal, $\theta_{i}$ is estimated via quantum phase estimation, in which the unitary operation is defined as

Q_{i}=-\mathcal{A}_{i}S_{00}\mathcal{A}_{i}^{\dagger}S_{11}, \qquad (24)

where $\mathcal{A}_{i}|0^{\otimes[\log(D)+2]}\rangle=|\Psi_{i,n}\rangle$, $S_{00}=I^{\otimes[\log(D)+2]}-2|0^{\otimes[\log(D)+2]}\rangle\langle 0^{\otimes[\log(D)+2]}|$, and $S_{11}=I^{\otimes\log(D)}\otimes(I^{\otimes 2}-2|1^{\otimes 2}\rangle\langle 1^{\otimes 2}|)$. Mathematically, the eigenvalues of $Q_{i}$ are $e^{\pm 2\mathbf{i}\theta_{i}}$ ($\mathbf{i}=\sqrt{-1}$) with corresponding eigenvectors $|\Psi_{i,n}^{\pm}\rangle=\frac{1}{\sqrt{2}}(|\psi_{i}^{0}\rangle\pm\mathbf{i}|\psi_{i}^{1}\rangle)$. In this eigenbasis, $|\Psi_{i,n}\rangle$ can be rewritten as $|\Psi_{i,n}\rangle=-\frac{\mathbf{i}}{\sqrt{2}}(e^{\mathbf{i}\theta_{i}}|\Psi_{i,n}^{+}\rangle-e^{-\mathbf{i}\theta_{i}}|\Psi_{i,n}^{-}\rangle)$. The procedure for estimating $F(\mathbf{x}_{i}\cdot\mathbf{w})$ is as follows.

(B2.1) Performing QPE on $Q_{i}$ with the state $|\psi_{1}\rangle|0^{\otimes l}\rangle$ for some $l=O(\log\epsilon_{\theta}^{-1})$, an approximate state

|\psi_{2}\rangle=\frac{-\mathbf{i}}{\sqrt{2M}}\sum_{i=0}^{M-1}|i\rangle_{1}\left(e^{\mathbf{i}\theta_{i}}|\Psi_{i,n}^{+}\rangle|\widetilde{\theta}_{i}\rangle-e^{-\mathbf{i}\theta_{i}}|\Psi_{i,n}^{-}\rangle|-\widetilde{\theta}_{i}\rangle\right)_{2456} \qquad (25)

is obtained, where $\widetilde{\theta}_{i}\in\mathbb{Z}_{2^{l}}$ satisfies $|\theta_{i}-\widetilde{\theta}_{i}\pi/2^{l}|\leq\epsilon_{\theta}$. Then, the quantum state

|\psi_{3}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|\sin^{2}(\widetilde{\theta}_{i})\rangle_{6} \qquad (26)

is generated by using the sine gate. This holds because $\sin^{2}(\widetilde{\theta}_{i})=\sin^{2}(-\widetilde{\theta}_{i})$.

(B2.2) According to Eq. (23), access to $\|\mathbf{x}_{i}\|$ is needed to compute $\mathbf{x}_{i}\cdot\mathbf{w}$. Combining the operation $U_{nf}$ with the quantum arithmetic operations [37], we can get

|\psi_{4}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}. \qquad (27)

(B2.3) An oracle $O_{F}$ is assumed to implement any function that has a convergent Taylor series [34]. Combined with $O_{y}$, the function $F(\cdot)$ can be implemented (a simple example is described in Sec. V). The state becomes

|\psi_{5}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|F(\mathbf{x}_{i}\cdot\mathbf{w})\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}. \qquad (28)

(B2.4) Next, a register in the state $|0\rangle$ is appended as the last register and rotated to $|\phi_{i}\rangle=c_{3}F(\mathbf{x}_{i}\cdot\mathbf{w})|0\rangle+\sqrt{1-(c_{3}F(\mathbf{x}_{i}\cdot\mathbf{w}))^{2}}|1\rangle$ in a controlled manner, where $c_{3}=1/\max_{i}|F(\mathbf{x}_{i}\cdot\mathbf{w})|$. This results in the overall state

|\psi_{6}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\Psi_{i,n}\rangle_{245}|F(\mathbf{x}_{i}\cdot\mathbf{w})\rangle_{6}|\|\mathbf{x}_{i}\|\rangle_{7}|\|\mathbf{w}\|\rangle_{8}|\phi_{i}\rangle_{9}. \qquad (29)

(B2.5) The inverse operations of steps (B1.2)-(B2.3) are performed on $|\psi_{6}\rangle$. Afterwards, a register in the state $|1\rangle$ is added to obtain

|\psi\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|\phi_{i}\rangle_{9}|1\rangle_{10}. \qquad (30)

For convenience, $\mathcal{A}_{\psi}$ denotes the operations that achieve $\mathcal{A}_{\psi}|00\cdots 0\rangle=|\psi\rangle$. Its schematic quantum circuit is given in Fig. 3.

Figure 3: Quantum circuit diagram of $\mathcal{A}_{\psi}$. Here, $q$ is the number of qubits required to adequately store the data information and $\epsilon_{\theta}$ is the tolerated error for estimating $\theta_{i}$. Furthermore, $QAO$ denotes the quantum arithmetic operations, $CR_{3}$ represents the controlled rotation operation about $|\phi_{i}\rangle$, and $U^{\dagger}$ labels the inverse operations in step (B2.5).

(B3) Estimate the gradient $\bm{g}^{j}(\mathbf{w}(n))$ with the swap test.

(B3.1) Three registers in the state $\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|0\rangle_{3}$ are prepared. Performing $O_{X}$ on them generates the state

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}. \qquad (31)

(B3.2) The controlled rotation operation ($|0\rangle\rightarrow\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$) is implemented to get

\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}. \qquad (32)

(B3.3) The inverse operation of $O_{X}$ is performed. After that, we can obtain the state

|\chi^{j}\rangle=\frac{1}{\sqrt{M}}\sum_{i=0}^{M-1}|i\rangle_{1}|0\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, \qquad (33)

via uncomputing the register $|j\rangle$.

(B3.4) In order to obtain the gradient, the swap test technique [38] is utilized. Combining the processes of generating the states $|\psi\rangle$ and $|\chi^{j}\rangle$, the quantum state $\frac{1}{\sqrt{2}}(|0\rangle|\psi\rangle+|1\rangle|\chi^{j}\rangle)$ can be constructed. Then, the first register is measured to see whether it is in the state $|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$. The measurement succeeds with probability

P=\frac{1}{2}+\frac{1}{2}\langle\psi|\chi^{j}\rangle. \qquad (34)

According to Eq. (2), $\bm{g}^{j}(\mathbf{w}(n))=(2P-1)/(c_{1}c_{3})$ can then be calculated. Hence, ${\rm Bob}_{k}$ can obtain the local gradient $\bm{g}_{k}(\mathbf{w})=(\bm{g}_{k}^{0}(\mathbf{w}(n)),\bm{g}_{k}^{1}(\mathbf{w}(n)),\cdots,\bm{g}_{k}^{D-1}(\mathbf{w}(n)))^{T}$ by repeating the above steps with his data.
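The overlap-estimation primitive behind step (B3.4) can be tried directly on the Qiskit simulator used in Sec. V. The sketch below uses the standard controlled-SWAP form of the swap test [38] on two single-qubit stand-ins for $|\psi\rangle$ and $|\chi^{j}\rangle$, estimating $|\langle\psi|\chi\rangle|^{2}$ from the ancilla statistics; note that the protocol above instead measures $|+\rangle$ on a superposed preparation to obtain $\langle\psi|\chi^{j}\rangle$ itself. The state angles are illustrative.

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# two single-qubit stand-ins |psi> = Ry(a)|0>, |chi> = Ry(b)|0>
a, b = 0.7, 1.9
qc = QuantumCircuit(3, 1)
qc.ry(a, 1)
qc.ry(b, 2)
qc.h(0)                  # ancilla
qc.cswap(0, 1, 2)        # controlled-SWAP between the two state registers
qc.h(0)
qc.measure(0, 0)

shots = 100_000
backend = AerSimulator()
counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()
p0 = counts.get("0", 0) / shots          # P(0) = 1/2 + |<psi|chi>|^2 / 2
print("estimated |<psi|chi>|^2:", 2 * p0 - 1)
print("exact     |<psi|chi>|^2:", np.cos((a - b) / 2) ** 2)
```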

III-C Model aggregation with QSMC protocol and update

In this section, we design a protocol to securely compute the federated gradient $\bm{G}(\mathbf{w})=\sum_{k=1}^{K}\beta_{k}\bm{g}_{k}(\mathbf{w})$, i.e., to calculate $\bm{G}(\mathbf{w})$ without revealing the local gradients $\bm{g}_{k}(\mathbf{w})$. To this end, the server Alice is assumed to be semi-honest: she may misbehave on her own but cannot conspire with others. Moreover, the federated gradients are required to be accurate to $\gamma^{-1}$; the scaled values $\gamma\beta_{k}\bm{g}_{k}^{j}(\mathbf{w})\geq 0$ are treated as nonnegative integers and denoted by $\mu_{k}^{j}$ for brevity. It is further supposed that $\sum_{k=1}^{K}\mu_{k}^{j}<S$. The details are as follows.

(C1) Preparation for multi-party quantum communication.

Alice announces $\gamma$ and the global dataset scale $\sum_{k=1}^{K}M_{k}$. At the same time, the participants (server and clients) choose $m$ pairwise coprime numbers $d_{i}$ ($i=1,2,\cdots,m$) satisfying $d_{1}\times d_{2}\times\cdots\times d_{m}=S$. Subsequently, ${\rm Bob}_{k}$ ($k=1,2,\cdots,K$) calculates his secret

s_{k,i}^{j}=\mu_{k}^{j}\ \mathrm{mod}\ d_{i}. \qquad (35)

Alice produces a $d_{i}$-level $(K+1)$-particle GHZ state

|\Psi\rangle=\frac{1}{\sqrt{d_{i}}}\sum_{q=0}^{d_{i}-1}|q\rangle^{\otimes(K+1)}, \qquad (36)

and marks the $(K+1)$ particles as $Q_{0},Q_{1},\cdots,Q_{K}$.

(C2) Distribution of quantum particles.

To check for the presence of eavesdroppers, Alice prepares $K$ sets of $\delta$ decoy states, where each decoy photon is randomly in one of the states from the set $V_{1}=\{|p\rangle\}_{p=0}^{d_{i}-1}$ or $V_{2}=\{QFT|p\rangle\}_{p=0}^{d_{i}-1}$, with $QFT$ denoting the quantum Fourier transform [39]. These sets are denoted as $E_{1},E_{2},\cdots,E_{K}$, respectively. Then Alice inserts $Q_{k}$ into $E_{k}$ at a random position and sends them to ${\rm Bob}_{k}$ for $k=1,2,\cdots,K$.

(C3) Security checking of the quantum channel.

After receiving the $\delta+1$ particles, ${\rm Bob}_{k}$ sends an acknowledgement to Alice. Subsequently, the positions and bases of the decoy photons are announced to ${\rm Bob}_{k}$ by Alice. ${\rm Bob}_{k}$ measures the decoy photons and returns the measurement results to Alice, who then calculates the error rate by comparing the measurement results with the initial states. If the error rate is higher than the threshold determined by the channel noise, Alice aborts this run of the protocol and restarts it. Otherwise, the protocol continues.

(C4) Measurement of particles and encoding of the transmitted information.

${\rm Bob}_{k}$ extracts all the decoy photons and discards them. Then, the server and the clients each perform the measurement $\{QFT|p\rangle\}_{p=0}^{d_{i}-1}$ on their remaining particles. The measurement results are recorded as $o_{s,i},o_{1,i},\cdots,o_{K,i}$ and satisfy $o_{s,i}+o_{1,i}+\cdots+o_{K,i}=0\ \mathrm{mod}\ d_{i}$. Subsequently, ${\rm Bob}_{k}$ encodes his data as $s_{k,i}^{\prime j}=s_{k,i}^{j}+o_{k,i}$ and sends it to Alice.

(C5) Computation of the federated gradient by the server.

At this stage, Alice accumulates all the results $s_{k,i}^{\prime j}$ to compute

\left(o_{s,i}+o_{1,i}+s_{1,i}^{j}+o_{2,i}+s_{2,i}^{j}+\cdots+o_{K,i}+s_{K,i}^{j}\right)\ \mathrm{mod}\ d_{i}=\left(\mu_{1}^{j}+\mu_{2}^{j}+\cdots+\mu_{K}^{j}\right)\ \mathrm{mod}\ d_{i}. \qquad (37)

For $i=1,2,\cdots,m$, Alice obtains $m$ equations of the form of Eq. (37). According to the Chinese remainder theorem, Alice computes the summation

\left(\sum_{k=1}^{K}\mu_{k}^{j}\right)\ \mathrm{mod}\ S=\sum_{k=1}^{K}\mu_{k}^{j}. \qquad (38)

It is then easy to get the federated gradient

\bm{G}^{j}\left(\mathbf{w}\right)=\frac{1}{\gamma}\sum_{k=1}^{K}\mu_{k}^{j}=\sum_{k=1}^{K}\beta_{k}\bm{g}_{k}^{j}(\mathbf{w}). \qquad (39)

After similar processes, the full federated gradient $(\bm{G}^{0}(\mathbf{w}),\bm{G}^{1}(\mathbf{w}),\cdots,\bm{G}^{D-1}(\mathbf{w}))$ can be obtained by Alice, and she updates the global model parameters as $\mathbf{w}(n+1)=\mathbf{w}(n)-\alpha\times\bm{G}(\mathbf{w}(n))$. To exhibit the process of model aggregation more clearly, a concrete example is presented in appendix C, and a classical sketch of the aggregation arithmetic is given below.
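The classical arithmetic Alice performs in steps (C4)-(C5) can be sketched in a few lines of Python. The scaled gradients and moduli below are illustrative, and the random offsets play the role of the GHZ measurement outcomes $o_{s,i}$ and $o_{k,i}$:

```python
import random
from functools import reduce

def crt(residues, moduli):
    """Chinese remainder theorem reconstruction of the sum, Eq. (38)."""
    S = reduce(lambda a, b: a * b, moduli)
    total = 0
    for r, d in zip(residues, moduli):
        p = S // d
        total += r * p * pow(p, -1, d)   # modular inverse (Python 3.8+)
    return total % S

mu = [7, 11, 5]                # toy scaled local gradients mu_k^j, K = 3 clients
moduli = [3, 5, 7, 11]         # pairwise coprime, S = 1155 > sum(mu)

residues = []
for d in moduli:               # one GHZ round per modulus d_i
    offsets = [random.randrange(d) for _ in range(len(mu))]
    o_server = (-sum(offsets)) % d                # all outcomes sum to 0 mod d_i
    shares = [(m % d + o) % d for m, o in zip(mu, offsets)]   # s'_{k,i}
    residues.append((o_server + sum(shares)) % d)             # Eq. (37)

assert crt(residues, moduli) == sum(mu)           # only the sum is revealed
print("aggregated sum:", crt(residues, moduli))
```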

III-D Model evaluation and distribution via classical communication networks

The server (Alice) needs to evaluate whether the model should be further optimized after each round of training in QFLGD. Similar to classical FL, Alice uses the magnitude of the gradient to evaluate the model's convergence. Specifically, the server sends a termination signal and announces the global parameters when $\sum_{j=0}^{D-1}[\bm{G}^{j}(\mathbf{w}(n))]^{2}\leq\varepsilon$. Otherwise, she distributes the updated model parameters to the clients for a new round of training.

IV Analysis

In this section, we provide a brief analysis of the proposed framework. As discussed previously, the QGD algorithm (Sec. III-B) enables clients to accelerate gradient training on a local quantum computer, and the QSMC protocol (Sec. III-C) gives a method to update the federated parameters securely, protecting the privacy of the clients' data. Therefore, two main aspects are considered in the analysis: the time complexity of local training (the QGD algorithm) and the security of model aggregation (the QSMC protocol).

IV-A Time complexity of local training (the QGD algorithm)

In the QFLGD framework, assume that $M_{1}\leq M_{2}\leq\cdots\leq M_{K}\leq M$; namely, the dataset scale is at most $M$. All clients need to finish their gradient training before the federated gradient is calculated, so the waiting time of the distributed gradient training is the time consumed to train a dataset of scale $M$. In the following, the time complexity of the QGD algorithm is analyzed for an $M$-scale dataset.

In the data preparation period (Sec. III-A), the time consumption is caused by the processes $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$, which generate the states $|\phi(\mathbf{x}_{i})\rangle$ and $|\phi(\mathbf{w}(n))\rangle$ carrying the data information. They can be implemented in time $O(\mathrm{polylog}(MD)+D+q)$ with the help of $O_{X}$, $U(\bm{\vartheta}_{t})$, and the controlled rotation operation [14, 30], where $q$ is the number of qubits that store the data information. Afterwards, $U_{\mathbf{x}_{i}}$ and $U_{\mathbf{w}}$ are applied to produce the state $|\psi_{1}\rangle$ in step (B1) of local training (Sec. III-B). Hence, step (B1) can be implemented in time $O(\mathrm{polylog}(MD)+D+q)$.

In step (B2), we first consider the complexity of the unitary operation $\mathcal{A}_{i}$. It contains $H$, $U_{\mathbf{x}_{i}}$, and $U_{\mathbf{w}}$, which take time $O(\mathrm{polylog}(MD)+D+q)$. Then, the QPE block needs $O(1/\epsilon_{\theta})$ applications of $Q_{i}$ to estimate $\theta_{i}$ within error $\epsilon_{\theta}$ [35]. Therefore, the time complexity of step (B2.1) is $O[(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$. The runtime $O(\log(1/\epsilon_{\theta}))$ [37] of the sine gate can be ignored, since it is much smaller than that of the QPE.

Next, the time complexities of steps (B2.2) and (B2.3) are discussed. The main operations of these two steps include $U_{nf}$, $O_{y}$, and the quantum arithmetic operations, which are performed to calculate $|F(\mathbf{x}\cdot\mathbf{w})\rangle$ in time $O[\mathrm{polylog}(D)/\epsilon_{m}+\mathrm{polylog}(M)+q]$. In step (B2.4), the time complexity of the controlled rotation is $O(q)$. Step (B2.5) takes time $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$ to implement the inverse operations of steps (B1.2)-(B2.3). Putting all the steps together, the time complexity of step (B2) is $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$.

In step (B3), the process of generating $|\chi^{j}\rangle$ (described in steps (B3.1)-(B3.3)) is accomplished in time $O(\mathrm{polylog}(MD)+q)$. According to step (B2), a copy of the quantum state $|\psi\rangle$ is produced in time $O[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]$. The swap test is applied $O(P(1-P)/\epsilon_{P}^{2})=O(1/\epsilon_{P}^{2})$ times to obtain the result $P$ within error $\epsilon_{P}$ in step (B3.4) [40], and each swap test requires a fresh copy of $|\chi^{j}\rangle$ and $|\psi\rangle$. Therefore, the runtime of step (B3) is $O\{[\mathrm{polylog}(D)/\epsilon_{m}+(\mathrm{polylog}(MD)+D+q)/\epsilon_{\theta}]\epsilon_{P}^{-2}\}$, which is the complexity of obtaining the desired result.

For convenience, we assume that $\mathbf{w}^{j},\mathbf{x}_{i}^{j}=O(1)$; then $\|\mathbf{w}\|,\max_{i}\|\mathbf{x}_{i}\|=O(\sqrt{D})$. Therefore, $q=\mathrm{polylog}(D)$ qubits suffice to store the data information. In addition, taking $\epsilon_{m}$, $\epsilon_{\theta}$, and $\epsilon_{P}$ all equal to $\epsilon$, the complexity of the entire quantum algorithm to get $\bm{g}^{j}(\mathbf{w})$ ($j=0,1,\cdots,D-1$) in each iteration simplifies to

O\left\{D\left[\left(\mathrm{polylog}\left(MD\right)+D\right)/\epsilon^{3}\right]\right\}. \qquad (40)

This means that the time complexity of gradient training is $O(D^{2}\,\mathrm{polylog}(MD))$ when $\epsilon^{-1}=\log(MD)$, achieving exponential acceleration in the number of data samples. Furthermore, the elements of $\mathbf{w}$ can also be accessed in time $O(\mathrm{polylog}(D))$ if they are written into QRAM in a timely manner. In this case, the proposed algorithm has exponential acceleration in the number $M$ and a quadratic speedup in the dimensionality $D$, compared with the classical algorithm whose runtime is $O(MD^{2})$.

IV-B Security analysis of model aggregation (the QSMC protocol)

In this section, the security of model aggregation (the QSMC protocol) is analyzed. For secure multi-party computation, attacks from outsiders and from the participants are the challenges that have to be dealt with. In the following, we show that these attacks are invalid against our protocol.

Firstly, the outside attacks are discussed. In this protocol, decoy photons are used to prevent eavesdropping. This idea is derived from the BB84 protocol [4], which has been proved unconditionally secure. Here, we take the intercept-resend attack as an example. If an outside eavesdropper, Eve, attempts to intercept the particles sent from Alice and replaces them with her own fake particles, she will introduce an extra error rate of $1-\left(\frac{d_{i}+1}{2d_{i}}\right)^{\delta}$. Therefore, Eve will be detected in step (C3) through the security check.
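As a quick numeric illustration of this detection probability (the values of $d_{i}$ and $\delta$ below are illustrative), a modest number of decoy photons already makes eavesdropping very likely to be caught:

```python
# detection probability 1 - ((d_i + 1) / (2 d_i))^delta for intercept-resend
for d in (2, 3, 7):
    for delta in (10, 20, 50):
        p = 1 - ((d + 1) / (2 * d)) ** delta
        print(f"d_i = {d}, delta = {delta}: detection probability = {p:.6f}")
```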

Secondly, the participant attacks are analyzed. In the proposed protocol, the participants include the server (Alice) and clients (Bobk{\rm Bob}_{k}, k=1,2,,Kk=1,2,\cdots,K) who can access more information. Therefore, the participant attacks from dishonest clients or server should be considered.

For the participant attack from dishonest clients, only the extreme case of $K-1$ clients ${\rm Bob}_{1},\cdots,{\rm Bob}_{k-1},{\rm Bob}_{k+1},\cdots,{\rm Bob}_{K}$ colluding to steal the secret of ${\rm Bob}_{k}$ is considered here, because $K-1$ colluding clients have the most power. In this case, even if the dishonest clients share their information, they cannot deduce $o_{k,i}$ without the help of Alice. This means they cannot obtain the secret $s_{k,i}^{j}$ from $s_{k,i}^{\prime j}=s_{k,i}^{j}+o_{k,i}$. Thus, our algorithm can resist the collusion attack of dishonest clients.

For the attack from Alice, the semi-honest Alice may try to steal the private information of ${\rm Bob}_{k}$ without conspiring with anyone. In step (C4), Alice collects $s_{k,i}^{\prime j}$ for $k=1,2,\cdots,K$. However, she still cannot learn $s_{k,i}^{j}$, since she lacks knowledge of the $o_{k,i}$ held by the clients.

V Application: Training the Federated Linear Regression Model

V-A Quantum federated linear regression algorithm

Linear regression (LR) is an important supervised learning algorithm, which establishes a model of the relationship between the variable $\mathbf{x}_{i}$ and the observation $y_{i}$. It is widely applied in scientific fields such as biology and finance [41]. LR models are also usually fitted by minimizing the function in Eq. (1), choosing $f(\mathbf{x}_{i}\cdot\mathbf{w})=\mathbf{x}_{i}\cdot\mathbf{w}+b$ ($b$ is a migration parameter).

In this section, we apply the QFLGD framework to train the LR model. In the training process, we need to implement the function

F\left(\mathbf{x}_{i}\cdot\mathbf{w}\right)=\mathbf{x}_{i}\cdot\mathbf{w}+b-y_{i}. \qquad (41)

The state $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle$ encoding $\mathbf{x}_{i}\cdot\mathbf{w}$ can be generated according to the QGD algorithm. Then, the state $|\mathbf{x}_{i}\cdot\mathbf{w}+b-y_{i}\rangle$ is produced by the following steps.

(S1) The oracle $O_{y}$ is applied on the state $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}|0\rangle_{B}^{\otimes q}$ to get

|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}|y_{i}\rangle_{B}, \qquad (42)

in time $O(\log(M))$.

(S2) After obtaining $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle=|e_{q},e_{q-1},\cdots,e_{1}\rangle$ and $|y_{i}\rangle=|t_{q},t_{q-1},\cdots,t_{1}\rangle$, we implement the $QFT$ on $|\mathbf{x}_{i}\cdot\mathbf{w}\rangle_{A}$, which results in

\left[|\phi_{1}(e)\rangle\otimes|\phi_{2}(e)\rangle\otimes\cdots\otimes|\phi_{q}(e)\rangle\right]_{A}|y_{i}\rangle_{B}, \qquad (43)

where $|\phi_{j}(e)\rangle=\frac{1}{\sqrt{2}}(|0\rangle+e^{2\pi\mathbf{i}\,0.e_{j}e_{j-1}\cdots e_{1}}|1\rangle)$ for $j=1,2,\cdots,q$.

(S3) Subsequently, the controlled rotation operations $I\otimes|0\rangle\langle 0|+R_{j^{\prime}}\otimes|1\rangle\langle 1|$ ($j^{\prime}=1,\cdots,j$) are performed on $|\phi_{j}(e)\rangle$ and $|t_{q-j+1}\rangle$ ($j=1,2,\cdots,q$), giving

\left[|\phi_{1}(e-t)\rangle\otimes|\phi_{2}(e-t)\rangle\otimes\cdots\otimes|\phi_{q}(e-t)\rangle\right]_{A}|y_{i}\rangle_{B}, \qquad (44)

where $R_{j^{\prime}}$ is defined as $|0\rangle\langle 0|+e^{-2\pi\mathbf{i}/2^{j^{\prime}}}|1\rangle\langle 1|$ and $|\phi_{j}(e-t)\rangle=\frac{1}{\sqrt{2}}(|0\rangle+e^{2\pi\mathbf{i}(0.e_{j}\cdots e_{1}-0.t_{j}\cdots t_{1})}|1\rangle)$.

(S4) The inverse $QFT$ is applied on register $A$, and the state becomes

\left[|e_{q}-t_{q}\rangle\otimes|e_{q-1}-t_{q-1}\rangle\otimes\cdots\otimes|e_{1}-t_{1}\rangle\right]_{A}|y_{i}\rangle_{B}. \qquad (45)

Thus, the state $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}\rangle$ is obtained in register $A$. Similarly, addition can be implemented by changing the operation of step (S3) to $I\otimes|0\rangle\langle 0|+R_{j^{\prime}}^{\dagger}\otimes|1\rangle\langle 1|$. Thus, the state $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}+b\rangle$ can be obtained; its quantum circuit is presented in Fig. 4, and a simplified code sketch of this QFT-based arithmetic follows Fig. 4. The operations of these processes are labeled $U_{s}$ and implemented in time $O(\log(M)+q^{2})$. Combining them with the QFLGD framework, the quantum federated linear regression (QFLR) model can be constructed by Algorithm 1.

Input: The variate $\mathbf{X}\in\mathbb{R}^{M\times D}$, the observed vector $\mathbf{y}\in\mathbb{R}^{M}$, the initial parameter $\mathbf{w}(0)\in\mathbb{R}^{D}$, the migration parameter $b$, the learning rate $\alpha$, and the preset values $c_{i}$ ($i=1,2,3$);
Output: The parameter $\mathbf{w}$ and the model $y=\mathbf{w}^{T}\mathbf{x}_{i}+b$;
for $\sum_{j=0}^{D-1}(\bm{G}^{j}(\mathbf{w}(n-1)))^{2}>\varepsilon$ do
         Step 1: The $K$ clients prepare quantum states with their sensitive data according to the methods in Sec. III-A;
         Step 2: The clients apply the QGD algorithm (Sec. III-B) and $U_{s}$ to locally train $\bm{g}_{k}(\mathbf{w}(n))$ ($k=1,2,\cdots,K$), respectively;
         Step 3: The clients and the server use the QSMC protocol (Sec. III-C) to securely calculate the federated gradient $\bm{G}(\mathbf{w}(n))=(\bm{G}^{0}(\mathbf{w}(n)),\bm{G}^{1}(\mathbf{w}(n)),\cdots,\bm{G}^{D-1}(\mathbf{w}(n)))$, and the server updates the global model parameter $\mathbf{w}(n+1)=\mathbf{w}(n)-\alpha\times\bm{G}(\mathbf{w}(n))$;
         Step 4: The server evaluates the model performance and sends the updated model parameter to the $K$ clients;
end for
Algorithm 1 Quantum federated LR algorithm
Figure 4: Quantum circuit for computing $|\mathbf{x}_{i}\cdot\mathbf{w}-y_{i}+b\rangle$. The left side of the dotted line is the subtraction circuit, and the right side is the addition circuit. Note that $QFT^{\dagger}QFT=I$, so these canceling operations are omitted on either side of the dotted line.
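To complement Fig. 4, here is a minimal Qiskit sketch of the QFT-based arithmetic of steps (S2)-(S4), simplified to subtracting a classically known constant $t$ in the style of a Draper adder; in the scheme above, the rotations are instead controlled on the qubits of $|y_{i}\rangle$. The register size, the example values, and the helper name `subtract_const` are illustrative assumptions.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit.library import QFT
from qiskit.quantum_info import Statevector

def subtract_const(q: int, t: int) -> QuantumCircuit:
    """Map |e> to |e - t mod 2^q>: QFT, diagonal phases, inverse QFT (steps S2-S4)."""
    qc = QuantumCircuit(q)
    qc.append(QFT(q), range(q))
    for k in range(q):
        # multiply each basis state |x> by exp(-2*pi*i*t*x/2^q), bit by bit
        qc.p(-2 * np.pi * t * 2**k / 2**q, k)
    qc.append(QFT(q).inverse(), range(q))
    return qc

q, e, t = 4, 9, 3                      # illustrative: compute 9 - 3 = 6
qc = QuantumCircuit(q)
for k in range(q):                     # load |e> in little-endian binary
    if (e >> k) & 1:
        qc.x(k)
qc.compose(subtract_const(q, t), inplace=True)
print(Statevector(qc).probabilities_dict())   # {'0110': 1.0}, i.e., |6>
```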

V-B Numerical Simulation

In this section, the numerical simulation of the QFLR algorithm is presented. In our simulation, two clients (${\rm Bob}_{1}$, ${\rm Bob}_{2}$) train the QFLR model with a server (Alice). The experiment is implemented on the IBM Qiskit simulator. The initial weight $\mathbf{w}(0)=(0.866,0.5)$ and the migration parameter $b=0$ are selected by Alice. ${\rm Bob}_{1}$ chooses an input vector $\mathbf{x}=(2,3.464)$ with corresponding observation $y=2.464$. The other client, ${\rm Bob}_{2}$, selects an input vector $\mathbf{x}^{\prime}=(2.5,4.33)$ with corresponding observation $y^{\prime}=2.33$.

In the process of training the federated linear regression model, the main task is to compute $F(\mathbf{x}_{i}\cdot\mathbf{w})$ accurately. That is, the quantum-computed values of $F(\mathbf{x}_{i}\cdot\mathbf{w})$ must be storable in quantum registers with small error. An experiment on this step is presented with the data of ${\rm Bob}_{1}$. For convenience, we set $c_{1}=1/4$, $c_{2}=1$, and the error of quantum phase estimation $\epsilon_{\theta}=0.0001$. Substituting these into Eq. (41) yields the result $F(\mathbf{x}\cdot\mathbf{w})=1$. It can also be computed as

F\left(\mathbf{x}\cdot\mathbf{w}\right)=4-16\sin^{2}\left(\widetilde{\theta}\pi/2^{4}\right)+0-2.464, \qquad (46)

according to Eq. (23).

Using the fact that $\sin^{2}(\widetilde{\theta}\pi/2^{4})\approx\widetilde{\theta}^{2}/26$ and the most probable result $|0001\rangle$ (see Fig. 6(a)) from the QPE, Eq. (46) can be rewritten as

6.5\times F\left(\mathbf{x}\cdot\mathbf{w}\right)\approx 26-4\widetilde{\theta}^{2}-16. \qquad (47)

The corresponding circuit is designed and encoded via Qiskit (see Fig. 5). In Fig. 5(e), the matrix form of $U(\gamma,\phi,\lambda)$ is

U\left(\gamma,\phi,\lambda\right)=\begin{bmatrix}\cos\left(\gamma/2\right)&-e^{\mathbf{i}\lambda}\sin\left(\gamma/2\right)\\ e^{\mathbf{i}\phi}\sin\left(\gamma/2\right)&e^{\mathbf{i}(\phi+\lambda)}\cos\left(\gamma/2\right)\end{bmatrix}. \qquad (48)

With the help of IBM's simulator (aer_simulator), the measurement results shown in Fig. 6(b) can be obtained. In Fig. 6(b), two values ($|00110\rangle$ and $|01110\rangle$) stand out, with a much higher measurement probability than the rest. Based on the analysis of the phase estimation results, the result $|00110\rangle$ is selected, with a high probability of $0.510$. It means $F(\mathbf{x}\cdot\mathbf{w})\approx 0.923$. Compared with the theoretical result (shown in Eq. (46)), the experimental result has a tolerable error of $0.077$. Subsequently, ${\rm Bob}_{1}$ can estimate $\bm{g}_{1}^{1}\approx 1.846$ and $\bm{g}_{1}^{2}\approx 3.197$ by performing the swap test. At the same time, ${\rm Bob}_{2}$ estimates $F(\mathbf{x}^{\prime}\cdot\mathbf{w})\approx 2.115$, $\bm{g}_{2}^{1}\approx 5.288$, and $\bm{g}_{2}^{2}\approx 9.157$ for his data via a similar experiment.

Following the analogous procedure of the example shown in Appendix C, Alice calculates the federated gradient $G=(3.57,6.18)$ via the QSMC protocol. Theoretical analysis shows that the error is within $2\%$ of the actual solution $(3.5,6.06)$ obtained in that example. Thus, the training algorithm is successful.

Figure 5: The construction of the gates for solving $\mathbf{x}_{i}\cdot\mathbf{w}$. (a) The optimized circuit for solving Eq. (46). (b) The circuit of the operation $Q$. (c) The construction of $2|000\rangle\langle 000|-I$. (d) The circuit of the operation $I-2|11\rangle\langle 11|$. (e) The construction of $\mathcal{A}$.
Figure 6: Experimental results. (a) The measurement results of the eigenvalue register in the QPE; the two significant values correspond to $e^{\mathbf{i}\theta}$ and $e^{-\mathbf{i}\theta}$. (b) The proportional results of the estimation of $\mathbf{x}\cdot\mathbf{w}$.

VI Conclusions

This work focuses on the design of QFLGD for distributed quantum networks, which can securely implement FL over an exponentially large dataset. We first gave two methods of quantum data preparation, which extract static data information and dynamic parameter information into logarithmic qubits. Then, we put forth the QGD algorithm to allow the time-consuming gradient calculation to be done on a quantum computer. In this way, the clients can estimate the needed results of gradient training in parallel based on quantum superposition. The time complexity analysis shows that our algorithm is exponentially faster than its classical counterpart in the number of data samples when the error satisfies $1/\epsilon=\log(MD)$. Furthermore, the QGD algorithm also achieves a quadratic speedup in the dimensionality of the data sample if the parameters $\mathbf{w}$ are stored in QRAM in a timely manner. In addition, we proposed a QSMC protocol to calculate the federated gradient securely. It was demonstrated that the proposed protocol can resist common outside and participant attacks, such as the intercept-resend attack. Finally, we indicated how to apply the framework to train a federated linear regression model and simulated the essential steps with the help of the IBM Qiskit simulator. The results also showed the effectiveness of QFLGD. In summary, the presented framework demonstrates the intriguing potential of achieving large-scale private distributed learning with quantum technologies and provides a valuable guide for exploring quantum advantages in real-life machine learning applications from the security perspective.

We hope the proposed framework can be realized on a quantum platform as quantum technology matures. For example, how to implement the whole QFLGD process on noisy intermediate-scale quantum (NISQ) devices is worth further exploration, and we will devote more effort to it.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants No. 62171131, 61976053, and 61772134), the Fujian Province Natural Science Foundation (Grants No. 2022J01186 and 2023J01533), and the Innovation Program for Quantum Science and Technology (Grant No. 2021ZD0302901).

Appendix A Implementation of the Unitary Operation $U_{nf}$

In this appendix, we describe the implementation of the unitary operation $U_{nf}$, which generates a state encoding the $2$-norm of $\mathbf{x}_{i}$. Its steps are as follows.

(1) A quantum state is initialized as

\left|\varphi_{1}\right\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}\left|i\right\rangle_{1}\left|j\right\rangle_{2}\left|0\right\rangle_{3}. (49)

(2) The oracle $O_{X}$ is performed to obtain

\left|\varphi_{2}\right\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}\left|i\right\rangle_{1}\left|j\right\rangle_{2}\left|\mathbf{x}_{i}^{j}\right\rangle_{3}. (50)

(3) A register in the state $|0\rangle$ is appended as the last register and rotated to $\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle$. After that, the system becomes

|\varphi_{3}\rangle=\frac{1}{\sqrt{D}}\sum_{j=0}^{D-1}|i\rangle_{1}|j\rangle_{2}|\mathbf{x}_{i}^{j}\rangle_{3}\left[\sqrt{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}|0\rangle+c_{1}\mathbf{x}_{i}^{j}|1\rangle\right]_{4}, (51)

where $c_{1}=1/\max_{i,j}|\mathbf{x}_{i}^{j}|$. We can observe the ancilla register in the state $|1\rangle$ with probability $P_{1}=c_{1}^{2}\left\|\mathbf{x}_{i}\right\|^{2}/D$. The state $|\varphi_{3}\rangle$ can be rewritten as

\left|i\right\rangle_{1}\otimes\left(\sqrt{1-P_{1}}\left|g\right\rangle\left|0\right\rangle+\sqrt{P_{1}}\left|a\right\rangle\left|1\right\rangle\right)_{234}, (52)

where

\left|g\right\rangle=\sum_{j=0}^{D-1}\sqrt{\frac{1-(c_{1}\mathbf{x}_{i}^{j})^{2}}{D-c_{1}^{2}\left\|\mathbf{x}_{i}\right\|^{2}}}\left|j\right\rangle\left|\mathbf{x}_{i}^{j}\right\rangle (53)

and

\left|a\right\rangle=\frac{1}{\left\|\mathbf{x}_{i}\right\|}\sum_{j=0}^{D-1}\mathbf{x}_{i}^{j}\left|j\right\rangle\left|\mathbf{x}_{i}^{j}\right\rangle. (54)

(4) A register in the state $|0\rangle^{\otimes\log(\epsilon_{m}^{-1})}$ is appended. Then, the quantum phase estimation of $-U(\varphi_{3})S_{0}U^{\dagger}(\varphi_{3})S_{1}$ is performed to obtain

\left|\varphi_{4}\right\rangle=\left|i\right\rangle_{1}\otimes\left(\sqrt{1-P_{1}}\left|g\right\rangle\left|0\right\rangle+\sqrt{P_{1}}\left|a\right\rangle\left|1\right\rangle\right)_{234}\otimes\left|\left\|\mathbf{x}_{i}\right\|\right\rangle_{5}, (55)

with the help of the square root circuit [42]. Here, $\epsilon_{m}$ denotes the tolerance error of the QPE, $U(\varphi_{3})|0\rangle_{1234}=|\varphi_{3}\rangle$, $S_{0}=I_{1234}-2|0\rangle_{1234}\langle 0|_{1234}$, and $S_{1}=I_{123}\otimes(I-2|0\rangle\langle 0|)_{4}$.

(5) The inverse operations of steps (2)-(3) are applied to generate the state

\left|\varphi_{5}\right\rangle=\left|i\right\rangle\left|\left\|\mathbf{x}_{i}\right\|\right\rangle. (56)

Therefore, $U_{nf}:\left|i\right\rangle\left|0\right\rangle\rightarrow\left|i\right\rangle\left|\left\|\mathbf{x}_{i}\right\|\right\rangle$ can be implemented by the above steps. Its running time is dominated by the quantum phase estimation in step (4), which takes time $O(\text{polylog}(D)/\epsilon_{m})$. Moreover, $\left\|\mathbf{w}\right\|$ can be estimated similarly.
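As a quick numerical sanity check of step (3), the following sketch (our own illustration, using NumPy on a toy data vector rather than a quantum backend) builds the amplitudes of the ancilla-$|1\rangle$ branch of $|\varphi_{3}\rangle$ and confirms that the ancilla is observed in $|1\rangle$ with probability $P_{1}=c_{1}^{2}\|\mathbf{x}_{i}\|^{2}/D$.

```python
# A minimal classical sketch of step (3), assuming a hypothetical data vector x_i.
import numpy as np

x = np.array([2.0, 3.464, 1.0, 0.5])   # hypothetical x_i with D = 4
D = len(x)
c1 = 1.0 / np.abs(x).max()             # c1 = 1 / max_j |x_i^j|

# Amplitude of each branch |j>|x_i^j>|1> is c1 * x_i^j / sqrt(D), per Eq. (51).
amp_one = c1 * x / np.sqrt(D)
P1 = np.sum(amp_one ** 2)              # probability of ancilla outcome |1>

print(P1, (c1 ** 2) * np.linalg.norm(x) ** 2 / D)  # the two values coincide
```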

Appendix B An example of extracting the parameter $\mathbf{w}(n)$ information

In Sec. III-A, a way to prepare a quantum state of $\mathbf{w}$ without the help of QRAM is shown in step (A2). To further demonstrate it, an example is given in this appendix.

For convenience, suppose that $\mathbf{w}(n)=\mathbf{w}\in\mathbb{R}^{4}$. Then, we can obtain angle parameters $\vartheta^{0}_{1}$, $\vartheta^{0}_{2}$, and $\vartheta^{1}_{2}$, which satisfy

\begin{split}&\cos(\vartheta^{0}_{1})=\frac{h^{0}_{1}}{h^{0}_{0}},\quad\sin(\vartheta^{0}_{1})=\frac{h^{1}_{1}}{h^{0}_{0}},\\ &\cos(\vartheta^{0}_{2})=\frac{h^{0}_{2}}{h^{0}_{1}},\quad\sin(\vartheta^{0}_{2})=\frac{h^{1}_{2}}{h^{0}_{1}},\\ &\cos(\vartheta^{1}_{2})=\frac{h^{2}_{2}}{h^{1}_{1}},\quad\sin(\vartheta^{1}_{2})=\frac{h^{3}_{2}}{h^{1}_{1}},\end{split} (57)

according to Eq. (13). The values of $h^{j}_{t}$ ($t=0,1,2$) are shown in Fig. 7; for example, $h^{j}_{2}=\mathbf{w}^{j}$ for $j=0,1,2,3$.

Figure 7: The data structure. $h^{j}_{t-1}=\sqrt{(h^{2j}_{t})^{2}+(h^{2j+1}_{t})^{2}}$ for $t=1,2$ and $j=0,\cdots,2^{t-1}-1$. Moreover, $h^{j}_{2}=\mathbf{w}^{j}$ for $j=0,1,2,3$.

Then, the operations are defined as

\begin{split}&U(\bm{\vartheta}_{1})=R(\vartheta^{0}_{1})\otimes I,\\ &U(\bm{\vartheta}_{2})=|0\rangle\langle 0|\otimes R(\vartheta^{0}_{2})+|1\rangle\langle 1|\otimes R(\vartheta^{1}_{2}),\end{split} (58)

based on $R(\vartheta)=\begin{bmatrix}\cos(\vartheta)&-\sin(\vartheta)\\ \sin(\vartheta)&\cos(\vartheta)\end{bmatrix}$. It is easy to verify that

U(\bm{\vartheta}_{2})U(\bm{\vartheta}_{1})|00\rangle=\frac{1}{\left\|\mathbf{w}\right\|}(\mathbf{w}^{0}|00\rangle+\mathbf{w}^{1}|01\rangle+\mathbf{w}^{2}|10\rangle+\mathbf{w}^{3}|11\rangle). (59)

Thus, the quantum state of $\mathbf{w}$ can be obtained.
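The following NumPy sketch (our own illustration, for a hypothetical vector with nonnegative entries) carries out this construction: it builds the tree values $h^{j}_{t}$ of Fig. 7, extracts the angles of Eq. (57), assembles $U(\bm{\vartheta}_{1})$ and $U(\bm{\vartheta}_{2})$ as in Eq. (58), and checks Eq. (59).

```python
# A minimal sketch of the QRAM-free preparation of |w>, for a sample w in R^4.
import numpy as np

w = np.array([1.0, 2.0, 2.0, 4.0])              # hypothetical nonnegative weights
h2 = np.abs(w)                                   # leaf level: h_2^j = w^j
h1 = np.sqrt(h2[0::2] ** 2 + h2[1::2] ** 2)     # internal level, per Fig. 7
h0 = np.sqrt(h1[0] ** 2 + h1[1] ** 2)           # root: h_0^0 = ||w||

# Angles of Eq. (57): cosine = left child / parent, sine = right child / parent.
t10 = np.arctan2(h1[1], h1[0])
t20 = np.arctan2(h2[1], h2[0])
t21 = np.arctan2(h2[3], h2[2])

def R(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

P0 = np.diag([1.0, 0.0])                         # |0><0|
P1 = np.diag([0.0, 1.0])                         # |1><1|
U1 = np.kron(R(t10), np.eye(2))                  # U(theta_1) of Eq. (58)
U2 = np.kron(P0, R(t20)) + np.kron(P1, R(t21))   # U(theta_2) of Eq. (58)

state = U2 @ U1 @ np.array([1.0, 0.0, 0.0, 0.0])  # apply to |00>
print(state, w / np.linalg.norm(w))               # both equal w / ||w||, Eq. (59)
```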

Appendix C An example of the model aggregation

In this appendix, an example is presented to exhibit the model aggregation. Consider a model trained by two clients ($\rm Bob_1$, $\rm Bob_2$), each holding a dataset of scale $1$, with the help of a server (Alice). The gradients $g_{1}^{1}=2$, $g_{1}^{2}=3.46$, $g_{2}^{1}=5$, and $g_{2}^{2}=8.66$ are assumed to have been obtained in the QGD algorithm. For simplicity, the eavesdropping check phase is ignored.

Firstly, Alice announces that the accuracy of the parameters is $\gamma^{-1}=1/100$ and the global dataset scale is $M_{1}+M_{2}=2$. She agrees on $d_{1}=23$ and $d_{2}=29$ with the clients. After that, $\rm Bob_1$ calculates his secret

\begin{split}s_{1,1}^{1}&=\mu_{1}^{1}\mod 23\\ &=8,\end{split} (60)

where $\mu_{1}^{1}=\gamma g_{1}^{1}/(M_{1}+M_{2})=100$. Similarly, $s_{1,2}^{1}=13$, $s_{1,1}^{2}=12$, and $s_{1,2}^{2}=28$. At the same time, $\rm Bob_2$ obtains $s_{2,1}^{1}=20$, $s_{2,2}^{1}=18$, $s_{2,1}^{2}=19$, and $s_{2,2}^{2}=27$.

Secondly, Alice prepares a $23$-level $3$-particle GHZ state $\left|\Psi\right\rangle=\frac{1}{\sqrt{23}}\sum_{q=0}^{22}|q\rangle|q\rangle|q\rangle$ for $d_{1}=23$ and distributes one particle to each client. Then these participants perform the measurement to get $o_{s,1}=7$, $o_{1,1}=6$, and $o_{2,1}=10$. $\rm Bob_1$ ($\rm Bob_2$) encodes his secret by using $o_{1,1}$ ($o_{2,1}$) and sends it to Alice. The result

\begin{split}&(7+8+6+20+10)\mod 23\\ =&(\mu_{1}^{1}+\mu_{2}^{1})\mod 23=5,\end{split} (61)

could be computed by Alice.

Finally, the equations

\begin{split}&(\mu_{1}^{1}+\mu_{2}^{1})\mod 23=5,\\ &(\mu_{1}^{1}+\mu_{2}^{1})\mod 29=2,\end{split} (62)

and

\begin{split}&(\mu_{1}^{2}+\mu_{2}^{2})\mod 23=8,\\ &(\mu_{1}^{2}+\mu_{2}^{2})\mod 29=26,\end{split} (63)

could be obtained through a similar procedure. According to the Chinese remainder theorem, the federated gradient $(3.5,6.06)$ is readily obtained.
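As a classical cross-check of this example, the sketch below (our own illustration; the protocol itself obtains the modular sums via GHZ states, which we emulate arithmetically here) reproduces the residues of Eqs. (62)-(63) and recovers the federated gradient by the Chinese remainder theorem.

```python
# A minimal classical sketch of the CRT-based aggregation in this example.
from math import prod

gamma, M = 100, 2
d = (23, 29)                                  # coprime moduli chosen by Alice
grads = {1: (2.0, 3.46), 2: (5.0, 8.66)}      # g_k^j from the QGD algorithm

def crt(residues, moduli):
    """Recover x mod prod(moduli) from its residues via the Chinese remainder theorem."""
    N = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Ni = N // m
        x += r * Ni * pow(Ni, -1, m)          # pow(Ni, -1, m): modular inverse
    return x % N

federated = []
for j in range(2):                             # gradient component index
    mus = [round(gamma * grads[k][j] / M) for k in (1, 2)]
    residues = [sum(mus) % m for m in d]       # what Alice learns, Eqs. (62)-(63)
    federated.append(crt(residues, d) / gamma)

print(federated)                               # [3.5, 6.06]
```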

References

  • [1] Q. Jia, L. Guo, Y. Fang, and G. Wang, “Efficient privacy-preserving machine learning in hierarchical distributed system,” IEEE Transactions on Network Science and Engineering, vol. 6, pp. 599–612, 2019.
  • [2] B. Gu, A. Xu, Z. Huo, C. Deng, and H. Huang, “Privacy-preserving asynchronous vertical federated learning algorithms for multiparty collaborative learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, pp. 6103–6115, 2022.
  • [3] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.   Proceedings of Machine Learning Research, 2017, pp. 1273–1282.
  • [4] C. H. Bennett and G. Brassard, “Quantum cryptography: Public key distribution and coin tossing,” in Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing.   IEEE New York, 1984, pp. 175–179.
  • [5] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, “Quantum cryptography,” Reviews of Modern Physics, vol. 74, pp. 145–195, 2002.
• [6] V. Scarani, H. Bechmann-Pasquinucci, N. J. Cerf, M. Dušek, N. Lütkenhaus, and M. Peev, “The security of practical quantum key distribution,” Reviews of Modern Physics, vol. 81, p. 1301, 2009.
• [7] R. Schwonnek, K. T. Goh, I. W. Primaatmaja, E. Y.-Z. Tan, R. Wolf, V. Scarani, and C. C.-W. Lim, “Device-independent quantum key distribution with random key basis,” Nature Communications, vol. 12, pp. 1–8, 2021.
  • [8] M. Hillery, V. Bužek, and A. Berthiaume, “Quantum secret sharing,” Physical Review A, vol. 59, pp. 1829–1834, 1999.
  • [9] A. Karlsson, M. Koashi, and N. Imoto, “Quantum entanglement for secret sharing and secret splitting,” Physical Review A, vol. 59, pp. 162–168, 1999.
  • [10] R. Cleve, D. Gottesman, and H.-K. Lo, “How to share a quantum secret,” Physical Review Letters, vol. 83, pp. 648–651, 1999.
  • [11] K. Boström and T. Felbinger, “Deterministic secure direct communication using entanglement,” Physical Review Letters, vol. 89, p. 187902, 2002.
• [12] F. G. Deng, G. L. Long, and X. S. Liu, “Two-step quantum direct communication protocol using the Einstein-Podolsky-Rosen pair block,” Physical Review A, vol. 68, p. 042317, 2003.
  • [13] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Review, vol. 41, pp. 303–332, 1999.
  • [14] A. W. Harrow, A. Hassidim, and S. Lloyd, “Quantum algorithm for linear systems of equations,” Physical Review Letters, vol. 103, p. 150502, 2009.
  • [15] L.-C. Wan, C.-H. Yu, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “Block-encoding-based quantum algorithm for linear systems with displacement structures,” Physical Review A, vol. 104, p. 062414, 2021.
• [16] H.-L. Liu, L.-C. Wan, C.-H. Yu, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “A quantum algorithm for solving eigenproblem of the Laplacian matrix of a fully connected weighted graph,” Advanced Quantum Technologies, vol. 6, p. 2300031, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300031
  • [17] C.-H. Yu, F. Gao, and Q.-Y. Wen, “An improved quantum algorithm for ridge regression,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, pp. 858–866, 2021.
• [18] M.-H. Chen, C.-H. Yu, J.-L. Gao, K. Yu, S. Lin, G.-D. Guo, and J. Li, “Quantum algorithm for Gaussian process regression,” Physical Review A, vol. 106, p. 012406, 2022.
  • [19] F. Scala, A. Ceschini, M. Panella, and D. Gerace, “A general approach to dropout in quantum neural networks,” Advanced Quantum Technologies, p. 2300220, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300220
• [20] Y.-D. Wu, Y. Zhu, G. Bai, Y. Wang, and G. Chiribella, “Quantum similarity testing with convolutional neural networks,” Physical Review Letters, vol. 130, p. 210601, 2023.
  • [21] R. LaRose, A. Tikku, É. O’Neel-Judy, L. Cincio, and P. J. Coles, “Variational quantum state diagonalization,” npj Quantum Information, vol. 5, p. 57, 2019.
• [22] H.-L. Liu, Y.-S. Wu, L.-C. Wan, S.-J. Pan, S.-J. Qin, F. Gao, and Q.-Y. Wen, “Variational quantum algorithm for the Poisson equation,” Physical Review A, vol. 104, p. 022418, 2021.
  • [23] S.-X. Zhang, Z.-Q. Wan, C.-Y. Hsieh, H. Yao, and S. Zhang, “Variational quantum-neural hybrid error mitigation,” Advanced Quantum Technologies, vol. 6, p. 2300147, 2023. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202300147
• [24] W. Li, S. Lu, and D. L. Deng, “Quantum federated learning through blind quantum computing,” Science China Physics, Mechanics & Astronomy, vol. 64, pp. 1–8, 2021.
  • [25] C. Ren, R. Yan, M. Xu, H. Yu, Y. Xu, D. Niyato, and Z. Y. Dong, “Qfdsa: A quantum-secured federated learning system for smart grid dynamic security assessment,” IEEE Internet of Things Journal, vol. 11, pp. 8414–8426, 2024.
  • [26] S. Y.-C. Chen and S. Yoo, “Federated quantum machine learning,” Entropy, vol. 23, p. 460, 2021.
  • [27] R. Huang, X. Tan, and Q. Xu, “Quantum federated learning with decentralized data,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 28, pp. 1–10, 2022.
  • [28] S. Wang, L. Huang, Y. Nie, X. Zhang, P. Wang, H. Xu, and W. Yang, “Local differential private data aggregation for discrete distribution estimation,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, pp. 2046–2059, 2019.
  • [29] R. Xue, K. Xue, B. Zhu, X. Luo, T. Zhang, Q. Sun, and J. Lu, “Differentially private federated learning with an adaptive noise mechanism,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 74–87, 2024.
  • [30] L. Wossnig, Z. Zhao, and A. Prakash, “Quantum linear system algorithm for dense matrices,” Physical Review Letters, vol. 120, p. 050502, 2018.
  • [31] V. Giovannetti, S. Lloyd, and L. Maccone, “Quantum random access memory,” Physical Review Letters, vol. 100, p. 160501, 2008.
  • [32] I. Kerenidis and A. Prakash, “Quantum recommendation systems,” arXiv preprint arXiv:1603.08675, 2016.
  • [33] K. Mitarai, M. Kitagawa, and K. Fujii, “Quantum analog-digital conversion,” Physical Review A, vol. 99, p. 012301, 2019.
  • [34] I. Cong and L. Duan, “Quantum discriminant analysis for dimensionality reduction and classification,” New Journal of Physics, vol. 18, p. 073011, 2016.
  • [35] G. Brassard, P. Høyer, M. Mosca, and A. Tapp, “Quantum amplitude amplification and estimation,” Contemporary Mathematics, vol. 305, pp. 53–74, 2002.
  • [36] C. P. Shao, “Fast variational quantum algorithms for training neural networks and solving convex optimizations,” Physical Review A, vol. 99, p. 042325, 2019.
• [37] S. S. Zhou, T. Loke, J. A. Izaac, and J. Wang, “Quantum Fourier transform in computational basis,” Quantum Information Processing, vol. 16, pp. 1–19, 2017.
  • [38] H. Buhrman, R. Cleve, J. Watrous, and R. De Wolf, “Quantum fingerprinting,” Physical Review Letters, vol. 87, p. 167902, 2001.
  • [39] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information.   Cambridge University Press, 2010.
  • [40] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical Review Letters, vol. 113, p. 130503, 2014.
  • [41] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow.   O’Reilly Media Inc, 2022.
  • [42] M. K. Bhaskar, S. Hadfield, A. Papageorgiou, and I. Petras, “Quantum algorithms and circuits for scientific computing,” arXiv preprint arXiv:1511.08253, 2015.