
Vertical federated learning based on DFP and BFGS

1st Wenjie Song China University of Petroleum
Qingdao, China
[email protected]
   2nd Xuan Shen China University of Petroleum
Qingdao, China
[email protected]
Abstract

As data privacy becomes increasingly valued, federated learning (FL) has emerged because of its potential to protect data. On the premise of ensuring data security, FL uses homomorphic encryption and differential privacy to realize distributed machine learning by exchanging encrypted information between different data providers. However, FL still faces many problems, such as the communication efficiency between clients and the server and the fact that data are non-iid. To address these two problems, we propose a novel vertical federated learning framework based on DFP and BFGS (denoted BDFL) and apply it to logistic regression. Finally, we perform experiments on real datasets to test the efficiency of the BDFL framework.

Index Terms:
Federated learning, Machine learning, Non-iid data, Data privacy.

I INTRODUCTION

On the one hand, with the advent of the General Data Protection Regulation, more and more people are paying attention to privacy protection in machine learning. On the other hand, data islands are increasingly common in practice, making traditional machine learning difficult to apply. Generally speaking, an AI service needs data provided by users to train a model on a server. However, the data may come from various institutions, and although each institution wants a good model, it is unwilling to leak its own data. Therefore, in order to break data islands and achieve privacy protection, Google [1] proposed federated learning in 2016. In FL, AI services can perform machine learning without collecting raw data from the institutions. FL allows the model to be trained locally and only encrypted information to be sent to the central server. The central server aggregates the received data and sends the result back to every client, and each client then updates its parameters locally. For updating parameters there are GD, SGD, and mini-batch SGD, but these methods are all of first-order accuracy. We therefore consider a higher-order method, Newton's method; however, the Hessian matrix may not be invertible, and even when it is, computing its inverse is extremely expensive. We therefore adopt quasi-Newton methods, of which DFP and BFGS are two representative algorithms. Yang et al. [2] implemented BFGS for logistic regression and applied it to vertical federated learning, but communication remains a problem. We therefore combine DFP and BFGS to propose a new algorithm for logistic regression in vertical federated learning. Compared with other algorithms, ours achieves better results with fewer communication rounds.

II RELATED WORK

In recent years, a large number of studies on federated learning have emerged [3], [4], [5]. Within these architectures, gradient descent methods are commonly used. However, the convergence of first-order gradient descent is slower than that of the second-order Newton method, while inverting the Hessian matrix is computationally expensive; quasi-Newton methods, with BFGS and DFP as the two representative algorithms, were developed to address this. A series of works on horizontal federated learning has been proposed [6], [7], where each client holds a subset of the samples but all data attributes. In vertical federated learning, each client holds part of the data attributes, and the samples overlap. [8] applies logistic regression under the vertical federated framework. Yang et al. [2] use L-BFGS to implement logistic regression in vertical federated learning and reduce the communication cost. [9] combines federated learning with blockchain and proposes BlockFL; thanks to the blockchain consensus mechanism, BlockFL can resist attacks from malicious clients. FedAvg [4] is an iterative method that has become a standard optimization method in FL. In terms of theory, [10], [11] give convergence proofs for FedAvg on non-IID data. In particular, [12] offers SecureBoost, a boosting method based on tree models. Recently, [13] proposed the FedProx algorithm, which extends FedAvg by adding a proximal term; FedProx outperforms FedAvg under statistical and system heterogeneity.

In summary, FedAvg, as the baseline in FL, performs poorly under statistical and system heterogeneity. As an improvement of FedAvg, FedProx performs well in non-iid environments. First-order gradient descent in traditional machine learning is very general, but for FL, when the communication cost far exceeds the computation cost, a higher-precision algorithm should be selected. In other words, higher computation cost should be traded for lower communication cost.

III ANALYTICAL MODEL

In this work, inspired by the use of BFGS in logistic regression for vertical federated learning [2], we explore a broader framework, BDFL, that can handle heterogeneous federated environments while ensuring privacy. Moreover, our framework performs better than BFGS [2] and SGD [14].

III-A Logistic Regression

In vertical federated learning, [14] realizes the classic logistic regression method. Let $X\in R^{N\times T}$ be a dataset containing $T$ data samples, where each instance has $N$ features, and let $y\in\{-1,+1\}^{T}$ be the corresponding labels. Suppose there are two honest-but-curious participants, party A (host) and party B (guest). A holds only features of the data, while B holds both features and the labels, so $X^{A}\in R^{N_{A}\times T}$ is owned by A and $X^{B}\in R^{N_{B}\times T}$ is owned by B. The two parties hold different features, but the sample ids are the same. The goal is therefore to train a classification model by solving

\min_{\boldsymbol{w}\in R^{N}}\ \frac{1}{T}\sum_{i}^{T}l(\boldsymbol{w};\boldsymbol{x}_{i},y_{i})   (1)

where $\boldsymbol{w}$ denotes the model parameters, with $\boldsymbol{w}=(\boldsymbol{w}^{A},\boldsymbol{w}^{B})$, $\boldsymbol{w}^{A}\in R^{N_{A}}$ and $\boldsymbol{w}^{B}\in R^{N_{B}}$. Moreover, $\boldsymbol{x}_{i}$ is the feature vector of the $i$-th data instance and $y_{i}$ is the corresponding label. The loss function is the negative log-likelihood

l(\boldsymbol{w};\boldsymbol{x}_{i},y_{i})=\log(1+\exp(-y_{i}\boldsymbol{w}^{T}\boldsymbol{x}_{i}))   (2)
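For concreteness, the following is a minimal Python sketch (our own illustration, not from [14]) of a vertically partitioned toy dataset and the loss in equation (2); the synthetic data and all variable names are illustrative assumptions, and the matrices are stored samples-by-features for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vertically partitioned data: T samples, N = N_A + N_B features.
T, N_A, N_B = 100, 3, 2
X_A = rng.normal(size=(T, N_A))      # features held by party A (host)
X_B = rng.normal(size=(T, N_B))      # features held by party B (guest)
y = rng.choice([-1, 1], size=T)      # labels held by party B only

w_A = np.zeros(N_A)
w_B = np.zeros(N_B)

def logistic_loss(w_A, w_B, X_A, X_B, y):
    """Average negative log-likelihood of eq. (2) on the joint data."""
    wx = X_A @ w_A + X_B @ w_B       # w^T x_i, assembled from both parties
    return np.mean(np.logaddexp(0.0, -y * wx))

print(logistic_loss(w_A, w_B, X_A, X_B, y))  # equals log(2) at w = 0
```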

In [14], SGD is used and encrypted intermediate information is exchanged at each iteration. Party A and Party B hold the encrypted partial gradients $\boldsymbol{g}^{A}\in R^{N_{A}}$ and $\boldsymbol{g}^{B}\in R^{N_{B}}$ respectively, which can be decrypted by the third party C. Furthermore, additively homomorphic encryption is adopted to achieve secure multi-party computation. A lot of work has been done in the field of homomorphic encryption [15], [16]; different computing requirements correspond to different encryption schemes, such as PHE, SHE, and FHE. After encryption, addition and scalar multiplication can be performed directly on ciphertexts, and decrypting the result gives the same value as operating on the plaintexts directly, that is, $[\![m]\!]+[\![n]\!]=[\![m+n]\!]$ and $[\![m]\!]\cdot n=[\![m\cdot n]\!]$, where $[\![\cdot]\!]$ denotes encryption. However, homomorphic encryption cannot yet handle exponential computations, so equation (2) cannot be encrypted directly. We therefore approximate the loss with a Taylor expansion, which is given in [14] as

l(\boldsymbol{w};\boldsymbol{x}_{i},y_{i})\approx\log 2-\frac{1}{2}y_{i}\boldsymbol{w}^{T}\boldsymbol{x}_{i}+\frac{1}{8}(\boldsymbol{w}^{T}\boldsymbol{x}_{i})^{2}   (3)
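Since the protocol in Section III-D relies only on ciphertext addition and multiplication by plaintext scalars, these two properties can be illustrated with the Paillier scheme. The sketch below assumes the python-paillier (`phe`) package and is not part of the original implementation.

```python
from phe import paillier  # python-paillier: additively homomorphic (PHE)

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

m, n = 2.5, -1.25
enc_m = public_key.encrypt(m)
enc_n = public_key.encrypt(n)

# [[m]] + [[n]] = [[m + n]]  (addition of two ciphertexts)
assert abs(private_key.decrypt(enc_m + enc_n) - (m + n)) < 1e-9

# [[m]] * c = [[m * c]]      (multiplication by a plaintext scalar)
assert abs(private_key.decrypt(enc_m * 4) - (m * 4)) < 1e-9

# Exponentials (and hence eq. (2)) cannot be evaluated on ciphertexts,
# which is why the quadratic Taylor surrogate of eq. (3) is used instead.
```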

III-B Newton Method

The basic idea of Newton's method is to use the first-order gradient and the second-order gradient (Hessian) at the current iterate to approximate the objective function by a quadratic function, and then to use the minimizer of this quadratic model as the next iterate. This process is repeated until a minimum of the required accuracy is reached. Newton's method approximates the optimum closely and converges quickly, but each iteration is computationally expensive. For federated learning, this trade-off is attractive: a larger computational cost is exchanged for a smaller communication cost.

For convenience, we mainly discuss the one-dimensional case. For an objective function $f(\boldsymbol{w})$, finding an extremum can be reduced to solving $f^{\prime}(\boldsymbol{w})=0$. The second-order Taylor expansion of $f(\boldsymbol{w})$ is

f(\boldsymbol{w})=f(\boldsymbol{w}_{k})+f^{\prime}(\boldsymbol{w}_{k})(\boldsymbol{w}-\boldsymbol{w}_{k})+\frac{1}{2}f^{\prime\prime}(\boldsymbol{w}_{k})(\boldsymbol{w}-\boldsymbol{w}_{k})^{2}   (4)

Taking the derivative of the above formula and setting it to zero gives

f^{\prime}(\boldsymbol{w}_{k})+f^{\prime\prime}(\boldsymbol{w}_{k})(\boldsymbol{w}-\boldsymbol{w}_{k})=0   (5)
\boldsymbol{w}=\boldsymbol{w}_{k}-\frac{f^{\prime}(\boldsymbol{w}_{k})}{f^{\prime\prime}(\boldsymbol{w}_{k})}   (6)

it is further organized into the following iterative expression:

\boldsymbol{w}_{k+1}=\boldsymbol{w}_{k}-\lambda H^{-1}f^{\prime}(\boldsymbol{w}_{k})   (7)

where $\lambda$ is the step size and $H$ is the Hessian.

This is the iterative formula of Newton's method. However, the method has a fatal flaw: equation (7) requires the inverse of the Hessian matrix. Not all matrices are invertible, and the computational complexity of inversion is very high. Quasi-Newton methods such as BFGS and DFP were therefore developed to approximate Newton's method.
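As a simple illustration of the update in equations (6)-(7), the following Python sketch applies the one-dimensional Newton iteration to a small regularized logistic-style function; the test function, step size, and all names are illustrative choices of ours.

```python
import numpy as np

def newton_1d(f_prime, f_second, w0, lam=1.0, iters=20, tol=1e-10):
    """1-D Newton iteration of eq. (7): w <- w - lam * f'(w) / f''(w)."""
    w = w0
    for _ in range(iters):
        step = lam * f_prime(w) / f_second(w)
        w -= step
        if abs(step) < tol:
            break
    return w

# Minimize f(w) = log(1 + exp(-w)) + 0.05 * w**2 (a regularized 1-D logistic loss).
f_prime = lambda w: -1.0 / (1.0 + np.exp(w)) + 0.1 * w
f_second = lambda w: np.exp(w) / (1.0 + np.exp(w)) ** 2 + 0.1

w_star = newton_1d(f_prime, f_second, w0=0.0)
print(w_star, f_prime(w_star))  # the gradient is ~0 at the minimizer
```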

III-C Quasi-Newton Method

The central idea of quasi-Newton methods is to construct a matrix that approximates the inverse Hessian without ever computing that inverse explicitly. The quasi-Newton update is therefore similar to equation (7):

\boldsymbol{w}_{k+1}=\boldsymbol{w}_{k}-\lambda C_{k}f^{\prime}(\boldsymbol{w}_{k})   (8)

where $C_{k}$ is the matrix used to approximate $H^{-1}$.

In contrast, the SGD update is

\boldsymbol{w}_{k+1}=\boldsymbol{w}_{k}-\lambda f^{\prime}(\boldsymbol{w}_{k})   (9)

Different quasi-Newton methods differ in the iterative formula for $C_{k}$. We therefore give the DFP and BFGS updates of $C_{k}$ below.

III-C1 DFP

C^{\prime}_{i+1}=C_{i}+\frac{\Delta\boldsymbol{w}_{i}\Delta\boldsymbol{w}_{i}^{T}}{\Delta\boldsymbol{w}_{i}^{T}\Delta\boldsymbol{g}_{i}}-\frac{(C_{i}\Delta\boldsymbol{g}_{i})(C_{i}\Delta\boldsymbol{g}_{i})^{T}}{\Delta\boldsymbol{g}_{i}^{T}C_{i}\Delta\boldsymbol{g}_{i}}   (10)

III-C2 BFGS

C^{\prime\prime}_{i+1}=\left(I-\frac{\Delta\boldsymbol{w}_{i}\Delta\boldsymbol{g}_{i}^{T}}{\Delta\boldsymbol{g}_{i}^{T}\Delta\boldsymbol{w}_{i}}\right)C_{i}\left(I-\frac{\Delta\boldsymbol{g}_{i}\Delta\boldsymbol{w}_{i}^{T}}{\Delta\boldsymbol{g}_{i}^{T}\Delta\boldsymbol{w}_{i}}\right)+\frac{\Delta\boldsymbol{w}_{i}\Delta\boldsymbol{w}_{i}^{T}}{\Delta\boldsymbol{g}_{i}^{T}\Delta\boldsymbol{w}_{i}}   (11)

III-C3 BDFL

C_{i+1}=\alpha C^{\prime}_{i+1}+(1-\alpha)C^{\prime\prime}_{i+1}   (12)

In equations (10) and (11), $\Delta\boldsymbol{w}_{i}=\boldsymbol{w}_{i+1}-\boldsymbol{w}_{i}$ and $\Delta\boldsymbol{g}_{i}=\boldsymbol{g}_{i+1}-\boldsymbol{g}_{i}$. In equation (12), $\alpha$ is a freely chosen mixing coefficient.
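The three updates above translate directly into code. The following NumPy sketch is our own illustration of equations (10)-(12); the default $\alpha=0.5$ is an arbitrary example, since $\alpha$ is left free, and no curvature safeguards are included.

```python
import numpy as np

def dfp_update(C, dw, dg):
    """DFP update of eq. (10) for the inverse-Hessian approximation C."""
    return (C
            + np.outer(dw, dw) / (dw @ dg)
            - np.outer(C @ dg, C @ dg) / (dg @ C @ dg))

def bfgs_update(C, dw, dg):
    """BFGS update of eq. (11)."""
    rho = 1.0 / (dg @ dw)
    I = np.eye(len(dw))
    V = I - rho * np.outer(dw, dg)
    return V @ C @ V.T + rho * np.outer(dw, dw)

def bdfl_update(C, dw, dg, alpha=0.5):
    """BDFL update of eq. (12): a weighted mix of the DFP and BFGS updates."""
    return alpha * dfp_update(C, dw, dg) + (1.0 - alpha) * bfgs_update(C, dw, dg)
```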

III-D Compute and Exchange information

Figure 1: Information exchange between parties

The gradient and the Hessian of the Taylor loss for the $i$-th data sample are given by $\boldsymbol{g}_{i}=\nabla l(\boldsymbol{w};\boldsymbol{x}_{i},y_{i})\approx(\frac{1}{4}\boldsymbol{w}^{T}\boldsymbol{x}_{i}-\frac{1}{2}y_{i})\boldsymbol{x}_{i}$ and $H=\nabla^{2}l(\boldsymbol{w};\boldsymbol{x}_{i},y_{i})\approx\frac{1}{4}\boldsymbol{x}_{i}\boldsymbol{x}_{i}^{T}$, respectively. For convenience, we introduce the intermediate variable $\boldsymbol{w}^{T}\boldsymbol{x}$ and write it as

\boldsymbol{u}=\boldsymbol{w}^{T}\boldsymbol{x}_{i}   (13)

III-D1 Compute Gradient and Loss

First, after initializing $\boldsymbol{w}$ and $C$, parties A and B compute $\boldsymbol{u}_{A}$ and $\boldsymbol{u}_{B}$. Party A then sends $[\![\boldsymbol{u}_{A}]\!]$ and $[\![\boldsymbol{u}_{A}^{2}]\!]$ to B. Next, B computes $[\![loss]\!]$ and $[\![\boldsymbol{d}]\!]$ according to equations (16) and (14) and sends $[\![\boldsymbol{d}]\!]$ to A. Finally, following equation (15), A computes $[\![\boldsymbol{g}_{A}]\!]$ and B computes $[\![\boldsymbol{g}_{B}]\!]$.

[\![\boldsymbol{d}]\!]=\frac{1}{4}([\![\boldsymbol{u}_{A}[i]]\!]+[\![\boldsymbol{u}_{B}[i]]\!]-2[\![y_{i}]\!])   (14)
[\![\boldsymbol{g}]\!]\approx\frac{1}{N}\sum_{i\in N}[\![d_{i}]\!]\boldsymbol{x}_{i}   (15)
[\![loss]\!]\approx\frac{1}{N}\sum_{i\in N}\log 2-\frac{1}{2}y_{i}([\![\boldsymbol{u}_{A}[i]]\!]+[\![\boldsymbol{u}_{B}[i]]\!])+\frac{1}{8}([\![\boldsymbol{u}_{A}^{2}[i]]\!]+2\boldsymbol{u}_{B}[i][\![\boldsymbol{u}_{A}[i]]\!]+[\![\boldsymbol{u}_{B}^{2}[i]]\!])   (16)
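To make the information flow concrete, the following plaintext NumPy sketch computes the quantities of equations (13)-(16) without the encryption layer; in the actual protocol $[\![\boldsymbol{u}_{A}]\!]$, $[\![\boldsymbol{u}_{A}^{2}]\!]$ and $[\![\boldsymbol{d}]\!]$ would travel as Paillier ciphertexts. The function name and structure are our own illustration.

```python
import numpy as np

def encrypted_exchange_plaintext(w_A, w_B, X_A, X_B, y):
    """Plaintext version of eqs. (13)-(16); encryption omitted for clarity.
    X_A, X_B are samples-by-features; y is in {-1, +1}."""
    T = len(y)                              # sample count (N in eqs. (15)-(16))
    u_A = X_A @ w_A                         # eq. (13), party A's share
    u_B = X_B @ w_B                         # eq. (13), party B's share

    d = 0.25 * (u_A + u_B - 2.0 * y)        # eq. (14)
    g_A = (d @ X_A) / T                     # eq. (15), gradient w.r.t. w_A
    g_B = (d @ X_B) / T                     # eq. (15), gradient w.r.t. w_B

    loss = np.mean(np.log(2.0)              # eq. (16), Taylor loss
                   - 0.5 * y * (u_A + u_B)
                   + 0.125 * (u_A + u_B) ** 2)
    return g_A, g_B, loss
```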

III-D2 Send Encrypted Information And Return

B sends the computed $[\![loss]\!]$ to C, who decrypts it and displays the result. Then A and B send $[\![\boldsymbol{g}_{A}]\!]$ and $[\![\boldsymbol{g}_{B}]\!]$ to C. After decrypting the gradients, C sends the respective gradient plaintexts back.

III-D3 Update Hessian and 𝒘\boldsymbol{w}

After A and B have received their respective gradients, they first update their $C_{k}$ and then update $\boldsymbol{w}$ using equation (8).

III-D4 Check

Parties A and B check whether $\boldsymbol{w}$ has converged. If both have converged, output $\boldsymbol{w}$; if either has not, continue the loop.

Procedure 1 Basic Logistic Regression In Vertical FL

Input: $w_{0}^{A},w_{0}^{B},X_{A},X_{B},Y_{B},E,\lambda$
Output: $w^{A},w^{B}$
Party C: Generates a public key and a private key
Party C: Sends the private key to A and B

1: for each round $k=1,\dots,E$ do
2:     Party A: Compute $u_{A},u_{A}^{2}$ as in equation (13)
3:     Party A: Send $[\![u_{A}]\!],[\![u_{A}^{2}]\!]$ to B
4:     Party B: Compute $u_{B},u_{B}^{2}$ as in equation (13)
5:     Party B: Compute $[\![loss]\!]$ as in equation (16)
6:     Party B: Compute $[\![d]\!]$ as in equation (14) and send it to A
7:     Party A: Compute $[\![g_{A}]\!]$ as in equation (15)
8:     Party B: Compute $[\![g_{B}]\!]$ as in equation (15)
9:     Party A&B: Send $[\![g_{A}]\!],[\![g_{B}]\!]$ to C
10:    Party C: Decrypt $[\![g_{A}]\!],[\![g_{B}]\!]$ and send the plaintexts back
11:    Party A&B: Update $w$ as in equation (9)
12: end for
Procedure 2 BDFL Framework

Input: $w_{0}^{A},w_{0}^{B},X_{A},X_{B},Y_{B},C_{0}^{A},C_{0}^{B},E,\lambda$
Output: $w^{A},w^{B}$
Party C: Generates a public key and a private key
Party C: Sends the private key to A and B

1: for each round $k=1,\dots,E$ do
2:     Party A: Compute $u_{A},u_{A}^{2}$ as in equation (13)
3:     Party A: Send $[\![u_{A}]\!],[\![u_{A}^{2}]\!]$ to B
4:     Party B: Compute $u_{B},u_{B}^{2}$ as in equation (13)
5:     Party B: Compute $[\![loss]\!]$ as in equation (16)
6:     Party B: Compute $[\![d]\!]$ as in equation (14) and send it to A
7:     Party A: Compute $[\![g_{A}]\!]$ as in equation (15)
8:     Party B: Compute $[\![g_{B}]\!]$ as in equation (15)
9:     Party A&B: Send $[\![g_{A}]\!],[\![g_{B}]\!]$ to C
10:    Party C: Decrypt $[\![g_{A}]\!],[\![g_{B}]\!]$ and send the plaintexts back
11:    if $k\neq 1$ then
12:        Party A&B: Update $C$ separately as in equation (12)
13:    end if
14:    Party A&B: Update $w$ as in equation (8)
15: end for
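For reference, the following is a plaintext Python simulation of Procedure 2 (our own sketch; the Paillier layer and party C are omitted, so all quantities are exchanged in the clear). It reuses the bdfl_update and encrypted_exchange_plaintext helpers sketched earlier and includes no line search or curvature safeguards.

```python
import numpy as np

def run_bdfl_plaintext(X_A, X_B, y, rounds=50, lam=0.1, alpha=0.5):
    """Plaintext simulation of Procedure 2, assuming the helper functions
    encrypted_exchange_plaintext() and bdfl_update() defined above are in scope."""
    w_A, w_B = np.zeros(X_A.shape[1]), np.zeros(X_B.shape[1])
    C_A, C_B = np.eye(X_A.shape[1]), np.eye(X_B.shape[1])
    prev = None  # previous (w_A, w_B, g_A, g_B) for the Delta terms

    for k in range(rounds):
        g_A, g_B, loss = encrypted_exchange_plaintext(w_A, w_B, X_A, X_B, y)
        if prev is not None:                      # skip the C update at k = 0
            pw_A, pw_B, pg_A, pg_B = prev
            C_A = bdfl_update(C_A, w_A - pw_A, g_A - pg_A, alpha)
            C_B = bdfl_update(C_B, w_B - pw_B, g_B - pg_B, alpha)
        prev = (w_A.copy(), w_B.copy(), g_A, g_B)
        w_A = w_A - lam * C_A @ g_A               # eq. (8), party A's update
        w_B = w_B - lam * C_B @ g_B               # eq. (8), party B's update
    return w_A, w_B
```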

IV PERFORMANCE EVALUATION

Our numerical experiments have two parts. In both experiments, we select 80% of the data for training and monitor the training loss; the remaining 20% is used as the test dataset to check the generalization ability of the model.

IV-A Compare Quasi-Newton with SGD

The first part compares SGD with the quasi-Newton methods on the credit card dataset, which consists of 30000 instances, each with 23 features. We shuffle the order of the instances; Party A holds 12 features, and Party B holds the remaining 11 features and the corresponding labels. The two quasi-Newton methods, DFP and BFGS, are compared against SGD.

Figure 2: Training loss in the Credit Card experiment, comparing the quasi-Newton methods with SGD. All methods use a learning-rate decay of 0.06 per round.

IV-B Compare Quasi-Newton with BDFL

In the second part, we go further and compare our proposed BDFL with the quasi-Newton methods on the breast cancer dataset, which has 569 instances, each with 30 attributes and a label. Since 20% of the data is held out for testing, the training data are split into $X_{A}\in R^{455\times 20}$, $X_{B}\in R^{455\times 10}$ and $Y_{B}\in R^{455\times 1}$. The attribute indices held by Party A are 10-29, and those held by Party B are 0-9.
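A minimal sketch of this data preparation, using the copy of the breast cancer dataset bundled with scikit-learn, is given below; the relabeling to $\{-1,+1\}$, the random seed, and the variable names are our own choices, and feature scaling is omitted.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()                             # 569 instances, 30 features
X = data.data
y = np.where(data.target == 1, 1, -1)                   # relabel {0, 1} to {-1, +1}

# 80/20 split, then partition the 30 features vertically between the parties.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)                # 455 training instances

X_A = X_train[:, 10:30]   # party A: feature indices 10-29 (no labels)
X_B = X_train[:, 0:10]    # party B: feature indices 0-9
Y_B = y_train             # labels held by party B only
```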

Figure 3: Training loss in the Breast Cancer experiment, comparing BFGS and DFP with BDFL. All methods use a learning-rate decay of 0.05 per round.

Figure 2 shows clearly that BFGS converges much faster than SGD, and Figure 3 shows that BDFL performs better than both DFP and BFGS. In addition, we evaluate every model on the test dataset.

TABLE I: The Test Accuracy
Method | Credit Card | Breast Cancer
SGD    | 90.90%      | 86.26%
DFP    | 94.41%      | 85.57%
BFGS   | 95.10%      | 91.29%
BDFL   | –           | 91.35%

V CONCLUSIONS

In this article, we use the quasi-Newton method to replace gradient descent in order to trade a larger amount of computation for a smaller communication cost. In addition, we improve on the original quasi-Newton updates and propose a novel framework, BDFL, for vertical federated learning. Logistic regression is implemented on top of the BDFL framework and tested on real datasets. The experiments show that BDFL supports multi-party modeling under the following two premises:

  1. Data privacy is not leaked;

  2. One party has only data but no labels, while the other has both data and labels.

Moreover, the convergence speed and accuracy of the model are better than those of traditional methods.

However, our model still has some limitations: the convergence speed did not fully meet our expectations, and the computational cost is high. We will address these issues in future work.

References

  • [1] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” 2017.
  • [2] K. Yang, T. Fan, T. Chen, Y. Shi, and Q. Yang, “A quasi-newton method based vertical federated learning framework for logistic regression,” 2019.
  • [3] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” 2019.
  • [4] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” 2017.
  • [5] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, 2020.
  • [6] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” 2017.
  • [7] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” 2019.
  • [8] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne, “Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,” 2017.
  • [9] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Blockchained on-device federated learning,” 2019.
  • [10] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of fedavg on non-iid data,” 2020.
  • [11] A. Khaled, K. Mishchenko, and P. Richtárik, “First analysis of local gd on heterogeneous data,” 2020.
  • [12] K. Cheng, T. Fan, Y. Jin, Y. Liu, T. Chen, and Q. Yang, “Secureboost: A lossless federated learning framework,” 2019.
  • [13] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” 2020.
  • [14] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne, “Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,” 2017.
  • [15] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti, “A survey on homomorphic encryption schemes: Theory and implementation,” 2017.
  • [16] H. Zhu, R. Wang, Y. Jin, K. Liang, and J. Ning, “Distributed additive encryption and quantization for privacy preserving federated deep learning,” 2020.