Secure Linear Aggregation Using Decentralized Threshold Additive Homomorphic Encryption For Federated Learning
Abstract
Secure linear aggregation is to linearly aggregate private inputs of different users with privacy protection. The server in a federated learning (FL) environment can fulfill any linear computation on private inputs of users through the secure linear aggregation. At present, based on pseudo-random number generator and one-time padding technique, one can efficiently compute the sum of user inputs in FL, but linear calculations of user inputs are not well supported. Based on decentralized threshold additive homomorphic encryption (DTAHE) schemes, this paper provides a secure linear aggregation protocol, which allows the server to multiply the user inputs by any coefficients and to sum them together, so that the server can build a full connected layer or a convolution layer on top of user inputs. The protocol adopts the framework of Bonawitz et al. to provide fault tolerance for user dropping out, and exploits a blockchain smart contract to encourage the server honest. The paper gives a security model, security proofs and a concrete lattice based DTAHE scheme for the protocol. It evaluates the communication and computation costs of known DTAHE construction methods. The evaluation shows that an elliptic curve based DTAHE is friendly to users and the lattice based version leads to a light computation on the server.
Keywords:
Privacy Protection, Secure Linear Aggregation, Additive Homomorphic Encryption, Smart Contract.
1 Introduction
Federated learning (FL) is intended to train better machine learning models on decentralized real-world data. The models then could be used to build more intelligent equipments for people, such as cars, wearable devices or browsers. McMahan et al. [1] proposed a well known FL protocol. The players in their protocol include users who owned data and a parameter server that aggregates model information of users. The protocol runs periodically. In each period, the parameter server randomly selects some users to upload their local model parameters, and averages the parameters to update a global learning model. A user in a period downloads the global learning model, feeds their local data, runs a deep learning network locally and gets updated local model parameters, the information of which is sent to the parameter server. In different periods the server may select different users, and within a period some of the selected users may drop out.
As real-world data are usually sensitive, an important problem in FL is data privacy. Although user data are not directly sent to the parameter server in FL, information of local model parameters may leak the raw data of users. Fredrikson et al. [2] show how to recover train samples from prediction results of a model. Rubaie and Chang [3] exploit feature vectors to reconstruct raw input data of a model. Chai et al. [4] show how to recover preferences of users by model gradient data. Bonawitz et al. [5] believe that recent updated local model parameters of a user may leak raw data of the user.
There are mainly two approaches to solve the data privacy problem in FL. One uses differential privacy and the other uses cryptographic tools. The work of Martin et al. [6] is an earlier report of the differential privacy approach. Wei et al. [7] point out a tradeoff between the convergence performance and privacy protection levels of the differential privacy method. For a fixed privacy protection level, the number of users increases, the convergence performance behaves better. However, if the number of users in each period is limited, one may use the cryptographic tools based approach. Bonawitz et al. [5] propose an elegant solution based on one-time padding and a secure pseudorandom generator.
To the best of our knowledge, the solution of Bonawitz et al. [5] is the only work suitable for the FL using cryptographic tools. They take the data privacy problem in FL as an secure aggregation problem. And they show some new requirements of a secure aggregation protocol for the FL. Except a security requirement, other requirements are as follow:
-
1.
The protocol should operates on a high-dimensional vectors;
-
2.
The protocol should tolerate users dropping out;
-
3.
The protocol should be communication efficient even with a new set of uses on each period.
With these requirements, Bonawitz et al. [5] show that previous works are unsatisfactory which include some works based on homomorphic encryption schemes. In detail, they believe that solutions based on Paillier scheme [8, 9, 10, 11] are either computationally expensive or require additional trusted dealer, and solutions based on ElGamal scheme [12, 13, 14, 15] need a high expansion factor considering the size of the group elements and that of the model parameters.
Considering the development of the communication technology, we believe that a moderate expansion factor is acceptable and the functionality of a protocol is more important. Liu et al. [16] proposed a federated forest where a parameter server should find the maximal value of user inputs. Zhuo et al. [17] proposed a federated deep reinforcement learning model where a parameter server needs to build a multi-layer perception on user inputs. The users in these scenarios are usually not mobile users so that the communication cost is not the dominate factor. However, to protect the privacy of users, a sum only secure aggregation is not enough.
We provide a secure linear aggregation protocol to enrich the parameter server. It could naturally be used in the federated averaging algorithm [1] in the same way as the secure sum aggregation [5]. It also could be used in [17] to build a linear multi-layer perception to get a reinforcement learning model. It may be adapted in [16] with the homomorphic encryption [18] to find the maximal value of user inputs. For simplicity, the solution here is only based on decentralized threshold additive homomorphic encryption (DTAHE) schemes.
Currently, there are three known methods to construct a DTAHE scheme. The first method is to distribute many secret shares to a user. Bendlin and Damgård [19] proposed a threshold homomorphic encryption scheme in this way. It relies on secret sharing schemes with general access structure. Boneh et al. [20] proposed such a scheme based on secret sharing schemes constructed by a monotone formulae. The second method is to use a large modulus for coefficients of an element in a polynomial ring. Boneh et al. [20] propose to use Shamir secret sharing scheme in this way. The last method is to use the ElGamal encryption as a basic building tool [21]. We exclude the Paillier encryption based construction method since it is hard to produce a shared key pair in a distributed manner without a trusted dealer. According to our evaluation, in the secure linear aggregation protocol, the communication overhead of the first method is too high and the computation overhead of the third method is a bit high. The second method also has a drawback since a large modulus means a higher polynomial degree when the noise bound is fixed [22]. So we provide a new method to construct a DTAHE scheme with security proofs. It does not increase the modulus size and polynomial degree.
The secure linear aggregation protocol is against an active adversary [5]. An active adversary could corrupt a parameter server and ask users to decrypt a cipher of a target user. It is not easy to defend against the attack. Note that Bonawitz et al. [5] add a consistency check round to solve a similar problem in their protocol. However, even we add such a round, the problem still exists since the target user may have dropped out before the consistency check round. To solve the problem, we introduce a blockchain system. We design a smart contract to record ciphers of users and to check the evaluation process of the parameter server. If the parameter server deviates its expected behaviours in a period, the parameter server could be punished and the users in that period could get compensation.
There exists a lot of works to introduce a blockchain into the FL. A top complained problem in FL is the incentive of users to participate a learning process. Researchers propose blockchain enabled models [23, 24, 25, 26, 27] to give rewards to data owners. Basically, model initializer proposes model parameters and rewards in the blockchain, data owners choose their interested model to download the current model parameters, to train an updated model by their local private inputs and to update information of model parameters to the blockchain, and miners of the blockchain aggregate inputs of users to get a new global model parameter and rewards relevant users. Pokhrel and Choi [28] and Kim et al. [29] show some theoretical results about the performance of blockchain enabled FL considering the delays in a blockchain. A simulation result in [30] shows that the FL without a blockchain is most efficient. Another motivation to introduce a blockchain system to the FL is about the trustiness of a learning model. Sarpatwar et al. [31] propose to model and capture provenance of an overall learning process for verification. Awan et al. [32] propose to record data produced in a learning task in a blockchain for verification. We use a blockchain as a trusted third party to verify the behaviours of the parameter server. Since a verification process usually could be separated from a learning process, the FL and the blockchain could work at their own paces.
In summary, our contributions are as follows.
-
•
We give a definition of DTAHE for FL and provide a basic linear aggregation protocol based on the definition. The basic protocol supports a parameter server to linearly operate ciphers of model parameters from different users.
-
•
We provide a secure linear aggregation protocol against an active adversary with security model and proofs. It shows how to use a blockchain smart contract to help a secure linear aggregation protocol.
-
•
We provides a new method to construct a lattice based instance of the DTAHE scheme with security proofs. Evaluations show that the DTAHE scheme leads to a lightweight computation on the server side.
2 Preliminaries
2.1 Basic Notations
For any set , we denote by the number of elements of the set . If is a string, denotes its bit length. And if is a vector, denotes the dimension of the vector. denotes the bit catenation of two strings and .
Let be a polynomial ring where is a monic irreducible polynomial of degree . Elements of the ring is denoted by vectors. For , the coefficients of is denoted by such that . The infinity norm of is defined as and the expansion factor of R is defined as .
Let be an integer. Then denotes a set of integers . The symbol denotes a ring on integers . For , denotes the unique integer in with . For , denotes the element in obtained by applying to all its coefficients. For , denotes rounding to the nearest integer and , denote rounding up or down.
Let be an integer as the security parameter. A function is negligible in if for every . An event occurs with negligible probability if the probability of the event is . An event occurs with overwhelming probability if its complement occurs with negligible probability.
Given a probability distribution , we use to denote that is sampled from . For a set , denotes that is sampled uniformly from . A distribution over integers is called -bounded if it is supported on .
2.2 Federated Learning
McMahan et al. [1] described the FL framework. We give a briefly review to see their federated averaging algorithm. As shown in Fig. 1, there is a parameter server and many users. exchanges model parameters with users to collaboratively build a better learning model. The federated averaging algorithm runs periodically. In each period, selects users randomly. In Fig. 1, we intended to show two periods with totally different users. Next we focus on a period where users in a set .

The FL in a period is as follows.
-
1.
A user is activated to download the global parameters from the parameter server.
-
2.
The user feeds their local data samples to get updated model parameters. And the parameters or their gradients with the number of local data samples should be sent to .
-
3.
The parameter server compute the updated global model parameters .
The parameter server continues to randomly activate a set of users in the next period until the parameters converge to that of the global optimal learning model.
Users could upload the together with so that the server could get the value and compute the updated . So a sum only aggregation is enough for the federated averaging algorithm. However, a parameter server may execute more computations in other federated learning models [16, 17], where a sum only aggregation is not enough.
2.3 The Model Of Secure Aggregation For FL
We adapt the notion of message-driven entities in [33] to the FL security model [5] to describe the participants in a FL scenario. A message-driven entity is initially invoked by an environment process. Each entity has their initial states. Once invoked, an entity waits for an activation that can happen for a message from the network or an environment process. On activation, the entity processes the incoming message or the arguments from a process with its current internal state, and generates a new internal state, outgoing messages and a cumulative output. Once an activation is completed, the entity waits for the next activation if it does not stop.
A protocol consists of a server side progress and a client side progress . An FL process in invokes , which runs periodically until is stopped by the FL process. In each period, randomly selects a set of users and activates their client side progresses. A user should have voluntarily invoked their before activates them. On activation, exchanges messages with until finishes a period or is stopped by the user. There may be some auxiliary entities for the security of the protocol . A common setup includes a certificate authority (CA) [5]. A is invoked and runs permanently to receive certificate requests from users and issue certificates to users. In our protocol, we also introduce blockchain miners . Miners are invoked to receive transactions from users and the server, and to execute smart contracts on commonly consented transactions.
The server and each user have a bi-directional secure channel between them. Practically, it reuses a transport layer secure channel established by FL processes. Users do not have direct links in the protocol . The communications of users are transferred by the server. Users and the may exchange messages at an initial stage. When the protocol runs, the could be offline. Blockchain miners usually have a point-to-point (P2P) network to deliver transactions and blocks. When a blockchain is used in , FL entities should have the ability to send and receive transactions to and from the P2P network.
An active adversary could corrupt the server and users in . For each period, the number of corrupted parties is at most [5]. When corrupts an entity, it knows the internal states and long-term secrets of the entity, and controls all the behaviours of the entity from that point. Obviously, the server could be corrupted to act as a malicious server. When corrupts a user, it could send transactions on behalf of the user.
We does not allow an adversary to corrupt or since we do not try to model the security of a CA system or a blockchain system. We simply make assumptions about them:
-
•
A transaction from a user or the server will be executed and recorded correctly by blockchain miners within a bounded delay.
-
•
A certificate from a proves the binding relation of a verification key and the real identity of a user.
We define as a global output for in a period where denotes the inputs of the server and the inputs of users in the set of the period , and denotes the random inputs of the adversary , server and users in . Let
be the random variable about . The randomness comes from the adversary , server and users in .
We define the input privacy of a user [5] as follows.
Definition 1
(Input Privacy of a User). For any uncorrupted user in a period , the privacy of its input is assured if
where and “” means computationally indistinguishable.
The security goal in [5] is input privacy of all uncorrupted users. Apparently, if the privacy of all users are protected, the privacy of a user is protected. If the privacy of each uncorrupted user in protected, the privacy of all uncorrupted users are protected. So the two definitions are equivalent.
2.4 DTAHE
Boneh et al. [20] give a definition of decentralized threshold fully homomorphic encryption (DTFHE). We refine it as a DTAHE definition for the FL scenario. A DTAHE scheme is a tuple of probabilistic polynomial time (PPT) algorithms , , , , , , , .
-
1.
: It takes as input a security parameter , outputs system parameters .
-
2.
: It takes as input the system parameter to produce public and private keys for a user .
-
3.
: It takes as input the system parameter , public keys of users in a set excluding the user , a threshold value and private keys of the user , to produce encrypted shares for each user .
-
4.
: It takes as input the system parameter , public keys of a set of users in , and produces an encryption key .
-
5.
: It takes as input the system parameter , a public key and a message from a user , and produces a ciphertext .
-
6.
: It takes as the system parameter , ciphers and coefficients , and produces an evaluated cipher .
-
7.
: It takes as input the system parameter , the cipher , and a set of encrypted shares to the user , and produces a partially decrypted value .
-
8.
: It takes as input the system parameter , the threshold value , the cipher and partially decrypted ciphers from users in a set with , and produces a plaintext .
One could simply give an ElGamal based DTAHE instance following the constructions in [21], which justifies the correctness of the DTAHE definition.
2.5 DTAHE Model
We adapt the model of DTFHE in [20] for the DTAHE. The first definition is evaluation correctness.
Definition 2
(Evaluation Correctness). A DTAHE scheme for a set of users satisfies evaluation correctness if for all and , the following holds:
For an evaluated cipher
the probability
is overwhelming where
and
The second definition is sematic security. It captures the privacy of messages.
Definition 3
(Sematic Security). We say that a DTAHE scheme for a user set satisfies sematic security if for all , the following holds:
For any PPT adversary , the following experiments outputs with probability :
-
•
:
-
1.
The adversary outputs and where and specify an access structure.
-
2.
The challenger runs
and provides to .
-
3.
outputs a set such that . It submits message vectors and to the challenger.
-
4.
The challenger provides the shares and a cipher set
-
5.
outputs a guess bit . The experiment outputs if .
-
1.
The last definition is simulation security. It captures the privacy of shared secrets and private keys of users.
Definition 4
(Simulation Security). A DTAHE scheme satisfies simulation security if for all , the following holds:
There is a stateful PPT algorithm such that for any PPT adversary , the following experiments and are indistinguishable:
-
•
:
-
1.
The adversary outputs and where and specify an access structure.
-
2.
The challenger runs
and provides to .
-
3.
outputs a set with and messages .
-
4.
The challenger provides the shares in and a cipher set
-
5.
issues a polynomial number of adaptive queries of the form where . For each query, the challenger computes and provides
-
6.
At the end of the experiment, outputs a distinguishing bit .
-
1.
-
•
:
-
1.
The adversary outputs and where and specify an access structure.
-
2.
The challenger runs
and provides to .
-
3.
outputs a set with and messages .
-
4.
The challenger provides shares and ciphers
-
5.
issues a polynomial number of adaptive queries of the form where . For each query, the challenger runs
and provides to .
-
6.
At the end of the experiment, outputs a distinguishing bit .
-
1.
3 Secure Linear Aggregation Protocol
We provides two protocols in this section. The first is a basic protocol showing how to embed a DTAHE scheme to the secure aggregation framework in [5]. The second is the secure linear aggregation protocol against an active adversary.
3.1 A Basic Protocol
As shown in Fig. 2, initially a server runs to provide a system-wide parameters for all users, and each user runs to produce their key pairs. The server and users then runs a four-round protocol with an agreed threshold value . We next focus on a period to show the protocol.

-
1.
Round 1 (AdvertiseKeys). When a user is active, it packs a message as and sends the message to the server . The user then waits for the first response from or stops.
The server makes a set . After it receives a message , it sets . When , packs a message as . The message is broadcasted to all the users in as their responses.
-
2.
Round 2 (ShareKeys). When a user receives , it makes a set from and checks . If the verification passes, the user executes as follows:
-
(a)
It makes a public key set from and produce encrypted shares by
-
(b)
It packs and sends the message to . Then waits for the second response or stops.
The server builds a set in the same way as satisfying . When , packs a message for each user , and sends them to the users in .
-
(a)
-
3.
Round 3 (CipherCollection). When a user receives a message , it makes a set from . If , executes as follows:
-
•
It runs .
-
•
It runs to compute the cipher of its private input .
-
•
It packs a message and sends the message to . Then waits for the third response or stops.
The server builds a set in the same way as satisfying . It runs
and sends to users in .
-
•
-
4.
Round 4 (Decryption). When a user receives , it runs
then packs a message and sends the message to the server . The user finishes the client protocol for the period and stops.
The server builds a set in the same way as satisfying . If , it runs
to get as an aggregated value. The server finishes the server side protocol for the period and waits for the next period.
Remark 1
The basic protocol uses the framework in [5] to tolerate user dropping out. It needs the sever to be honest. A malicious server may simply return a cipher as to decrypt the cipher.
3.2 The Secure Linear Aggregation Protocol
Bonawitz et al. [5] use a certificate authority (CA) and an extra consistency round to defend against an active adversary. We also use a CA but keep our protocol four rounds. We introduce a smart contract in a blockchain to encourage the server and users to be honest.
The blockchain is described as [34] where is a state transition function, is the system state in a block height , and is a transaction in the system. A transaction could be described as
where and fields are the sender and receiver accounts of the transaction, field is the values to be transferred from the sender to the receiver, field is arbitrary data of the sender, field is the auxiliary information used in the blockchain and field is the sender’s signature. The instance of the blockchain certainly could be the Ethereum [34]. Any other blockchain system supporting state transition is fine.
Initially, the server has an account and a user has an account in the blockchain. The server should deploy a smart contract on the blockchain which will have an account after deployment.
Definition 5
(). The smart contract includes three functions. A function is for the server to deposit values and set parameters. A function is for a user to record their encrypted inputs. A function is for the server and users to check the correctness of a protocol transcript.
-
•
: If the transaction is signed by the server , the field of the transaction includes “” as the function name, and a threshold value and the number of periods supported by the contract as the arguments, the function checks a deposit variable of the server . If , it checks that where is the smallest deposit value for a period. If the check fails, it stops. Otherwise, it transfers values from the server account to the contract account, and stores in the contract.
-
•
: If the field of the transaction includes “” as the function name, and an epoch session id and a cipher as the arguments, the function stores in the contract.
-
•
: If the transaction is singed by the server , the field of the transaction includes “” as the function name, and a termination flag , an epoch session id , a cipher , a list of user accounts , and coefficients as the arguments, the function checks whether is new. If the is used before in another transaction , it checks whether the two transactions are the same. If they are different, the check fails. If they are the same, the function stops. The function forms a set
If is new, it checks that the size , and
If any check fails, the value is shared by users in the set. Otherwise, it updates , and if is , it transfers the deposit back to the account of the server after six new blocks. When the deposit is cleared, the contract suicides.
Now the server has two tasks to initialize the protocol. At first, the server runs to provide system-wide parameters for all users. Secondly, the server produces a transaction as
to initialize the deployed smart contract.
A user has three tasks to participate in the protocol II. At first, it runs to produce protocol keys. Secondly, it produces a transaction
and sends the transaction to the blockchain to store its public keys. Thirdly, the user should produce a signature key pair using a signature scheme that is existential unforgeable against adaptive chosen messages (EUF-CMA). And then applies for a certificate of the verifying key from a CA.
The CA and miners are auxiliary entities in the protocol. A CA is used to defend against a “Sybil” attack where an adversary may create many blockchain accounts to act as users. The miners of a blockchain receive transactions, pack them into blocks, reach a consensus on a block and update the global state. The public keys of users, parameters of the server , and deposited values of the server are states of the blockchain maintained by the miners. The whole picture of the protocol is in Fig. 3. We next focus on a period to show the protocol.

-
1.
Round 1 (AdvertiseAccounts). When a user is active, it looks up the deposit , supported periods and the threshold in the smart contract account indexed by the server account . If , , or are too small for the user’s local policy, the user simply stops. Otherwise, it produces a signature
where is a time stamp of the user. It packs a message as and sends the message to the server . The user then waits for the first response from or stops.
The server builds a set as in the basic protocol. It then packs a message as where is a session number for the period. The message is broadcasted to all the users in set as their response.
-
2.
Round 2 (ShareKeys). When a user receives , it parses to extract the identities of users which form a set . If checks that , and the time stamps, signatures and certificates are correct. If the verifications pass, looks up the blockchain to find public keys of users in by their accounts which form a set , and executes as in the basic protocol.
The server executes in the same way as in the basic protocol to produce for a user .
-
3.
Round 3 (CipherCollection). When a user receives a message , it makes a set from . If , it executes in the same way as in the basic protocol to compute and . Then creates a transaction
and sends the transaction to the blockchain. The user then waits for a response transaction of the server from the blockchain network or stops.
The server makes a set . After it receives a transaction , if the transaction includes the same as the current session number, sets where the identity is indexed by the . It then runs
and packs a transaction
and sends the transaction to the blockchain.
-
4.
Round 4 (Decryption). When a user receives , it builds a set indexed by . If , , and the field is the current session number, then executes in the same way as in the basic protocol to interact with the server .
The server executes in the same way as in the basic protocol to get an aggregated result.
When the FL process is to stop, may send a with the termination flag. The contract will delay the withdraw action until six blocks are generated. This is intentionally for users to check the correctness of the server. Before the deposit is given back to the server, a user could send received in an period to the contract. If there are different transactions with the same field, the deposit will be shared by users in that period.
Remark 2
In a period, two transactions are transferred by the blockchain network. Miners, the parameter server and users will receive transactions, parse them, and process them independently.
3.3 Security Analysis
We prove the security of the secure linear aggregation protocol against an active adversary. In a high level, we divide the global outputs into four views, and analyze the advantages of the adversary as a new view appears.
Theorem 3.1
Suppose that a DTAHE scheme is sematic and simulation secure, a signature scheme is EUF-CMA secure, and the secret sharing scheme in the DTAHE has a privacy property [20]. Let be the threshold value, be the maximal number of corrupted users in a period, and be the minimal number of private inputs to make the final aggregated value hide one input. For any uncorrupted user , against an adversary , if and the deposit of the server in the smart contract has not lost, the protocol satisfies .
Proof: A global output consists of four views . The includes the outputs of the initialization procedures and the first round of the server and users. The views include the outputs of their corresponding rounds. We consider two global outputs and with inputs and . Note that the only difference of the two inputs are the inputs of the user . We denote the four views of as .
-
•
Initially, and include the same certificates, public keys and accounts in the initialization procedure of the protocol .
-
•
When executes, and include message-signature pairs of users and the first response of the server. and include encrypted shares and the second response of the server. Since private inputs of the user are not involved, the two global outputs are indistinguishable until now.
-
•
The and consist of two kinds of transactions, separately. Transactions of users in could form a ciphertext set . The transaction of the server includes . In the , the ciphertext of the target user is , and the evaluated ciphertext of the server is . In the , the ciphertext of the target user is , and the evaluated ciphertext of the server is produced in the same way. Now the chances of increases.
-
–
could exploit the differences of the two views and . Since the only difference is the message and the message , if the DTAHE is sematic secure, has only negligible advantages to distinguish the two views.
-
–
could exploit the combined views of and . Suppose that could get the shared secrets in and . Then could decrypt by the and algorithms. However, since the DTAHE is simulation secure, the secret shares are protected well. has only negligible advantages to distinguish the two combined views.
-
–
could exploit the combined views of and in the two global outputs.
At first, Suppose that corrupts users in and . It then knows secret keys which could be used to decrypt ciphers in . The decrypted shares may help to decrypt . However, since we assume and the secret sharing scheme has a privacy property[20], this event happens negligibly.
Secondly, may make special and . could establish blockchain accounts and store public keys in each account. Then acts as users to cheat the target user . However, since messages in and are signed by users who have certificates from a CA, the accounts of could not be identified by the user . And since we explicitly include a time stamp in the first message of a user, could not replay messages of users in other periods. To satisfy the claims, the signature scheme should be EUF-CMA secure.
-
–
-
•
and consists of partially decrypted ciphers and the final aggregated value. has more chances:
-
–
could exploit the partially decrypted ciphers to discover secret shares. Since the DTAHE is simulation secure, has only negligible advantages.
-
–
could exploit the final aggregated value. An obvious strategy is that in the transaction of the server , only is included. To make the strategy more practical, may aggregate some ciphers of corrupted users and the target user. Then the final output is the sum of inputs.
The strategy is a possible way to get inputs of a user at the cost of the deposit in the smart contract. Since comes from the blockchain network, miners will execute the function in the smart contract. The function takes as , and checks the correctness of the evaluation of the server . The smart contract builds a and checks that . So if , the check fails and the server will be punished. To avoid the punishment, should prepare a set with .
may try to withdraw their deposit before the penalty is paid. However, the smart contract needs extra six blocks to confirm the request of the server. The six-block waiting time is the last chance of users to check the behaviours of the server. Users are encouraged to send the received transaction of the server to the blockchain since they are the possible beneficiaries.
-
–
In summary, if the cryptographic primitives are secure, , and the server pays no penalties, the adversary has only negligible advantage to distinguish the two outputs and .
Remark 3
We explicitly introduce a parameter to describe the number of private inputs of uncorrupted users in a period. The parameter is in fact used in [5] implicitly. The consistency check round [5] makes sure that at least private inputs are summed. If inputs are known by an adversary , then in their protocol .
4 A DTAHE Instance
Since all known construction methods of DTAHE schemes have some weaknesses. We provide a new construction method to produce a lattice based DTAHE scheme for an interactive protocol.
4.1 The Instance
We use the BFV scheme [22, 35] as a basic building block which is implemented in the SEAL library [36]. We need a sematic secure hybrid encryption scheme [37] to encrypt shares. A lattice based instance will give us a fully lattice DHAHE instance that may be post quantum secure. We also need the Shamir secret sharing scheme denoted by .
-
1.
: It takes as input a security parameter , produces a parameter set , where is the degree depending on of a cyclotomic polynomial , is an integer depending on , is a ring , is the set of polynomials in with coefficients in , here is in fact defined as discrete Gaussian distribution, is a uniform distribution, is uniformly selected from as , and is an integer depending on .
-
2.
: It selects and samples . It sets and . It runs . The output is and .
-
3.
: It samples , computes , sets , and for each coefficient of the elements in , computes . For all the shares to , it computes a cipher .
-
4.
: It computes .
-
5.
: It selects , , computes and . It sets .
-
6.
: It computes , and sets .
-
7.
: It decrypts the shares for the user from the user as . It parses the shares of coefficients as shares of elements in , sets , then computes .
-
8.
: It recovers where is the Lagrange coefficient of the user with respect to the user set , and computes the final output .
Remark 4
If the data dimension of the message is greater than , the number of noise samples increases.
4.2 Security Analysis
The security of our scheme could be reduced to a variant of the classical ring version decisional learning with errors (RLWE) problem [38, 22]. The variant is named -Decision-RLWE problem.
Definition 6
(-Decision-RLWE). For a random set and a distribution over , denote with the distribution by choosing a uniformly random element and noise term and outputting . The problem is then to distinguish between the distribution and a uniform distribution over .
By a hybrid argument, one could conclude that if an adversary has an advantage at least to solve the -Decision-RLWE problem, the adversary has an advantage at least to solve the classical RLWE problem in [38, 22].
The first proof is about the evaluation correctness in the definition 2.
Theorem 4.1
Assume that is the user set, is -bounded and the maximal infinity norm of elements in the set is , the evaluation of the DTAHE is correct with probability if .
Proof
If , the above decryption is correct. Since , , , , , the maximal infinity norm of elements in the set is , the infinity norm of is
(5) |
The second proof is for the privacy of messages in the definition 3.
Theorem 4.2
If there is an adversary with advantage to make the experiment output , one could construct a challenger to break the -decision-RLWE problem with an advantage under the condition that the secret sharing scheme has the privacy property [20] and the hybrid encryption scheme is sematic secure.
Proof
With and , the challenger samples a -decision-RLWE instance . It embeds the problem instance into the DTAHE instance as follows:
It then provides to .
The challenger plays with by until outputs .
If the scheme is sematic secure, the ciphers leak nothing about shares. Then from the privacy property of the scheme, if , can not distinguish a secret from zero. So should produce an educated guess .
The strategy of the challenger is to use the guess of . If the outputs , the challenger believes that the -decision-RLWE instance is a uniform random sample from .
When the input is indeed a uniform random sample from , the advantage of is simple negligible since the messages are masked by random values. Otherwise, the adversary has an advantage by assumption. So the advantage of the challenger is .
The third proof is for the privacy of secret keys and shares in the definition 4.
Theorem 4.3
If the secret sharing scheme has the privacy property [20] and the hybrid encryption scheme is sematic secure, the adversary has negligible advantage to distinguish the two experiments and .
Proof
The proof needs a serial of hybrid experiments between an adversary and a challenger.
-
•
: This is the experiment in the definition 4.
-
•
:Same as , except that the challenger simulates the algorithm to produce for queries of . Note that has given the challenger a set with the size . From , the challenger could construct a set . For each party , . The challenger sets as
where is defined in the theorem 4.1. If , the challenger computes as in the game .
The correctness of the simulation is obviously since
(6) -
•
: Same as , except that the challenger shares zero as
By the privacy property [20] of the scheme and the sematic security of the scheme, and are indistinguishable.
-
•
: Same as , except that is replaced by as
(7) where .
Since , and have the same distribution. In fact, may appear in an experiment when the happens to be part of the secret keys of users. Now the challenger does not use the private keys of users or secret shares of users in . So the ideal experiment could be simulated indistinguishably.
5 Performance
We evaluate the communication and computation costs of the secure linear aggregation protocol.
5.1 Communication
We have stated that there are mainly three methods [21, 19, 20] to construct a DTAHE. We concrete the method in [21] based on an elliptic curve version ElGamal (EC-ElGamal) scheme, and other methods based on the BFV [35, 13] scheme. The user side communication overhead of the secure linear aggregation protocol is calculated as follows:
(8) |
where is the number of users. Table I shows the main components in the equation (8). The security parameter is . An element in EC-ElGamal takes bytes where one byte is for -coordinate. denotes the size of a ring element in , the number of shares to each user and the number of ciphers to encrypt the user input . and have the same meaning as and with and . The method to use Shamir secret sharing in [20] is denoted as BGGJK-2, and the other is denoted as BGGJK-1.
Fig. 4 shows the main communication overhead of the protocol on user side with different DTAHE constructions when the number of user increases. We mainly consider the components in Table I. We set for all instances, and set and so that bytes and . We set and compute . The values of and are relative to the number of users since the noise element in the algorithm should be multiplied by , which are limited by the Theorem 4.1 and the equation (6) in [22].
From Fig. 4, we exclude the BD[19] and BGGJK-1[20] methods to distribute many shares to a user. Our method is better than the BGGJK-2[20] when the user number is greater than . The EC-ElGamal method has the best communication performance when the number of users is greater than .

5.2 Computation
Since the computation costs of the protocol are dominated by a DTAHE scheme. We give Table II to show the time cost of some DTAHE algorithms. We set the user number as and implement three DTAHE schemes for comparison. The schemes are implemented in Python. The “pyOpenSSL” is used to implement the EC-ElGamal based DTAHE. The small discrete logarithm of a group element is found by the well-known “Baby-Step-Giant-Step” method. An open source library “bfv-python” is adapted to implement our scheme and the BGGJK-2 method in [20]. For simplicity, we use a public python module “eciespy” as an instance of the scheme. The CPUs are Intel(R) Core(TM) i7-8550U (1.80GHz, 1.99GHz) and the RAM is 16GB.
We use the “secp256r1” curve as the parameter set to implement the EC-ElGamal based DTAHE. For the security parameter , we set , and for our DTAHE scheme. When , the scheme constructed by the BGGJK-2 method in [20] requires . So we set according to the parameter table in [39]. Each data element in in our test occupies bits. The time costs of the three implementations are listed in the table II, which are measured in seconds. Apparently, a secure linear aggregation protocol with the EC-ElGamal based DTAHE takes the least time on user side. The protocol with our DTAHE scheme takes the least time on server side.
6 Conclusion
This paper shows a secure linear protocol mainly for complex federated learning models. When the communication cost is not the dominate factor, the protocol could be deployed in a federated learning model to protect the private inputs of users. The DTAHE schemes in the protocol may be further optimized to reduce the computation time. For example, parallel computing technologies may reduce the time cost of the EC-ElGamal based DTAHE, and multi-secret sharing schemes may reduce the communication and computation costs of our DTAHE scheme.
Acknowledgment
This work is supported by the National Key R&D Program of China (2017YFB0802500), Guangdong Major Project of Basic and Applied Basic Research(2019B030302008) and Natural Science Foundation of Guangdong Province of China (2018A0303130133).
References
- [1] H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Federated learning of deep networks using model averaging,” CoRR, vol. abs/1602.05629, 2016. [Online]. Available: http://arxiv.org/abs/1602.05629
- [2] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’15. New York, NY, USA: ACM, 2015, pp. 1322–1333. [Online]. Available: http://doi.acm.org/10.1145/2810103.2813677
- [3] M. Al-Rubaie and J. M. Chang, “Reconstruction attacks against mobile-based continuous authentication systems in the cloud,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2648–2663, Dec 2016.
- [4] C. Di, W. Leye, C. Kai, and Y. Qiang, “Secure federated matrix factorization,” in FML 2019 : The 1st International Workshop on Federated Machine Learning for User Privacy and Data Confidentiality, 2019.
- [5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for privacy-preserving machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 1175–1191. [Online]. Available: https://doi.org/10.1145/3133956.3133982
- [6] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 308–318. [Online]. Available: https://doi.org/10.1145/2976749.2978318
- [7] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. S. Quek, and H. Vincent Poor, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3454–3469, 2020.
- [8] V. Rastogi and S. Nath, “Differentially private aggregation of distributed time-series with transformation and encryption,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’10. New York, NY, USA: Association for Computing Machinery, 2010, p. 735–746. [Online]. Available: https://doi.org/10.1145/1807167.1807247
- [9] M. Jawurek and F. Kerschbaum, “Fault-tolerant privacy-preserving statistics,” in Privacy Enhancing Technologies, S. Fischer-Hübner and M. Wright, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 221–238.
- [10] M. Joye and B. Libert, “A scalable scheme for privacy-preserving aggregation of time-series data,” in Financial Cryptography and Data Security, A.-R. Sadeghi, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 111–125.
- [11] I. Leontiadis, K. Elkhiyaoui, and R. Molva, “Private and dynamic time-series data aggregation with trust relaxation,” in Proceedings of the 13th International Conference on Cryptology and Network Security - Volume 8813. Berlin, Heidelberg: Springer-Verlag, 2014, p. 305–320. [Online]. Available: https://doi.org/10.1007/978-3-319-12280-9_20
- [12] E. Shi, T.-H. Chan, E. Rieffel, R. Chow, and D. Song, “Privacy-preserving aggregation of time-series data,” vol. 2, 01 2011.
- [13] T. H. H. Chan, E. Shi, and D. Song, “Privacy-preserving stream aggregation with fault tolerance,” in Financial Cryptography and Data Security, A. D. Keromytis, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 200–214.
- [14] I. Leontiadis, K. Elkhiyaoui, M. Önen, and R. Molva, “Puda – privacy and unforgeability for data aggregation,” Cryptology ePrint Archive, Report 2015/562, 2015, https://eprint.iacr.org/2015/562.
- [15] Q. Li and G. Cao, “Efficient privacy-preserving stream aggregation in mobile sensing with low aggregation error,” in Privacy Enhancing Technologies, E. De Cristofaro and M. Wright, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 60–81.
- [16] Y. Liu, Y. Liu, Z. Liu, Y. Liang, C. Meng, J. Zhang, and Y. Zheng, “Federated forest,” IEEE Transactions on Big Data, pp. 1–1, 2020.
- [17] H. H. Zhuo, W. Feng, Y. Lin, Q. Xu, and Q. Yang, “Federated deep reinforcement learning,” 2020.
- [18] J. L. H. Crawford, C. Gentry, S. Halevi, D. Platt, and V. Shoup, “Doing real work with FHE: the case of logistic regression,” in Proceedings of the 6th Workshop on Encrypted Computing & Applied Homomorphic Cryptography, WAHC@CCS 2018, Toronto, ON, Canada, October 19, 2018, 2018, pp. 1–12. [Online]. Available: https://doi.org/10.1145/3267973.3267974
- [19] R. Bendlin and I. Damgård, “Threshold decryption and zero-knowledge proofs for lattice-based cryptosystems,” in Theory of Cryptography, D. Micciancio, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 201–218.
- [20] D. Boneh, R. Gennaro, S. Goldfeder, A. Jain, S. Kim, P. M. R. Rasmussen, and A. Sahai, “Threshold cryptosystems from threshold fully homomorphic encryption,” in Advances in Cryptology – CRYPTO 2018, H. Shacham and A. Boldyreva, Eds. Cham: Springer International Publishing, 2018, pp. 565–596.
- [21] T. P. Pedersen, “A threshold cryptosystem without a trusted party,” in Advances in Cryptology — EUROCRYPT ’91, D. W. Davies, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 1991, pp. 522–526.
- [22] J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic encryption,” Cryptology ePrint Archive, Report 2012/144, 2012, https://eprint.iacr.org/2012/144.
- [23] X. Zhu, H. Li, and Y. Yu, “Blockchain-based privacy preserving deep learning,” in Information Security and Cryptology, F. Guo, X. Huang, and M. Yung, Eds. Cham: Springer International Publishing, 2019, pp. 370–383.
- [24] X. Bao, C. Su, Y. Xiong, W. Huang, and Y. Hu, “Flchain: A blockchain for auditable federated learning with trust and incentive,” in 2019 5th International Conference on Big Data Computing and Communications (BIGCOM), 2019, pp. 151–159.
- [25] Y. Lu, X. Huang, K. Zhang, S. Maharjan, and Y. Zhang, “Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4298–4311, 2020.
- [26] Y. Qu, L. Gao, T. H. Luan, Y. Xiang, S. Yu, B. Li, and G. Zheng, “Decentralized privacy using blockchain-enabled federated learning in fog computing,” IEEE Internet of Things Journal, vol. 7, no. 6, pp. 5171–5183, 2020.
- [27] J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo, “Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive,” IEEE Transactions on Dependable and Secure Computing, pp. 1–1, 2019.
- [28] S. R. Pokhrel and J. Choi, “Federated learning with blockchain for autonomous vehicles: Analysis and design challenges,” IEEE Transactions on Communications, pp. 1–1, 2020.
- [29] H. Kim, J. Park, M. Bennis, and S. Kim, “Blockchained on-device federated learning,” IEEE Communications Letters, vol. 24, no. 6, pp. 1279–1283, 2020.
- [30] Q. Wang, Y. Guo, X. Wang, T. Ji, L. Yu, and P. Li, “Ai at the edge: Blockchain-empowered secure multiparty learning with heterogeneous models,” IEEE Internet of Things Journal, pp. 1–1, 2020.
- [31] K. Sarpatwar, R. Vaculin, H. Min, G. Su, T. Heath, G. Ganapavarapu, and D. Dillenberger, Towards Enabling Trusted Artificial Intelligence via Blockchain. Cham: Springer International Publishing, 2019, pp. 137–153. [Online]. Available: https://doi.org/10.1007/978-3-030-17277-0_8
- [32] S. Awan, F. Li, B. Luo, and M. Liu, “Poster: A reliable and accountable privacy-preserving federated learning framework using the blockchain,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’19. New York, NY, USA: Association for Computing Machinery, 2019, p. 2561–2563. [Online]. Available: https://doi.org/10.1145/3319535.3363256
- [33] M. Di Raimondo and R. Gennaro, “New approaches for deniable authentication,” in Proceedings of the 12th ACM Conference on Computer and Communications Security, ser. CCS’ 05. New York, NY, USA: Association for Computing Machinery, 2005, pp. 112–121. [Online]. Available: https://doi.org/10.1145/1102120.1102137
- [34] D. G. Wood, “Ethereum: a secure decentralised g generalised transaction ledger homestead,” http://gavwood.com/paper.pdf, 2014, [Online, Accessed 26-June-2020].
- [35] Z. Brakerski, “Fully homomorphic encryption without modulus switching from classical gapsvp,” in Advances in Cryptology – CRYPTO 2012, R. Safavi-Naini and R. Canetti, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 868–886.
- [36] “Microsoft SEAL (release 3.5),” https://github.com/Microsoft/SEAL, Apr. 2020, microsoft Research, Redmond, WA.
- [37] J. Herranz, D. Hofheinz, and E. Kiltz, “Some (in)sufficient conditions for secure hybrid encryption,” Inf. Comput., vol. 208, no. 11, p. 1243–1257, Nov. 2010. [Online]. Available: https://doi.org/10.1016/j.ic.2010.07.002
- [38] V. Lyubashevsky, C. Peikert, and O. Regev, “On ideal lattices and learning with errors over rings,” in Advances in Cryptology – EUROCRYPT 2010, H. Gilbert, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 1–23.
- [39] M. Albrecht, M. Chase, H. Chen, J. Ding, S. Goldwasser, S. Gorbunov, S. Halevi, J. Hoffstein, K. Laine, K. Lauter, S. Lokam, D. Micciancio, D. Moody, T. Morrison, A. Sahai, and V. Vaikuntanathan, “Homomorphic encryption security standard,” HomomorphicEncryption.org, Toronto, Canada, Tech. Rep., November 2018.