Neural Network Training With Homomorphic Encryption
Abstract
We introduce a novel method and implementation architecture to train neural networks while preserving the confidentiality of both the model and the data. Our method relies on the homomorphic capabilities of a lattice-based encryption scheme. Our procedure is optimized for operations on packed ciphertexts in order to achieve efficient updates of the model parameters. It achieves a significant reduction in computation thanks to the way we perform multiplications and rotations on packed ciphertexts during the transition from the feedforward pass to the backpropagation pass. To verify the accuracy of the trained model as well as the feasibility of the implementation, we tested our method on the Iris data set using the CKKS scheme with Microsoft SEAL as a back end. Although our test implementation targets a simple neural network, we believe our basic implementation blocks can support further applications to more complex neural-network-based use cases.
Keywords: secure computation, homomorphic encryption, neural network
1 Introduction
Deep learning has become successful mostly because of hardware improvements and cloud computing. Beyond the huge quantity of data that can be processed, many issues remain regarding safety, such as privacy and anonymity. Cloud computing in its essence leaks information, since private information leaves a physically or systematically secure environment such as servers residing within a company or institution. In the realm of deep learning, two important kinds of information can be leaked: the parameters that define the model and the data used to train the model. Privacy-preserving machine learning has received attention over the last decade as a way to prevent such leakage.
Privacy-preserving machine learning methods can be categorized into two main classes: secure multiparty computation and homomorphic encryption. A secure multiparty approach creates a generic training protocol that allows several parties to compute a function on their joint inputs without sharing secrets. Mohassel and Zhang [13] presented a privacy-preserving neural network training protocol based on a system with two untrusted servers. Although they show the feasibility of their protocol, the latter is very expensive both computationally and in communication. A homomorphic approach computes over encrypted data as well as the encrypted model, or just over encrypted data alone if there is no need to conceal the model parameters. Homomorphic operations generally require only one server to train the model, whereas a multiparty-computation-based approach requires the simultaneous interaction of all the parties involved. Research on machine learning inference applications using homomorphic encryption can be found, for instance, in [2, 3, 14, 18, 6, 8, 13, 15, 16]. Applications to deep machine learning training are, however, less frequent [10].
2 Preliminaries
In this section, we recall the CKKS scheme [5] upon which our new application is based. Since our implementation uses the CKKS scheme backed by the Microsoft SEAL library [12], we also briefly summarize the latter's functionalities. In addition, we review the packing methods that are crucial for machine learning applications that use homomorphic encryption. Finally, we quickly review the material on training neural networks that motivated our work.
The CKKS scheme bases its security on the hardness of the learning with errors (LWE) problem over a polynomial factor ring [11]. It has a different encoding and batching scheme than the BFV scheme [9]. One of the major differences is that CKKS allows us to encode real numbers into plaintext slots through the canonical embedding, whereas BFV encodes integers only; this makes CKKS preferable for our implementation. Like BFV, it supports leveled homomorphic encryption with a chained modulus and relinearization. The main functionalities are: (1) KeyGen, (2) Encoding, (3) Decoding, (4) Encryption, (5) Decryption, (6) Addition, (7) Multiplication, and (8) Rotation.
Packing and rotation techniques are central to computing over ciphertexts because they can be used to speed up linear transformations significantly. Packing encrypts a list of elements into one ciphertext instead of encrypting each element into a distinct ciphertext. This is done by encoding the plaintext information into the subfields of the factor ring [17]. Rotation is the Galois mapping of each element from one subfield to another, which, in practice, is used to add elements across slots. In this way, a linear transformation can be written as a sequence of sums, multiplications, and rotations over packed ciphertexts. Multiplications and rotations are computationally heavy operations; hence, the design of efficient ciphertext packing methods is critical. We will study the effect of different packing methods on the computational load.
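To fix notation for what follows, here is a minimal plaintext simulation of the three SIMD primitives on packed ciphertexts. This Python sketch is purely illustrative and is not part of our SEAL implementation; np.roll stands in for the homomorphic Galois rotation.

```python
import numpy as np

# Plaintext stand-ins for the three SIMD primitives on packed ciphertexts.
# In the encrypted setting each call below is a homomorphic operation;
# here np.roll simulates the Galois rotation of the slots.

def he_add(a, b):
    return a + b          # slot-wise addition of two packed vectors

def he_mult(a, b):
    return a * b          # slot-wise multiplication of two packed vectors

def he_rot(a, r):
    return np.roll(a, r)  # rotate right by r slots (negative r rotates left)

v = np.array([1.0, 2.0, 3.0, 4.0])
print(he_rot(v, 1))   # [4. 1. 2. 3.]
print(he_rot(v, -1))  # [2. 3. 4. 1.]
```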
We focus now on how to perform matrix-vector multiplications efficiently using two packing algorithms: (1) a row vector packing method, and (2) a diagonal packing method. These two packing methods are useful to perform linear transformations over an encrypted vector. First, consider the multiplication of a matrix $W$ with a vector $x$, whose dimensions are $l \times k$ and $k$, respectively. The following procedure is summarized in algorithm 1. Here, we denote by $\mathrm{Rot}(v; r)$ the packed vector $v$ rotated to the right by $r$ slots; similarly, $\mathrm{Rot}(v; -r)$ is the packed vector rotated to the left by $r$ slots. For instance, $\mathrm{Rot}((1,2,3,4); 1) = (4,1,2,3)$ and $\mathrm{Rot}((1,2,3,4); -1) = (2,3,4,1)$. As shown on figure 1 below, the row vector packing method packs each row of $W$ separately, which results in $l$ packed ciphertexts for $W$ and one for $x$. After packing, the multiplication is processed by multiplying each of the $l$ packed row ciphertexts with the packed ciphertext of $x$.

In order to obtain the final result, each intermediate result is rotated and summed with itself so that a designated slot contains the inner product of the packed row and $x$. Once each inner product is obtained, a multiplication with a vector having a single slot equal to 1 and all other slots equal to 0 (we call this vector a unit vector) is applied to mask out the unnecessary elements. At the end, the results are summed up to yield the output of the linear transformation. In this case, the entire process consumes $2l$ multiplications and on the order of $l \log_2 k$ rotations.
We now describe the diagonal packing method from Gazelle [6]. Given a fixed matrix $W$ and a fixed input $x$, packing is done along the diagonal direction instead of the row direction, as shown on figure 1 (right). The input, on the other hand, is packed into one ciphertext as before. In this case, we iterate, multiplying each packed diagonal component of the weight matrix with a rotated copy of the packed input vector; the packed $x$ is rotated by one additional slot at each iteration step. After this iteration, a summation of the intermediate results yields the linear transformation. The major difference from the row packing method is that there is no need to multiply ciphertexts with unit vectors. In addition, since rotation is only applied to the input vector, the number of rotations can be smaller: the total computation cost amounts to one multiplication per packed diagonal and at most one rotation per iteration. The procedure is summarized in algorithm 2. Although the number of multiplications depends on the shape of the given matrix and of the input vector, the diagonal method is more efficient in most situations.
Algorithm 1 Algorithm of matrix-vector multiplication with row packing
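To make the row packing procedure concrete, the following is a minimal plaintext sketch of our own (not the paper's SEAL code), with np.roll simulating slot rotation. We assume $k$ is a power of two so that the rotate-and-sum inner product takes $\log_2 k$ steps, and we abbreviate the final placement of each inner product into its output slot.

```python
import numpy as np

def matvec_row_packing(W, x):
    """Simulated row-packing matrix-vector product on plaintext slots.

    W has shape (l, k) and x has length k; k is assumed a power of two,
    so the rotate-and-sum inner product needs log2(k) rotations per row.
    """
    l, k = W.shape
    result = np.zeros(l)
    for i in range(l):
        prod = W[i] * x                        # 1 ciphertext multiplication
        shift = k // 2
        while shift >= 1:                      # rotate-and-sum: log2(k) rotations
            prod = prod + np.roll(prod, -shift)
            shift //= 2
        unit = np.zeros(k)
        unit[0] = 1.0
        masked = prod * unit                   # 1 masking multiplication (unit vector)
        # In the encrypted version, the masked slot is rotated into position i
        # and all l results are summed into a single output ciphertext.
        result[i] = masked[0]
    return result

W = np.arange(8.0).reshape(2, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(matvec_row_packing(W, x), W @ x)
```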
Algorithm 2 Algorithm of matrix-vector multiplication with diagonal packing
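Likewise, a minimal plaintext sketch of the diagonal method, restricted to the square $n \times n$ case for simplicity: each generalized diagonal is multiplied with a progressively rotated copy of the input, and the partial products are summed, with no unit-vector masking required.

```python
import numpy as np

def matvec_diagonal_packing(W, x):
    """Simulated diagonal-packing matrix-vector product (square case).

    Uses n multiplications and n - 1 rotations of the packed input;
    no unit-vector masking is required.
    """
    n = W.shape[0]
    # Generalized diagonal i: d_i[j] = W[j, (j + i) mod n].
    diags = [np.array([W[j, (j + i) % n] for j in range(n)]) for i in range(n)]
    acc = np.zeros(n)
    for i in range(n):
        rotated_x = np.roll(x, -i)        # input rotated left by i slots
        acc = acc + diags[i] * rotated_x  # 1 multiplication + 1 addition
    return acc

W = np.arange(16.0).reshape(4, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(matvec_diagonal_packing(W, x), W @ x)
```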
We briefly review neural network training procedures. For simplicity, we consider a simple neural network model which consists of three layers: the input layer, the middle hidden layer, and the output layer. A nonlinear layer, called an activation layer, is placed after each linear transformation. The generic structure of such a neural network is shown on figure 2.

Neural network training is decomposed into two main parts: feedforward and backpropagation. Feedforward is the composition of a linear transformation layer followed by a non-linear transformation layer, called the activation layer. We initialize the entries of the weight matrices and the biases using a Gaussian distribution. Each training sample consists of an input vector and a label; the input is fed to the first layer and linearly transformed using the weights and biases. The output is then transformed non-linearly by the activation layer. The output of the activation layer becomes the input of the second layer, and this continues until we reach the last layer. The output of the last layer is the prediction for that specific input vector. In order to improve the prediction, the loss between the inferred result and the actual label is calculated and passed to backpropagation. Backpropagation is essentially the reverse process of feedforward: updates of the weights are calculated to optimize the model towards a better prediction. On figure 3 below, the input goes through two layers in the feedforward pass and the loss is evaluated by taking the mean squared error between the inferred result and the true label. Once the loss is calculated, backpropagation allows us to compute the partial derivatives of the loss with respect to the parameters. These partial derivatives are used to update the weight parameters and minimize the loss function over the course of training. During this backpropagation step, the transpose of the packed encrypted weight matrix is required (figure 3).
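As a plaintext reference for the computation that must later be carried out over ciphertexts, the following sketch performs one training step of a two-layer network as in figure 3, with square activations and a mean-squared-error loss. It is our own illustrative reconstruction: the layer sizes and learning rate are arbitrary placeholders, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
k, h, m = 4, 8, 3                 # input, hidden, output sizes (placeholders)
W1 = rng.normal(0, 0.1, (h, k))   # Gaussian-initialized weights
W2 = rng.normal(0, 0.1, (m, h))
lr = 0.1                          # placeholder learning rate

def train_step(x, y):
    global W1, W2
    # Feedforward: dense -> square activation -> dense -> square activation.
    z1 = W1 @ x;  a1 = z1 ** 2
    z2 = W2 @ a1; a2 = z2 ** 2
    loss = np.mean((a2 - y) ** 2)
    # Backpropagation: the transposed weight matrix W2.T appears here.
    g2 = (2.0 / m) * (a2 - y) * (2.0 * z2)   # dL/dz2 (square activation derivative)
    g1 = (W2.T @ g2) * (2.0 * z1)            # dL/dz1, uses the transpose of W2
    W2 -= lr * np.outer(g2, a1)              # SGD weight updates
    W1 -= lr * np.outer(g1, x)
    return loss

x = rng.normal(size=k)
y = np.array([1.0, 0.0, 0.0])
print(train_step(x, y))
```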

Optimizations of the following primitive operations are thus crucial to perform neural network training efficiently with homomorphic encryption: (1) matrix-vector operations in the feedforward phase, (2) computation of the transposed weight matrix for backpropagation, (3) matrix-vector operations within the backpropagation step, and (4) a method to compute the activation function. For (1) and (3), as proposed in the Gazelle [6] implementation, we pack along the diagonals. For (4), the square activation is used as a homomorphic-encryption-friendly activation function, as in CryptoNets [7, 10]. We discuss (2) in the next section, since the efficient transposition of a packed matrix has not been discussed in the previous literature.
3 Results and related work
We mentioned previously that optimizing the computation of the linear transformation is important in order to keep the runtime and the memory footprint small enough to be practical. Optimization efforts for homomorphic operations in machine learning algorithms such as convolutional neural networks have been made in [18, 6, 4], which also make use of packing methods for the inputs. However, the aforementioned articles focus on the inference part rather than on training the model. Recently, [10] proposed an attempt to train a logistic regression model using packed information to allow end-to-end training. Nevertheless, to the best of our knowledge, only a few implementations of neural network training have been done. Our contributions are as follows:
1. A novel packing method for the matrix of weights that achieves an efficient overall training runtime. As a result, our design keeps the total number of heavy homomorphic operations significantly lower than other methods. In addition, the multiplicative depth of the circuit is kept small, which yields a faster and lighter implementation.
2. A detailed implementation, which had not been done extensively in industry before.
3. A tested implementation of our method on the well-known iris dataset using Microsoft SEAL.
3.1 Our matrix transposition method
In this section, we explain our algorithm, which transposes the packed matrix of weights during the transition from feedforward to backpropagation. The diagonal packing method is adopted throughout the training procedure for the following reasons:
1. The number of rotations is intrinsically smaller.
2. The difference in multiplication count between row and diagonal packing is compensated by the training procedure.
3. No multiplication is needed to create the transposed matrix; therefore, no extra circuit depth is consumed by taking the transpose.
For (1), as mentioned previously, the number of rotation operations in a linear transformation can be reduced with diagonal packing. For (2), although the number of multiplications depends on both the input and output dimensions of the linear transformation, these differences are compensated because training performs both forward and backward propagations: the forward pass applies $W$ while the backward pass applies $W^T$, whose dimensions are swapped. The most significant point is (3) because, with our method, the diagonally packed weights require no multiplications but only rotations to transpose the packed matrix. As a result, our configuration can have a smaller modulus chain for the multiplication circuit, which significantly benefits us with a computationally lighter design.
For instance, let the input dimension of a dense weight matrix be $k$ and its output dimension be $l$; that is, the weight matrix has size $l \times k$. We denote the original packed weight matrix by $W$ and the packed transposed weight matrix by $W^T$. With row packing, $W$ consists of $l$ ciphertexts, each of them a packed row of the original matrix, and $W^T$ consists of $k$ ciphertexts, each of them a packed row of the transposed weight matrix, as shown in figure 4. If we adopt row packing as the packing algorithm, building the components of $W^T$ from $W$ takes (1) multiplications by unit vectors, (2) rotations, and (3) summations. This procedure is shown in algorithm 3. The fact that the components of $W$ and $W^T$ have few structural similarities makes this procedure tedious: $kl$ multiplications and up to $kl$ rotations are needed in total to generate $W^T$ out of $W$.

Algorithm 3 Algorithm of taking transpose with row packing
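A plaintext sketch of our own (not the paper's listing) illustrates why the row-packing transpose is tedious: every one of the $kl$ entries costs one unit-vector multiplication and one rotation. All vectors are padded to a common slot count $n$, and np.roll simulates ciphertext rotation.

```python
import numpy as np

def transpose_row_packing(rows, l, k):
    """rows: l ciphertexts, each packing one length-k row of W in slots
    0..k-1 of a length-n slot vector. Returns k ciphertexts packing the
    rows of W^T. Every entry costs one unit-vector multiplication and one
    rotation, i.e. k*l multiplications and k*l rotations in total."""
    n = rows[0].shape[0]
    t_rows = []
    for i in range(k):                         # i-th row of W^T
        acc = np.zeros(n)
        for j in range(l):
            unit = np.zeros(n)
            unit[i] = 1.0
            masked = rows[j] * unit            # multiplication: isolate W[j, i]
            acc = acc + np.roll(masked, j - i) # rotation: move slot i to slot j
        t_rows.append(acc)
    return t_rows

l, k, n = 2, 3, 8
W = np.arange(float(l * k)).reshape(l, k)
rows = [np.concatenate([W[j], np.zeros(n - k)]) for j in range(l)]
T = transpose_row_packing(rows, l, k)
assert all(np.allclose(T[i][:l], W.T[i]) for i in range(k))
```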
On the other hand, with the diagonal method, the packed components of $W$ have strong structural similarities with the packed components of the transposed matrix $W^T$. Therefore, creating $W^T$ out of $W$ is much easier: (i) no multiplication by a unit vector is needed, and (ii) fewer rotations are needed than in the row packing case. In the square $n \times n$ case, zero multiplications and at most $n - 1$ rotations are needed, where $W$ and $W^T$ are related by $W^T_i = \mathrm{Rot}(W_{(n-i) \bmod n}; -i)$.
Not only does this algorithm significantly reduce the total multiplication cost, it also keeps the circuit depth lower for training, since no multiplicative level is consumed. Therefore, it allows us to execute the training with a relatively lighter encryption configuration, which helps reduce the computational overhead significantly. We generalize this algorithm for the generation of the packed transposed matrix out of the packed matrix using only rotations and summations on figure 5.

If we denote by $\mathrm{Rot}(W_i; r)$ the right rotation by $r$ slots of the originally packed ciphertext $W_i$, then, as on the right side of figure 5, each component of $W^T$ can be expressed as a summation of rotated components of $W$; in the non-square case some components are a little trickier, being assembled from the summation of pieces of several rotated diagonals. Following these observations, the entire transposed packed matrix can be expressed by rotations and summations of the $W_i$. In the square $n \times n$ case, the generalized equation reduces to $W^T_i = \mathrm{Rot}(W_{(n-i) \bmod n}; -i)$.
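The square case can be checked with a short plaintext simulation: every diagonal of $W^T$ is a pure rotation of a diagonal of $W$, so the transpose consumes no multiplications and at most $n - 1$ rotations. This sketch is our own verification under that assumption.

```python
import numpy as np

def diagonals(W):
    """Generalized diagonals: d_i[j] = W[j, (j + i) mod n]."""
    n = W.shape[0]
    return [np.array([W[j, (j + i) % n] for j in range(n)]) for i in range(n)]

def transpose_diagonals(diags):
    """Diagonals of W^T from diagonals of W, using rotations only:
    W^T_i = Rot(W_{(n-i) mod n}; -i), a left rotation by i slots."""
    n = len(diags)
    return [np.roll(diags[(n - i) % n], -i) for i in range(n)]

W = np.arange(16.0).reshape(4, 4)
assert all(np.allclose(t, d)
           for t, d in zip(transpose_diagonals(diagonals(W)), diagonals(W.T)))
```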
In addition, we can prepare the packed matrix with pre-rotated "stepped" packing, which means that we prepare $\mathrm{Rot}(W_i; i)$ as the preprocessed weight matrix before encryption, as shown in figure 6. This preprocessing reduces the number of on-the-fly rotation operations over the encrypted weights. Algorithm 5 shows the procedure of this method, which further reduces the total number of rotations.
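Under this reading of the stepped packing (our interpretation of figure 6: diagonal $W_i$ is pre-rotated right by $i$ slots before encryption), the transposed diagonals coincide with the stepped ciphertexts up to re-indexing, so the transpose itself needs no online rotation in the square case. The following sketch verifies this on plaintext.

```python
import numpy as np

n = 4
W = np.arange(float(n * n)).reshape(n, n)
diag = lambda M, i: np.array([M[j, (j + i) % n] for j in range(n)])

# "Stepped" packing: diagonal i is rotated right by i slots before encryption.
stepped = [np.roll(diag(W, i), i) for i in range(n)]

# Transpose by re-indexing alone: W^T_i equals stepped ciphertext (n - i) mod n,
# since Rot(W_{n-i}; n-i) = Rot(W_{n-i}; -i) when rotating modulo n slots.
for i in range(n):
    assert np.allclose(stepped[(n - i) % n], diag(W.T, i))
```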

Table 1 summarizes the computation cost of taking the transpose for each method, in the square $n \times n$ case.
| Packing Method | Multiplication | Rotation |
|---|---|---|
| Row Packing | $n^2$ | $n^2$ |
| Diagonal Packing | $0$ | $n-1$ |
| Diagonal Packing with Step | $0$ | $0$ |
Algorithm 4 Algorithm of taking transpose with diagonal packing
Algorithm 5 Algorithm of taking transpose with stepped diagonal packing
Once this algorithm is applied, the packed matrix of weights is used in the feedforward pass and is then transposed to propagate backward and update the weights. As shown in [10], the weight updates generated with this transition method have the same packed structure as the original weights. Therefore, the update of the weights can also be done on the server side. The advantage of this is that we can apply bootstrapping [1] in order to complete the training process without sending ciphertexts back to the client. However, in our implementation, we send the ciphertext back to the client for the weight update at each batch iteration, for simplicity. It is important here to understand the underlying computational tasks very well in order to fit a good cryptographic solution. To give a concrete example, we consider a small three-layer neural network. Table 2 shows the number of ciphertext multiplications and rotations needed during the training procedure: (1) naively using row packing, and (2) using diagonal packing with our transpose method. We observe that the total number of rotations and multiplications is substantially reduced using our method, with the multiplication circuit depth kept at a minimum.
| | | Diagonal | | Row | |
|---|---|---|---|---|---|
| | | mult | rot | mult | rot |
| FF | Dense | | | | |
| | square actv | | | | |
| | Dense | | | | |
| | square actv | | | | |
| Transition | transition | | | | |
| BP | Dense | | | | |
| | square actv | | | | |
| | Dense | | | | |
| | square actv | | | | |
| Total | | | | | |
As can be seen, the total number of multiplications and rotations is greatly reduced using our method with diagonal packing, and the largest part of the reduction comes from the optimization of the transition process. The reduction becomes even more significant for larger matrix sizes, which is the usual case in real-world applications.
4 Implementation
To verify the advantage of our method, we implemented a neural network model with one hidden layer for the iris dataset classification. The iris dataset is a well-known tutorial dataset for machine learning, which contains 150 labeled samples covering 3 kinds of iris in total. Each element of the dataset consists of four measured properties of an iris flower together with its species label. We design our neural network to have one hidden layer, as in [7, 10], and describe it in table 3.
| layer name | dim input | dim output |
|---|---|---|
| Dense | | |
| Square Activation | | |
| Dense | | |
| Square Activation | | |

| setting | value |
|---|---|
| dataset | iris |
| Learning Rate | |
| Batch Size | |
| Optimizer | SGD |
| Loss Function | |
| Epochs | |
Figure 7 shows the execution flow of the training. Repacking of the weight matrix is done in order to prevent the growth of noise across batch iterations.

The repacking is done by temporarily transferring the encrypted data to a trusted environment, where the weight matrix is decrypted and re-encrypted before being sent back to the server. However, as mentioned earlier, this communication is not necessary when bootstrapping is applied.
We use the Microsoft SEAL library as a back end. The degree of the polynomial modulus and the bit sizes of the modulus chain are chosen so that enough levels of the modulus chain are available for the diagonal method; the encryption parameters for the row method need more levels because of the extra multiplication with the unit vector. The machine specifications for this experiment are: Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz with 6 cores.
Table 4 summarizes the running time for each step of training: (1) feedforward, (2) taking the transpose, (3) backpropagation, (4) SGD update, and (5) repacking, comparing the row method with the diagonal method. For simplicity, the batch size is set to one for this comparison. We observe that the entire process is substantially faster with our optimization.
| process | row | diagonal |
|---|---|---|
| Feedforward | | |
| Transition | | |
| Backpropagation | | |
| SGD Update | | |
| Repacking | | |
| Total | | |
To assert the correctness of our method, a full training run is done entirely with our method. We use the same model architecture with a larger batch size. The batch processing is parallelized with OpenMP, which gave us around twice the computation performance; the total time for the entire process was on the order of hours for the full number of epochs. In addition, we wrote a program that trains this model on plaintext using the same architecture. This code is used to compare the accuracy of a model trained over plaintext with that of a model trained over ciphertext through CKKS. The mean squared loss and the learning accuracy curves per epoch are plotted side by side on figures 8 and 9. The final loss and final accuracy of the ciphertext-based model closely match those of the plaintext-based model. As can be seen, the intrinsic approximation of real numbers in the CKKS scheme did not affect the model accuracy.


5 Conclusion and further research
The need for privacy-preserving solutions in the untrusted environments where machine learning algorithms are executed is pressing nowadays. To achieve such privacy-preserving algorithms, homomorphic encryption has moved out of academia and is now operational at an industrial scale. We proposed an efficient privacy-preserving neural network training architecture that relies on a packing method to optimize the transposition of matrices. As a consequence, training neural networks homomorphically becomes substantially faster than with other architectures. We tested its advantages on the well-known iris dataset using the SEAL library. We are investigating how our method can be used to train more complex neural networks. Homomorphic-encryption-based neural network training is still a big challenge in terms of computation; however, we believe that it can be a huge breakthrough for privacy-preserving technology in the coming years.
References
- [1] Hao Chen, Ilaria Chillotti, and Yongsoo Song. Improved bootstrapping for approximate homomorphic encryption. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 34–54. Springer, 2019.
- [2] Ahmad Al Badawi, Jin Chao, Jie Lin, Chan Fook Mun, Jun Jie Sim, Benjamin Hong Meng Tan, Xiao Nan, Khin Mi Mi Aung, and Vijay Ramaseshan Chandrasekhar. Towards the AlexNet moment for homomorphic encryption: HCNN, the first homomorphic CNN on encrypted data with GPUs, 2020.
- [3] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast homomorphic evaluation of deep discretized neural networks. In Hovav Shacham and Alexandra Boldyreva, editors, Advances in Cryptology – CRYPTO 2018, pages 483–512, Cham, 2018. Springer International Publishing.
- [4] Jin Chao, Ahmad Al Badawi, Balagopal Unnikrishnan, Jie Lin, Chan Fook Mun, James M. Brown, J. Peter Campbell, Michael Chiang, Jayashree Kalpathy-Cramer, Vijay Ramaseshan Chandrasekhar, Pavitra Krishnaswamy, and Khin Mi Mi Aung. CaReNets: Compact and resource-efficient CNN for homomorphic inference on encrypted medical images, 2019.
- [5] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic encryption for arithmetic of approximate numbers. In Tsuyoshi Takagi and Thomas Peyrin, editors, Advances in Cryptology – ASIACRYPT 2017, pages 409–437, Cham, 2017. Springer International Publishing.
- [6] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. Gazelle: A low latency framework for secure neural network inference. In Proceedings of the 27th USENIX Conference on Security Symposium, SEC'18, pages 1651–1668, USA, 2018. USENIX Association.
- [7] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, pages 201–210, New York, NY, USA, 2016. JMLR.org.
- [8] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS '17, pages 619–631, New York, NY, USA, 2017. Association for Computing Machinery.
- [9] Junfeng Fan and Frederik Vercauteren. Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144, 2012. https://eprint.iacr.org/2012/144.
- [10] Andrey Kim, Yongsoo Song, Miran Kim, Keewoo Lee, and Jung Hee Cheon. Logistic regression model training based on the approximate homomorphic encryption. In Proceedings of the 6th iDASH Privacy and Security Workshop, volume 11, California, 2018. BMC Medical Genomics.
- [11] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. On ideal lattices and learning with errors over rings. In Henri Gilbert, editor, Advances in Cryptology – EUROCRYPT 2010, pages 1–23, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
- [12] Microsoft. Microsoft SEAL (release 3.4). https://github.com/Microsoft/SEAL, October 2019. Microsoft Research, Redmond, WA.
- [13] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19–38, San Jose, California, 2017. IEEE.
- [14] Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, and Todd Mytkowicz. CHET: An optimizing compiler for fully-homomorphic neural-network inferencing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 142–156, New York, NY, USA, 2019. Association for Computing Machinery.
- [15] Bita Darvish Rouhani, M. Sadegh Riazi, and Farinaz Koushanfar. DeepSecure: Scalable provably-secure deep learning. In Proceedings of the 55th Annual Design Automation Conference, DAC '18, New York, NY, USA, 2018. Association for Computing Machinery.
- [16] Jun Sakuma and Wen-jie Lu. Crypt-CNN(II): Cryptographically evaluate non-linear convolutional neural network. In Proceedings of Computer Security Symposium, pages 773–780, Japan, October 2017.
- [17] Nigel P. Smart and Frederik Vercauteren. Fully homomorphic SIMD operations. Designs, Codes and Cryptography, 71(1):57–81, 2014.
- [18] Xiaoqian Jiang, Miran Kim, Kristin Lauter, and Yongsoo Song. Secure outsourced matrix computation and application to neural networks. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS '18, pages 1209–1222, New York, NY, USA, 2018. Association for Computing Machinery.