An Encryption Method of ConvMixer Models without Performance Degradation
Abstract
In this paper, we propose an encryption method for ConvMixer models with a secret key. Encryption methods for DNN models have been studied to achieve adversarial defense, model protection, and privacy-preserving image classification. However, conventional encryption methods degrade the performance of models compared with that of plain models. Accordingly, we propose a novel method for encrypting ConvMixer models. The method is carried out on the basis of the embedding architecture that ConvMixer has, and models encrypted with the method have the same performance as models trained with plain images only when test images are encrypted with the correct secret key. In addition, the proposed method does not require any specially prepared data for model training or any network modification. In an experiment, the effectiveness of the proposed method is evaluated in terms of classification accuracy and model protection in an image classification task on the CIFAR-10 dataset.
Keywords:
Image encryption; ConvMixer; DNN; Privacy preserving
1 Introduction
Deep neural network (DNN) models have been deployed in many applications, including security-critical ones such as biometric authentication, automatic driving, and medical image analysis [1, 2]. However, they are exposed to various threats such as adversarial examples, unauthorized access, and data leaks. Accordingly, training and testing machine learning (ML) models with encrypted images has been studied as one way of solving these issues [3]. However, conventional methods that use models trained with encrypted images degrade performance compared with models trained with plain images.
Accordingly, in this paper, we propose a novel method based on a unique feature of ConvMixer [4] that can overcome the above problems. In the method, a model trained with plain images is encrypted with a secret key. Also, to adapt to the model encryption, test images are transformed with the same key. The proposed method allows us not only to obtain the same performance as models trained with plain images but also to update the secret key easily. In an experiment, the effectiveness of the proposed method is evaluated in terms of performance degradation and model protection in an image classification task on the CIFAR-10 dataset.
2 Related Work
Conventional methods for encrypting DNN models and ConvMixer are summarized here.
2.1 Model Encryption with Secret Key
Many model encryption methods have been studied for application to adversarial defense, model protection [5, 6, 7, 8, 9], and privacy-preserving image classification [3, 10, 11, 12, 13, 14, 15]. Almost all model encryption methods are carried out by training models with images encrypted with a secret key, but such methods can degrade the performance of the models compared with non-encrypted models due to the influence of encryption.
Model encryption methods have to satisfy two requirements. The first requirement is that authorized users with a secret key can obtain almost the same performance from encrypted models as that of non-encrypted models. The second is that the performance of the encrypted models is low for unauthorized users without the correct key. The proposed method aims not only to avoid the influence of encryption but also to provide an extremely degraded accuracy to unauthorized users.
2.2 ConvMixer
ConvMixer [4] is well known for achieving a high performance in image classification tasks even though it has a small number of model parameters. ConvMixer is a type of isotropic network inspired by the vision transformer (ViT) [16], so its architecture has a unique feature called patch embedding.
Figure 1 shows the architecture of the network, which consists of two main structures: patch embedding and ConvMixer layers. First, an input image $x \in \mathbb{R}^{h \times w \times c}$ is replaced with $z_0 \in \mathbb{R}^{d \times (h/p) \times (w/p)}$ by patch embedding with patch size $p$ and embedding dimension $d$ as

$$z_0 = \mathrm{BN}\left(\sigma\left(\mathrm{Conv}_{c \to d}(x;\ \text{kernel size}~p,\ \text{stride}~p)\right)\right), \qquad (1)$$

where $h$, $w$, and $c$ are the height, width, and number of channels of $x$. Also, $\mathrm{Conv}_{c \to d}$ is a convolution operation with $c$ input channels and $d$ output channels, $\mathrm{BN}$ is a batch normalization operation, and $\sigma$ is an activation function. In addition, to simplify the discussion, we assume that $h$ and $w$ are divisible by $p$. Next, $z_0$ is transformed into $z_N$ by using $N$ ConvMixer layers. Each layer consists of a depthwise convolution and a pointwise convolution as follows.
$$z'_l = \mathrm{BN}\left(\sigma\left(\mathrm{ConvDepthwise}(z_{l-1})\right)\right) + z_{l-1},$$
$$z_l = \mathrm{BN}\left(\sigma\left(\mathrm{ConvPointwise}(z'_l)\right)\right), \qquad l = 1, \dots, N. \qquad (2)$$
Finally, the output $z_N$ of the $N$-th ConvMixer layer is transformed by global average pooling and a softmax function to obtain a classification result.
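As an illustration of Eq.(2), a single ConvMixer layer can be sketched in PyTorch as follows. This is a minimal sketch with toy sizes, assuming GELU as the activation $\sigma$ as in the original ConvMixer paper; it is not the implementation used in the experiments.

```python
import torch
from torch import nn

# Minimal sketch of one ConvMixer layer in Eq.(2) with toy sizes,
# assuming GELU as the activation function.
d, k = 5, 3                                       # embedding dim, kernel size
depthwise = nn.Sequential(
    nn.Conv2d(d, d, kernel_size=k, groups=d, padding="same"),
    nn.GELU(),
    nn.BatchNorm2d(d),
)
pointwise = nn.Sequential(
    nn.Conv2d(d, d, kernel_size=1),
    nn.GELU(),
    nn.BatchNorm2d(d),
)

z = torch.rand(1, d, 4, 4)                        # z_{l-1}
z = depthwise(z) + z                              # residual around depthwise conv
z = pointwise(z)                                  # z_l, same shape as the input
assert z.shape == (1, 5, 4, 4)
```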
In this paper, we utilize the patch embedding in Eq.(1) to encrypt a model. Patch embedding can be done in two steps.

1. Reshape an input image $x$ into a sequence of flattened patches $x_p^1, x_p^2, \dots, x_p^n$, each with $p^2 \cdot c$ elements, where $n = hw/p^2$ is the number of patches.
2. Map each patch $x_p^i$ to $z_0^i$ with a dimension of $d$ as
$$z_0^i = x_p^i E, \qquad (3)$$
where $E \in \mathbb{R}^{(p^2 \cdot c) \times d}$.
A kernel in $\mathrm{Conv}_{c \to d}$ in Eq.(1) corresponds to $E$ in Eq.(3). In this paper, we show that model encryption can be carried out by transforming $E$ with a secret key. Also, this encryption does not degrade the performance of ConvMixer.
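The correspondence between the convolution kernel in Eq.(1) and the matrix $E$ in Eq.(3) can be checked numerically. The following is a minimal sketch with toy sizes (all tensors are random stand-ins, not trained parameters): a strided convolution with kernel size and stride $p$ gives the same output as flattening each patch and multiplying it by the flattened kernel.

```python
import torch

# Patch embedding as a strided convolution vs. its matrix form z_0^i = x_p^i E.
torch.manual_seed(0)
c, h, w, p, d = 3, 8, 8, 2, 5          # toy sizes; h and w are divisible by p
x = torch.rand(1, c, h, w)

conv = torch.nn.Conv2d(c, d, kernel_size=p, stride=p, bias=False)
z0 = conv(x)                            # shape (1, d, h/p, w/p)

# Matrix form: flatten each p×p×c patch and multiply by the flattened kernel.
E = conv.weight.reshape(d, -1)          # d × (c·p²), one row per output channel
patches = (x.unfold(2, p, p).unfold(3, p, p)   # 1 × c × h/p × w/p × p × p
            .permute(0, 2, 3, 1, 4, 5)          # group the pixels of each patch
            .reshape(-1, c * p * p))            # n × (c·p²), n = (h/p)(w/p)
z0_mat = patches @ E.T                          # n × d

assert torch.allclose(z0.permute(0, 2, 3, 1).reshape(-1, d), z0_mat, atol=1e-5)
```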
[Figure 1: The architecture of ConvMixer.]
3 Proposed Encryption Method
In this section, we propose a novel method for encrypting models and images as well as the combined use of encrypted models and images.
3.1 Overview
Figure 2 shows an overview of the proposed method, where it is assumed that the third party is trusted, and the provider is untrusted. The third party trains a model by using plain images and transforms the trained model with a secret key. The transformed model is given to the provider, and the key is sent to a client. The client prepares a transformed test image with the key and sends it to the provider. The provider applies it to the transformed model to obtain a classification result, and the result is sent back to the client. Note that the provider has neither a key nor plain images. The proposed method enables us to achieve this without any performance degradation compared with the use of plain images.
[Figure 2: Overview of the proposed method.]
3.2 Image Transformation
First, we address a block-wise image transformation method with a secret key to encrypt test images. As shown in Fig. 3, the procedure of the transformation consists of three steps: block segmentation, block transformation, and block integration. To transform an image $x \in \mathbb{R}^{h \times w \times c}$, we first divide $x$ into non-overlapping blocks with a block size of $p$, where $w_b = w/p$ is the number of blocks across width $w$, and $h_b = h/p$ is the number of blocks across height $h$. In this paper, we assume that the block size of the segmentation is the same as the patch size of ConvMixer. Next, each block is flattened and concatenated again to obtain a block image $x_b \in \mathbb{R}^{(h_b \cdot w_b) \times L}$, where $L = p \cdot p \cdot c$ is the number of pixels in each block. Then, $x_b$ is transformed into $x'_b$ in accordance with block transformation with a key. Finally, $x'_b$ is reshaped so that it has the same dimensions as those of the original image $x$, and the encrypted image $x'$ is obtained.
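The block segmentation and integration steps above can be sketched as follows; `to_blocks` and `from_blocks` are hypothetical helper names, and the sizes are toy values.

```python
import numpy as np

# Divide an h×w×c image into non-overlapping p×p blocks, flatten each block
# into one row of a block image, and invert the process after transformation.
def to_blocks(x, p):
    h, w, c = x.shape
    blocks = x.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, p * p * c)        # n × L, with L = p·p·c

def from_blocks(xb, h, w, c, p):
    blocks = xb.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(h, w, c)

x = np.random.rand(8, 8, 3)
xb = to_blocks(x, 2)                            # 16 blocks of 12 pixels each
assert xb.shape == (16, 12)
assert np.array_equal(from_blocks(xb, 8, 8, 3, 2), x)   # lossless round trip
```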
In addition, the block transformation is carried out by using the three operations shown in Fig. 3. Details on each operation are given below.
A Pixel Shuffling

1. Generate a random permutation vector $v = (v_1, v_2, \dots, v_L)$ by using a key $K_1$, where $v_k \in \{1, 2, \dots, L\}$, and $v_k \neq v_{k'}$ if $k \neq k'$.
2. Pixels in each block are shuffled by vector $v$ as
$$x'_b(i, k) = x_b(i, v_k). \qquad (4)$$
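A minimal sketch of pixel shuffling, assuming the key seeds a pseudorandom generator that draws the permutation vector $v$ (the helper name `shuffle_pixels` and the integer key are assumptions for illustration):

```python
import numpy as np

# Pixel shuffling: the key seeds a generator that draws one permutation v,
# applied identically to every flattened block.
def shuffle_pixels(xb, key):
    L = xb.shape[1]
    v = np.random.default_rng(key).permutation(L)    # permutation vector v
    return xb[:, v]                                   # x'_b(i, k) = x_b(i, v_k)

xb = np.arange(24, dtype=float).reshape(2, 12)        # 2 blocks, L = 12 pixels
enc = shuffle_pixels(xb, key=42)

# Decryption inverts the same permutation drawn from the same key.
v = np.random.default_rng(42).permutation(12)
dec = np.empty_like(enc)
dec[:, v] = enc
assert np.array_equal(dec, xb)
```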
B Bit Flipping

1. Convert every pixel value from the $[0, 1]$ scale to the 8-bit scale (i.e., multiply by 255).
2. Generate a random binary vector $r = (r_1, r_2, \dots, r_L)$, $r_k \in \{0, 1\}$, by using a key $K_2$. To keep the transformation consistent, $r$ is distributed with 50% of "0"s and 50% of "1"s.
3. Apply a negative-positive transformation on the basis of $r$ as
$$x'_b(i, k) = \begin{cases} x_b(i, k) \oplus (2^8 - 1) & (r_k = 1) \\ x_b(i, k) & (r_k = 0), \end{cases} \qquad (5)$$
where $\oplus$ is an exclusive disjunction (XOR), and 8 is the number of bits used to represent $x_b(i, k)$.
4. Convert every pixel value back to the $[0, 1]$ scale (i.e., divide by 255).

Since $x_b(i, k)$ is a floating-point number between 0 and 1, bit flipping can also be expressed without scaling as follows.
$$x'_b(i, k) = \begin{cases} 1 - x_b(i, k) & (r_k = 1) \\ x_b(i, k) & (r_k = 0). \end{cases} \qquad (6)$$
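Bit flipping in the $[0, 1]$ scale, as in Eq.(6), can be sketched as follows; the helper name `flip_bits` and the key handling are assumptions, and $r$ is drawn with an equal number of 0s and 1s as required above.

```python
import numpy as np

# Bit flipping on pixels in [0, 1]: the key seeds a generator that draws a
# binary vector r with an equal number of 0s and 1s; flipped pixels become 1 - x.
def flip_bits(xb, key):
    L = xb.shape[1]
    r = np.random.default_rng(key).permutation(np.repeat([0, 1], L // 2))
    return np.where(r == 1, 1.0 - xb, xb), r

xb = np.random.default_rng(0).random((4, 12))
enc, r = flip_bits(xb, key=7)

# Applying the same flip twice recovers the original: 1 - (1 - x) = x.
dec, _ = flip_bits(enc, key=7)
assert np.allclose(dec, xb)
```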
C Normalization
Various normalization methods are widely used to improve the training stability, optimization efficiency, and generalization ability of DNNs. In this paper, we also use a normalization method to achieve the combined use of transformed images and models.
In the normalization used in this paper, a pixel $x_b(i, k)$ is replaced with $\hat{x}_b(i, k)$ as
$$\hat{x}_b(i, k) = \frac{x_b(i, k) - \mu}{\sigma}, \qquad (7)$$
where $\mu = 0.5$ is the mean and $\sigma = 0.5$ is the standard deviation used for normalization.
Note that $\hat{x}'_b(i, k) = -\hat{x}_b(i, k)$ is satisfied from Eq.(6). From Eq.(7), bit flipping followed by normalization can be regarded as an operation that reverses the sign of a pixel value. This property allows us to use the model encryption that will be described later.
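The sign-reversal property can be verified numerically; the sketch below assumes the mean and standard deviation are both 0.5.

```python
import numpy as np

# With mean 0.5 and standard deviation 0.5, normalizing a flipped pixel
# 1 - x yields exactly the negated normalized value of the plain pixel.
def normalize(t):
    return (t - 0.5) / 0.5

x = np.random.default_rng(1).random(1000)         # pixel values in [0, 1]
assert np.allclose(normalize(1.0 - x), -normalize(x))
```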
[Figure 3: Procedure of the block-wise image transformation with a secret key.]
3.3 Model Transformation
In model transformation, some parameters in models trained with plain images are transformed by using a secret key. In this paper, a model transformation method is proposed to achieve the combined use of models and images transformed with the same key.
ConvMixer utilizes patch embedding (see Fig. 1), and patch embedding can be adapted to pixel shuffling and bit flipping because both operations can be expressed as invertible linear transformations.
In the proposed method, it is assumed that the patch size used for patching is the same as the block size used for image encryption, and the number of patches is equal to that of blocks in an image. The transformation of parameters in trained models is described below.
A Adaptation to Pixel Shuffling
In patch embedding, flattened patches are mapped to vectors with a dimension of $d$ as in Eq.(3). When the patch size of ConvMixer is equal to the block size for image transformation, $p^2 \cdot c = L$ is satisfied. Therefore, a permutation of the rows in $E$ corresponds to pixel shuffling, so the model can be encrypted with the key $K_1$ used for pixel shuffling. The accuracy of the transformed model is high only when test images are encrypted by using pixel shuffling with key $K_1$. A permutation matrix $\Pi$ is defined with key $K_1$ so that $\Pi(k, v_k) = 1$ and all other entries are 0, and the transformation from matrix $E$ to $E'$ is given as follows.
$$E' = \Pi E. \qquad (8)$$
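The adaptation to pixel shuffling can be checked with a small numerical sketch; $E$, the patches, and $v$ below are random stand-ins with toy sizes, not trained values.

```python
import numpy as np

# Permuting the rows of E with the same vector v used for pixel shuffling
# leaves the patch embedding z_0^i = x_p^i E unchanged for shuffled inputs.
rng = np.random.default_rng(0)
L, d, n = 12, 5, 16                     # L = p·p·c pixels per block/patch
E = rng.random((L, d))                  # flattened patch-embedding kernel
patches = rng.random((n, L))            # flattened patches x_p^i

v = rng.permutation(L)                  # permutation vector from key K1
patches_enc = patches[:, v]             # pixel shuffling: x'_p(k) = x_p(v_k)
E_enc = E[v]                            # E' = ΠE: row k of E' is row v_k of E

assert np.allclose(patches_enc @ E_enc, patches @ E)
```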
B Adaptation to Bit Flipping

In addition, as shown in Eq.(7), bit flipping with normalization can be regarded as an operation that randomly reverses the sign of a pixel value. Therefore, we can encrypt a model by inverting the sign of the rows in matrix $E$ with the key $K_2$ used for bit flipping. The transformed model offers a high accuracy only for test images transformed by bit flipping with key $K_2$. Using key $K_2$ to generate the same vector $r$ used in bit flipping, the transformation from $E$ to $E'$ can be expressed as follows.
$$e'_k = \begin{cases} -e_k & (r_k = 1) \\ e_k & (r_k = 0), \end{cases} \qquad (9)$$
where $e_k$ and $e'_k$ are the $k$-th rows of matrices $E$ and $E'$.
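The adaptation to bit flipping can likewise be checked numerically; the sketch below uses random stand-ins with toy sizes and assumes normalization with mean 0.5 and standard deviation 0.5.

```python
import numpy as np

# After mean-0.5, std-0.5 normalization, bit flipping negates a pixel, so
# negating the rows of E selected by r compensates exactly, as in Eq.(9).
rng = np.random.default_rng(1)
L, d, n = 12, 5, 16
E = rng.random((L, d))                            # z_0^i = x_p^i E
x = rng.random((n, L))                            # pixel values in [0, 1]

r = rng.permutation(np.repeat([0, 1], L // 2))    # binary vector from key K2
x_enc = np.where(r == 1, 1.0 - x, x)              # bit flipping, Eq.(6)

def normalize(t):
    return (t - 0.5) / 0.5                        # Eq.(7)

E_enc = np.where((r == 1)[:, None], -E, E)        # e'_k = -e_k where r_k = 1

assert np.allclose(normalize(x_enc) @ E_enc, normalize(x) @ E)
```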
4 Experiment and Discussion
In an experiment, the effectiveness of the proposed method is shown in terms of image classification accuracy and model protection performance.
4.1 Experiment Setup
To confirm the effectiveness of the proposed method, we evaluated the accuracy of an image classification task on the CIFAR-10 dataset (with 10 classes). CIFAR-10 consists of 60,000 color images with a dimension of $32 \times 32 \times 3$, where 50,000 images are for training, 10,000 are for testing, and each class contains 6,000 images. Images in the dataset were transformed by the proposed encryption algorithm, where the block size was equal to the patch size $p$ of ConvMixer.
We used the PyTorch [17] implementation of ConvMixer with patch size $p$, $d$ channels after patch embedding, a depthwise-convolution kernel size of $k$, and $N$ ConvMixer layers. The ConvMixer model was trained with the Adam optimizer.
4.2 Image Classification
First, we evaluated the proposed method in terms of the accuracy of image classification under the use of ConvMixer. Table 1 shows the classification results, where "Proposed" means that the ConvMixer model and test images were transformed by the proposed method. As shown in Table 1, the proposed method did not degrade the performance at all for transformed test images. In contrast, the performance for plain images was severely degraded. Therefore, the proposed method was effective for model protection.
Table 1: Classification accuracy (%) of ConvMixer models for plain and transformed test images.

| Model | Plain | Proposed |
|---|---|---|
| Baseline | 90.46 | - |
| Proposed | 11.41 | 90.46 |
4.3 Model Protection
Next, we confirmed the performance of the encrypted model when test images were encrypted with keys different from the one used for model encryption. We prepared 100 random keys, and test images encrypted with these keys were input to the encrypted model. As shown by the box plot in Fig. 5, the accuracy of the model was low under the use of wrong keys. Accordingly, the proposed method was confirmed to be robust against a random key attack.
In general, the use of a large key space enhances robustness against various attacks. In this experiment, the key spaces of pixel shuffling and bit flipping are given by $L!$ and $2^L$, respectively, where $L = p \cdot p \cdot c$ is the number of pixels in each block. Therefore, the key space of the proposed method is $L! \cdot 2^L$. This key space is sufficiently large, so it is difficult to find the correct key by random key estimation.
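The key-space size can be checked numerically; the values below assume a block size of $p = 2$ and $c = 3$ color channels (so $L = 12$) purely for illustration, since the block size is a design parameter.

```python
import math

# Key space under assumed toy parameters: L! permutations for pixel
# shuffling and at most 2^L binary vectors for bit flipping.
L = 12                                           # L = p*p*c with p = 2, c = 3
key_space_bits = math.log2(math.factorial(L)) + L
assert key_space_bits > 40                       # roughly 40.8 bits here
```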
[Figure 5: Box plot of classification accuracy when using 100 randomly generated wrong keys.]
5 Conclusion
In this paper, we proposed the combined use of an image transformation method with a secret key and ConvMixer models transformed with the same key. The proposed method enables us not only to use visually protected images but also to maintain the same classification accuracy as that of models trained with plain images. In addition, in an experiment, the proposed method was demonstrated to be robust against a random key attack.
Acknowledgments
This study was partially supported by JSPS KAKENHI and JST CREST.
References
- [1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
- [2] X. Liu, Z. Deng, and Y. Yang, “Recent progress in semantic image segmentation,” Artif. Intell. Rev., vol. 52, no. 2, pp. 1089–1106, 2019.
- [3] H. Kiya, A. P. M. Maung, Y. Kinoshita, S. Imaizumi, and S. Shiota, “An overview of compressible and learnable image transformation with secret key and its applications,” APSIPA Transactions on Signal and Information Processing, vol. 11, no. 1, e11, 2022. [Online]. Available: http://dx.doi.org/10.1561/116.00000048
- [4] A. Trockman and J. Z. Kolter, “Patches are all you need?” arXiv preprint arXiv:2201.09792, 2022. [Online]. Available: https://arxiv.org/abs/2201.09792
- [5] M. Aprilpyone and H. Kiya, “Block-wise image transformation with secret key for adversarially robust defense,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2709–2723, 2021.
- [6] ——, “Encryption inspired adversarial defense for visual classification,” in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 1681–1685.
- [7] ——, “Ensemble of key-based models: Defense against black-box adversarial attacks,” in 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), 2021, pp. 95–98.
- [8] ——, “A protection method of trained cnn model with a secret key from unauthorized access,” APSIPA Transactions on Signal and Information Processing, vol. 10, p. e10, 2021.
- [9] M. AprilPyone and H. Kiya, “Privacy-preserving image classification using an isotropic network,” IEEE MultiMedia, vol. 29, no. 2, pp. 23–33, 2022.
- [10] A. Kawamura, Y. Kinoshita, T. Nakachi, S. Shiota, and H. Kiya, “A privacy-preserving machine learning scheme using etc images,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E103.A, no. 12, pp. 1571–1578, 2020.
- [11] Y. Bandoh, T. Nakachi, and H. Kiya, “Distributed secure sparse modeling based on random unitary transform,” IEEE Access, vol. 8, pp. 211 762–211 772, 2020.
- [12] I. Nakamura, Y. Tonomura, and H. Kiya, “Unitary transform-based template protection and its application to $\ell_2$-norm minimization problems,” IEICE Transactions on Information and Systems, vol. E99.D, no. 1, pp. 60–68, 2016.
- [13] T. Maekawa, A. Kawamura, T. Nakachi, and H. Kiya, “Privacy-preserving support vector machine computing using random unitary transformation,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E102.A, no. 12, pp. 1849–1855, 2019.
- [14] K. Madono, M. Tanaka, M. Onishi, and T. Ogawa, “Block-wise scrambled image recognition using adaptation network,” Artificial Intelligence of Things (AIoT), Workshop on AAAI conference Artificial Intelligence (AAAI-WS), 2020.
- [15] W. Sirichotedumrong and H. Kiya, “A gan-based image transformation scheme for privacy-preserving deep neural networks,” Proceedings of European Signal Processing Conference (EUSIPCO), pp. 745–749, 2021.
- [16] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021.
- [17] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, pp. 8024–8035.