On Classifying Images using Quantum Image Representation
Abstract
In this paper, we consider different Quantum Image Representation Methods to encode images into quantum states and then use a Quantum Machine Learning pipeline to classify the images. We provide encouraging results on classifying benchmark datasets of grayscale and colour images using two different classifiers. We also test multi-class classification performance.
keywords:
Autoencoder, FRQI, Machine Learning, MCQI, Variational Classifier, Quantum Image Representation

1 INTRODUCTION
Quantum Image Representation (QIR) is a catch-all term for methods to encode an image as a quantum state. General data encoding methods [1] exist, such as Angle Encoding and Amplitude Encoding. But these do not take advantage of the specific structure of image data. Thus, unique representation methods have been invented to tackle image data and its representation as a quantum state using a quantum circuit.
A number of methods can be found in the literature. We have considered Flexible Representation of Quantum Images (FRQI) [2] for grayscale images and Multi-Channel Representation for Quantum Image (MCQI) [3] for colour images to encode the image and later use a quantum classifier on the encoded image to perform binary and multi-class classification.
We use a variational quantum classifier (VQC) and an autoencoder classifier (AC) to classify the images. We have obtained encouraging results using some classic benchmark datasets.
The paper is organised as follows. In Section 2 , we give a brief summary of the QIR methods used in this study. In Section 3 , we describe the quantum classifiers used for classifying the images. Section 4 details the datasets used in the paper. In Section 5 we give the implementation details. Section 6 presents the classification results obtained from our simulations. In Section 7 we conclude and provide a few markers for future work.
2 Quantum Image Representation Methods
In this section we give a summary of the QIR methods used in the text. First, some details of the images considered:

1. Image dimension: $2^n \times 2^n$ (a value of $n$ means a $2^n \times 2^n$-pixel image).
2. Grayscale images: pixel values $\in \{0, \dots, 255\}$ (8 bits in binary). Number of matrix elements: $2^{2n}$.
3. Colour images: RGB, with pixel values of each channel $\in \{0, \dots, 255\}$. Number of matrix elements: $3 \times 2^{2n}$.
2.1 FRQI
FRQI encodes the image data into a quantum state given by:
(1) $|I(\theta)\rangle = \frac{1}{2^n} \sum_{i=0}^{2^{2n}-1} \left( \cos\theta_i |0\rangle + \sin\theta_i |1\rangle \right) \otimes |i\rangle$

Here, $|0\rangle$ and $|1\rangle$ are single-qubit computational basis states and $|i\rangle$ are $2n$-qubit computational basis states. The $(\cos\theta_i|0\rangle + \sin\theta_i|1\rangle)$ part encodes the pixel values while $|i\rangle$ encodes the pixel location.
The circuit to encode the image can be constructed using Hadamard ($H$) and controlled rotation ($CR_y(2\theta_i)$) gates. The state needs to be measured multiple times to get the image back, so the retrieval process is probabilistic and the result depends on the number of shots used.
The number of qubits used in this representation is $2n + 1$, with $2n$ qubits to encode the pixel location and 1 qubit to encode the pixel values. Pixel values are encoded as angles $\theta_i$ and are thus scaled to fit in the range $[0, \pi/2]$.
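As a concrete illustration, the FRQI statevector of a small image can be written down directly. The sketch below builds the amplitudes of Eq. (1) with NumPy rather than constructing the $H$/$CR_y$ circuit; the convention that the colour qubit is the most significant one is our assumption here.

```python
import numpy as np

def frqi_state(image):
    """Build the FRQI statevector of a 2^n x 2^n grayscale image directly.

    This constructs the amplitudes of Eq. (1), not the encoding circuit:
    taking the colour qubit as most significant, the first 2^(2n)
    amplitudes are the cos(theta_i) terms and the last 2^(2n) are the
    sin(theta_i) terms, each weighted by the 1/2^n normalisation."""
    pixels = np.asarray(image, dtype=float).flatten()
    thetas = (np.pi / 2) * pixels / 255.0   # scale pixel values to [0, pi/2]
    norm = 1.0 / np.sqrt(pixels.size)       # 1/2^n for a 2^n x 2^n image
    return norm * np.concatenate([np.cos(thetas), np.sin(thetas)])

state = frqi_state([[0, 255], [128, 64]])   # 2x2 image -> 3-qubit state
```

For the black pixel (value 0) the $|0\rangle$ amplitude is $\cos(0)/2 = 0.5$, and for the white pixel (value 255) the $|1\rangle$ amplitude is $\sin(\pi/2)/2 = 0.5$, matching the scaling described above.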
2.2 MCQI
The MCQI representation uses $2n + 3$ qubits to encode colour images: $2n$ qubits encode the pixel location, as in FRQI, and the 3 remaining qubits encode the pixel values of the RGB channels. This encoding is inspired by FRQI.
MCQI encodes the image into a quantum state given by:
(2) $|I\rangle = \frac{1}{2^{n+1}} \sum_{i=0}^{2^{2n}-1} |C_{RGB}^i\rangle \otimes |i\rangle$

The colour information is encoded in:

(3) $|C_{RGB}^i\rangle = \cos\theta_R^i|000\rangle + \cos\theta_G^i|001\rangle + \cos\theta_B^i|010\rangle + \cos\theta_\alpha|011\rangle + \sin\theta_R^i|100\rangle + \sin\theta_G^i|101\rangle + \sin\theta_B^i|110\rangle + \sin\theta_\alpha|111\rangle$

where $\theta_\alpha = 0$ is the angle of the unused fourth channel. The colour encoding angle $\theta_X^i$ is applied to the R-channel qubit using controlled rotation ($CR_y$) gates controlled by the G and B channel qubits, and for each pixel, 3 such gates are applied to encode the position and value information. The angles are calculated from the pixel values:

(4) $\theta_X^i = \frac{\pi}{2} v_X^i, \quad X \in \{R, G, B\}$

where $v_X^i \in [0, 1]$ is the pixel value. We get $v_X^i$ in this range by dividing the integer pixel values by 255.
As before, the image retrieval process is probabilistic and depends on the number of shots.
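For illustration, the 3-qubit colour part of the MCQI state for a single pixel can be written down directly. This is a NumPy sketch of the colour state in Eq. (3), with the unused fourth channel's angle fixed to 0 as described above.

```python
import numpy as np

def mcqi_color_state(r, g, b):
    """Amplitudes of the colour state |C_RGB> for one pixel, in basis
    order |000>, |001>, ..., |111>.

    Each channel angle is theta = (pi/2) * value / 255; the cos terms
    sit on the |0..> half, the sin terms on the |1..> half, and the
    unused fourth (alpha) channel is fixed at angle 0."""
    t_r, t_g, t_b = [(np.pi / 2) * v / 255.0 for v in (r, g, b)]
    t_a = 0.0
    return 0.5 * np.array([
        np.cos(t_r), np.cos(t_g), np.cos(t_b), np.cos(t_a),
        np.sin(t_r), np.sin(t_g), np.sin(t_b), np.sin(t_a),
    ])

amps = mcqi_color_state(255, 0, 128)   # a single bright-red-ish pixel
```

The overall factor of $1/2$ makes each per-pixel colour state normalised, since the four $\cos^2 + \sin^2$ pairs sum to 4.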
3 Quantum Classifiers
We have used two different methods to classify the images in this paper. We describe these methods below.
3.1 Variational Quantum Classifier
To classify the images, we need some value to distinguish the two classes. The $Z$ expectation value of the first qubit gives a natural split. We apply a variational ansatz to the quantum image and measure the expectation value of the colour qubit for FRQI and of the R-channel qubit for MCQI. The expectation value lies in the range $[-1, 1]$, and thus a split can be formed such that:

(5) $\text{class} = \begin{cases} \text{positive}, & \langle Z \rangle \geq s \\ \text{negative}, & \langle Z \rangle < s \end{cases}$

where $s$ is the split, set to 0 by default but trainable to get optimal results.
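The decision rule of Eq. (5) amounts to a one-line threshold; a minimal sketch, with the split defaulting to 0:

```python
def classify(expval, split=0.0):
    """Eq. (5): assign the positive class when <Z> >= split, else negative.

    The split defaults to 0 but can be treated as a trainable parameter."""
    return "positive" if expval >= split else "negative"
```

For example, a measured $\langle Z \rangle = -0.2$ is negative under the default split but positive once the split has been trained down to, say, $-0.5$.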
3.1.1 Ansatz
We use a straightforward ansatz that consists of a general single-qubit rotation on each qubit followed by a layer of CNOT gates, as shown in Fig 1. The number of layers of the ansatz is a hyperparameter. The number of parameters in the classifier is $3 n_q L$, where $n_q$ is the number of qubits and $L$ is the number of layers (each general single-qubit rotation carries 3 parameters).

3.2 Autoencoder Classifier
An autoencoder is a tool to reduce the dimension of data. We use the autoencoder's quantum analogue [4] to compress the image state into a single-qubit state. To use it as a classifier, we train the autoencoder to compress only the positive class. The autoencoder is trained by maximising the fidelity of the trash qubits with the zero state $|0\rangle^{\otimes n_t}$, where $n_t$ is the number of trash qubits. Once the autoencoder is trained on the positive class of the training data, we use it to compress the validation data, measure the fidelity of the trash qubits with the zero state again, and classify each sample based on the resulting fidelity: the positive class should have higher fidelity values than the negative class. A single layer of the autoencoder is shown in Fig 2.
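For a pure state, the fidelity of the trash qubits with the zero state is just the probability of reading all zeros on them. A NumPy sketch, under the assumption that the trash qubits are the most significant ones in the statevector:

```python
import numpy as np

def trash_fidelity(state, n_trash):
    """Fidelity of the trash qubits with |0>^{n_trash}.

    For a pure state this equals the probability that measuring the
    n_trash most-significant qubits yields all zeros, i.e. the total
    weight of the first 2^(n_total - n_trash) amplitudes."""
    state = np.asarray(state, dtype=complex)
    keep = state.size // 2**n_trash
    return float(np.sum(np.abs(state[:keep])**2))
```

Training maximises this quantity on the positive class; at validation time, samples with fidelity below some threshold are assigned to the negative class.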

4 Dataset Details
We use two grayscale image datasets and one colour image dataset.
4.1 Bars and Stripes (BAS)
This dataset contains black-and-white images of dimension $2^n \times 2^n$. Example images are shown in Fig 3. These images are randomly generated. Horizontal stripes are one class, and vertical bars are the other. This dataset is used for binary classification.


4.2 MNIST
This is the famous dataset of handwritten digits [5]. We have used it for both binary (0 and 1) and multi-class (0, 1 and 2) classification. The original images are of dimension $28 \times 28$ and first need to be resized to $2^n \times 2^n$. Bilinear interpolation is used to transform the data for different $n$. There also exists another version of this dataset which contains 15 different corruption variations [6]. We also classify these corrupted datasets. Fig 4 shows the MNIST images, and Fig 5 shows the corrupted images.
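The bilinear resizing step can be sketched in plain NumPy as a stand-in for the library resize routine we used (the align-corners sampling convention below is our assumption; the exact convention of the actual routine may differ):

```python
import numpy as np

def resize_bilinear(img, size):
    """Resize a 2-D array to (size, size) with bilinear interpolation.

    Each output pixel samples the source image at a fractional position
    (align-corners convention) and blends the four surrounding pixels."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys = np.linspace(0, h - 1, size)            # source rows to sample
    xs = np.linspace(0, w - 1, size)            # source cols to sample
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

For example, resizing a $28 \times 28$ MNIST digit to $2^n \times 2^n$ for $n = 2$ is `resize_bilinear(digit, 4)`.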
4.3 Color Images
This is randomly generated colour image data for classification. Each image has 4 pixels ($2 \times 2$) of random colours. For the positive class, we change the values of one pixel to $(0, 0, 0)$, which makes that pixel black. The classification problem is then to differentiate between images with and without a black pixel. The pixel can also be modified to have dark shades instead of absolute 0 values. Fig 6 shows example images for different shades.
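A generator for this data can be sketched as follows. This is only an illustrative reconstruction, not our exact generator; in particular, which pixel gets darkened (the top-left one here) and the balanced random labelling are assumptions.

```python
import numpy as np

def make_color_dataset(n_samples, shade=0, seed=0):
    """Generate 2x2 random-colour RGB images with binary labels.

    Positive-class images (label 1) get their top-left pixel set to
    (shade, shade, shade); shade=0 gives a pure black pixel, larger
    shades shrink the difference between the two classes."""
    rng = np.random.default_rng(seed)
    images = rng.integers(0, 256, size=(n_samples, 2, 2, 3))
    labels = rng.integers(0, 2, size=n_samples)
    images[labels == 1, 0, 0, :] = shade   # darken one pixel
    return images, labels

images, labels = make_color_dataset(100, shade=0)
```

At `shade=255` the darkened pixel is indistinguishable from a random bright pixel's maximum value, which is why accuracy drops to chance level in that limit.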
5 Implementation Details
We use PennyLane [7] to simulate the circuits. JAX [8] is combined with PennyLane as a high-performance simulator and to utilise the GPU. The optimisation library optax [9] is used for optimising the classifier. We use the Adam optimiser [10] with a step size of 0.1 for 250 epochs unless otherwise mentioned. We use 5 layers of the VQC and 1 layer of the AC. We also use scikit-learn [11] for classical processing. All simulations were done with the shots of the PennyLane device set to ‘None’, which gives analytic results. We use different sizes of training datasets and a fixed 1000 data points for the validation data.
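For reference, the device setup amounts to a configuration like the following sketch (the wire count shown assumes FRQI with $n = 2$, i.e. $2n + 1 = 5$ qubits):

```python
import pennylane as qml

# shots=None makes the simulator return exact (analytic) expectation
# values instead of sampled estimates, matching the results reported here.
dev = qml.device("default.qubit", wires=2 * 2 + 1, shots=None)
```

For MCQI the wire count would instead be $2n + 3$.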
6 Results
6.1 BAS
Table 1 shows accuracies on the validation set using different training dataset sizes and $n$ values for the BAS data.
Table 1: Validation accuracies on the BAS data for different training set sizes and $n$.

| Training Set Size | Classifier | $n=1$ | $n=2$ | $n=3$ | $n=4$ |
|---|---|---|---|---|---|
| 100 | VQC | 1.0 | 1.0 | 0.955 | 0.993 |
| 100 | AC | 1.0 | 0.818 | 0.820 | 0.889 |
| 200 | VQC | 1.0 | 1.0 | 0.997 | 0.990 |
| 200 | AC | 1.0 | 0.929 | 0.881 | 0.732 |
| 500 | VQC | 1.0 | 1.0 | 0.993 | 0.991 |
| 500 | AC | 1.0 | 0.858 | 0.842 | 0.866 |
6.2 MNIST
Table 2 shows accuracies on the validation set using different training dataset sizes and $n$ values for the MNIST data. For AC, we used 100 epochs with the MNIST data. Table 3 shows accuracies on the validation set for the MNIST Corruption data. Table 4 shows the results for multi-class classification between the 0, 1 and 2 digit images using the VQC. For this, we split the range $[-1, 1]$ into 3 parts.
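The three-way split of $[-1, 1]$ can be sketched as follows; splitting into equal thirds is our assumption for the split points.

```python
def three_class(expval):
    """Map <Z> in [-1, 1] to one of three classes by splitting the range
    into three equal parts (split points assumed at -1/3 and +1/3)."""
    if expval < -1 / 3:
        return 0
    if expval < 1 / 3:
        return 1
    return 2
```

As with the binary split $s$ in Eq. (5), these split points could in principle also be trained.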
Table 2: Validation accuracies on the MNIST data for different training set sizes and $n$.

| Training Set Size | Classifier | $n=1$ | $n=2$ | $n=3$ | $n=4$ |
|---|---|---|---|---|---|
| 100 | VQC | 0.946 | 0.960 | 0.992 | 0.993 |
| 100 | AC | 0.922 | 0.905 | 0.917 | 0.811 |
| 200 | VQC | 0.938 | 0.960 | 0.995 | 0.996 |
| 200 | AC | 0.904 | 0.877 | 0.793 | 0.691 |
| 500 | VQC | 0.933 | 0.955 | 0.993 | 0.995 |
| 500 | AC | 0.904 | 0.950 | 0.925 | 0.826 |
Table 3: Validation accuracies on the MNIST Corruption data (one column per corruption type; some corruption names were lost and are left blank).

| Training Set Size | Classifier | | | | | Shear |
|---|---|---|---|---|---|---|
| 500 | VQC | 0.996 | 0.998 | 0.994 | 0.990 | 0.997 |
| 500 | AC | 0.918 | 0.929 | 0.980 | 0.918 | 0.858 |
| | | Scale | Rotate | Brightness | Translate | Stripe |
| 500 | VQC | 0.988 | 0.993 | 0.993 | 0.965 | 0.993 |
| 500 | AC | 0.982 | 0.960 | 0.977 | 0.880 | 0.973 |
| | | Fog | Spatter | | ZigZag | |
| 500 | VQC | 0.988 | 0.995 | 0.997 | 0.992 | 0.989 |
| 500 | AC | 0.561 | 0.904 | 0.956 | 0.955 | 0.481 |
Table 4: Multi-class (0, 1 and 2) VQC validation accuracies for different training set sizes and $n$.

| Training Set Size | $n=1$ | $n=2$ | $n=3$ | $n=4$ |
|---|---|---|---|---|
| 100 | 0.588 | 0.838 | 0.798 | 0.824 |
| 200 | 0.597 | 0.857 | 0.831 | 0.860 |
| 500 | 0.603 | 0.807 | 0.773 | 0.765 |
6.3 Color Images
Table 5 shows accuracy on the validation set using different training dataset sizes and shade values for the colour image data. As expected, the accuracy decreases as we increase the shade (see Fig 7). This is because higher shade values imply a smaller difference between the classes, with a value of 255 implying no difference between the two classes.
Table 5: Validation accuracies on the colour image data; columns give the classifier and training set size.

| Shade | VQC (100) | AC (100) | VQC (200) | AC (200) | VQC (500) | AC (500) |
|---|---|---|---|---|---|---|
| 0 | 0.970 | 0.845 | 0.975 | 0.839 | 0.995 | 0.854 |
| 10 | 0.986 | 0.721 | 0.993 | 0.774 | 0.998 | 0.858 |
| 20 | 0.966 | 0.813 | 0.989 | 0.673 | 0.992 | 0.808 |
| 50 | 0.951 | 0.656 | 0.965 | 0.702 | 0.969 | 0.737 |
| 100 | 0.907 | 0.539 | 0.897 | 0.523 | 0.918 | 0.512 |
| 150 | 0.759 | 0.502 | 0.785 | 0.457 | 0.803 | 0.492 |
| 200 | 0.647 | 0.484 | 0.677 | 0.516 | 0.635 | 0.479 |
| 255 | 0.507 | 0.508 | 0.495 | 0.493 | 0.506 | 0.509 |

7 CONCLUSIONS AND FUTURE WORK
Encouraging results on benchmark datasets have been obtained with both the VQC and the AC for binary and multi-class image classification. The work can be expanded to classify more involved images. The ansatz used for the VQC was a simple layered circuit; more research can be done to find a better ansatz that provides improved performance while using fewer epochs. The effect of noise and of a finite number of shots on the performance can also be studied. Similarly, different autoencoder models, for example a denoising autoencoder, can be studied. One can also look into other image processing tasks, such as filtering, on these representations/encodings.
References
- [1] Ryan LaRose and Brian Coyle “Robust data encodings for quantum classifiers” In Phys. Rev. A 102 American Physical Society, 2020, pp. 032420 DOI: 10.1103/PhysRevA.102.032420
- [2] Phuc Q. Le, Fangyan Dong and Kaoru Hirota “A flexible representation of quantum images for polynomial preparation, image compression, and processing operations” In Quantum Information Processing 10.1, 2011, pp. 63–84 DOI: 10.1007/s11128-010-0177-y
- [3] Bo Sun et al. “An RGB Multi-Channel Representation for Images on Quantum Computers” In J. Adv. Comput. Intell. Intell. Informatics 17, 2013, pp. 404–417
- [4] Jonathan Romero, Jonathan P Olson and Alan Aspuru-Guzik “Quantum autoencoders for efficient compression of quantum data” In Quantum Science and Technology 2.4 IOP Publishing, 2017, pp. 045001
- [5] Yann LeCun, Corinna Cortes and CJ Burges “MNIST handwritten digit database” In ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2, 2010
- [6] Norman Mu and Justin Gilmer “MNIST-C: A Robustness Benchmark for Computer Vision” In arXiv preprint arXiv:1906.02337, 2019
- [7] Ville Bergholm et al. “PennyLane: Automatic differentiation of hybrid quantum-classical computations” arXiv, 2018 DOI: 10.48550/ARXIV.1811.04968
- [8] James Bradbury et al. “JAX: composable transformations of Python+NumPy programs”, 2018 URL: http://github.com/google/jax
- [9] Matteo Hessel et al. “Optax: composable gradient transformation and optimisation, in JAX!”, 2022 URL: http://github.com/deepmind/optax
- [10] Diederik P. Kingma and Jimmy Ba “Adam: A Method for Stochastic Optimization” arXiv, 2014 DOI: 10.48550/ARXIV.1412.6980
- [11] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python” In Journal of Machine Learning Research 12, 2011, pp. 2825–2830