
On Classifying Images using Quantum Image Representation

Ankit Khandelwal^{1,2}, M Girish Chandra^{2}, Sayantan Pramanik^{3}
^{1}Centre for High Energy Physics, Indian Institute of Science, Bengaluru, India
^{2}TCS Research, India
^{3}TCS Incubation, India
e-mail: [email protected], [email protected], [email protected]
Abstract

In this paper, we consider different Quantum Image Representation Methods to encode images into quantum states and then use a Quantum Machine Learning pipeline to classify the images. We provide encouraging results on classifying benchmark datasets of grayscale and colour images using two different classifiers. We also test multi-class classification performance.

keywords:
Autoencoder, FRQI, Machine Learning, MCQI, Variational Classifier, Quantum Image Representation

1 INTRODUCTION

Quantum Image Representation (QIR) is a catch-all term for methods to encode an image as a quantum state. General data encoding methods [1], such as Angle Encoding and Amplitude Encoding, exist, but they do not take advantage of the specific structure of image data. Thus, dedicated representation methods have been developed to encode image data as a quantum state using a quantum circuit.

A number of methods can be found in the literature. We have considered Flexible Representation of Quantum Images (FRQI) [2] for grayscale images and Multi-Channel Representation for Quantum Image (MCQI) [3] for colour images to encode the image and later use a quantum classifier on the encoded image to perform binary and multi-class classification.

We use a variational quantum classifier (VQC) and an autoencoder classifier (AC) to classify the images. We have obtained encouraging results using some classic benchmark datasets.

The paper is organised as follows. In Section 2, we give a brief summary of the QIR methods used in this study. In Section 3, we describe the quantum classifiers used for classifying the images. Section 4 details the datasets used in the paper. In Section 5, we give the implementation details. Section 6 presents the classification results obtained from our simulations. In Section 7, we conclude and provide a few markers for future work.

2 Quantum Image Representation Methods

In this section, we give a summary of the QIR methods used in the paper. First, some details of the images:

  1. Image dimension $=2^{n}\times 2^{n}$ ($n=1$ means a $2\times 2$ image, i.e. 4 pixels)

  2. Grayscale images $\rightarrow$ pixel values $\in[0,255]$ $\rightarrow$ in binary $\in\{0,1\}^{8}$.
     Number of matrix elements $=2^{n}\times 2^{n}$

  3. Colour images $\rightarrow$ RGB, pixel values of each channel $\in[0,255]$.
     Number of matrix elements $=2^{n}\times 2^{n}\times 3$

2.1 FRQI

FRQI encodes the image data into a quantum state given by:

\[\left|I(\theta)\right> = \frac{1}{2^{n}}\sum_{i=0}^{2^{2n}-1}\left[\cos(\theta_{i})\left|0\right> + \sin(\theta_{i})\left|1\right>\right]\otimes\left|i\right> \quad (1)\]
\[\theta_{i}\in\left[0,\frac{\pi}{2}\right],\qquad \theta=(\theta_{0},\theta_{1},\dots,\theta_{2^{2n}-1})\]

Here, $\left|0\right>$ and $\left|1\right>$ are single-qubit computational basis states, and $\left|i\right>$, $i=0,1,\dots,2^{2n}-1$, are $2n$-qubit computational basis states. The $\cos(\theta_{i})\left|0\right>+\sin(\theta_{i})\left|1\right>$ part encodes the pixel values, while $\left|i\right>$ encodes the pixel location.

The circuit to encode the image can be constructed using Hadamard ($H$) and controlled-rotation ($C^{2n}R_{y}(2\theta)$) gates. The state must be measured multiple times to retrieve the image; the image retrieval process is probabilistic, and the result depends on the number of shots used.

The number of qubits used in this representation is $2n+1$: $2n$ qubits encode the pixel location, and 1 qubit encodes the pixel values. Pixel values are encoded as angles and are thus scaled to fit in the range $\left[0,\frac{\pi}{2}\right]$.
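
As an illustration, here is a minimal PennyLane sketch of FRQI state preparation for a $2\times 2$ image; the multi-controlled rotations are expressed with qml.ctrl and left to the simulator to decompose, and all names are our own, not code from [2].

```python
import numpy as np
import pennylane as qml

n = 1                                    # 2x2 image, 4 pixels
pos_wires = list(range(2 * n))           # 2n qubits for the pixel location
color_wire = 2 * n                       # 1 qubit for the pixel value
dev = qml.device("default.qubit", wires=2 * n + 1)

@qml.qnode(dev)
def frqi_state(thetas):
    # Uniform superposition over all pixel positions |i>
    for w in pos_wires:
        qml.Hadamard(wires=w)
    # For each pixel i, rotate the colour qubit by 2*theta_i,
    # controlled on the position register being |i>
    for i, theta in enumerate(thetas):
        bits = [int(b) for b in format(i, f"0{2 * n}b")]
        qml.ctrl(qml.RY, control=pos_wires, control_values=bits)(2 * theta, wires=color_wire)
    return qml.state()

pixels = np.array([0, 85, 170, 255])     # flattened pixel values
thetas = (pixels / 255.0) * (np.pi / 2)  # scale to [0, pi/2]
print(frqi_state(thetas))
```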

2.2 MCQI

MCQI representation uses $2n+3$ qubits to encode colour images: $2n$ qubits encode the pixel location, as in FRQI, and the 3 remaining qubits encode the pixel values of the RGB channels. This encoding is inspired by FRQI.
MCQI encodes the image into a quantum state given by:

\[\left|I(\theta)\right> = \frac{1}{2^{n+1}}\sum_{i=0}^{2^{2n}-1}\left|C^{i}_{RGB}\right>\otimes\left|i\right> \quad (2)\]

The color information is encoded in:

\[\begin{aligned}\left|C^{i}_{RGB}\right> ={}& \cos(\theta^{i}_{R})\left|000\right> + \cos(\theta^{i}_{G})\left|001\right> + \cos(\theta^{i}_{B})\left|010\right>\\ &+ \sin(\theta^{i}_{R})\left|100\right> + \sin(\theta^{i}_{G})\left|101\right> + \sin(\theta^{i}_{B})\left|110\right>\\ &+ \cos(0)\left|011\right> + \sin(0)\left|111\right>\end{aligned}\]

The colour encoding angle is applied to the R channel qubit using controlled-rotation ($C^{2}R_{y}(2\theta)$) gates controlled by the G and B channel qubits; for each pixel, 3 $C^{2n}(C^{2}R_{y}(2\theta))$ gates are applied to encode the position and value information. The $\theta$ values are calculated from the pixel values:

\[\theta = \cos^{-1}(p)\]

where $p\in[0,1]$ is the pixel value, obtained by dividing the integer pixel value $\in[0,255]$ by 255.

As before, the image retrieval process is probabilistic and depends on the number of shots.
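
For concreteness, a small helper of our own (not code from [3]) that maps integer RGB values to MCQI angles might look like this:

```python
import numpy as np

def mcqi_angles(image_rgb):
    # image_rgb: array of shape (2**n, 2**n, 3) with integer values in [0, 255]
    p = image_rgb / 255.0        # scale pixel values to [0, 1]
    return np.arccos(p)          # theta = arccos(p), one angle per channel
```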

3 Quantum Classifiers

We have used two different methods to classify the images in this paper. We describe these methods below.

3.1 Variational Quantum Classifier

To classify the images, we need some value to distinguish the two classes. The Z expectation value ($ez$) of the first qubit gives a natural split. We apply a variational ansatz to the quantum image state and measure the expectation value of the colour qubit for FRQI and the R channel qubit for MCQI. The expectation value lies in the range $[-1,1]$, and thus a split can be formed such that:

\[\text{class} = \begin{cases}-1 & \text{if}\quad ez\leq s\\ \phantom{-}1 & \text{if}\quad ez>s\end{cases} \quad (3)\]

where $s$ is the split threshold, set to 0 by default, but it can be trained to get optimal results.
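
In code, the decision rule is a simple threshold (a sketch of ours, not the paper's exact implementation):

```python
def classify(ez, s=0.0):
    # ez: Z expectation value of the readout qubit, in [-1, 1]
    # s: split threshold, 0 by default but trainable
    return -1 if ez <= s else 1
```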

3.1.1 Ansatz

We use a straightforward ansatz that consists of a general single-qubit rotation on each qubit followed by a layer of CNOT gates, as shown in Fig 1. The number of layers of the ansatz is a hyperparameter. The number of parameters in the classifier is $3\times N\times l$, where $N$ is the number of qubits and $l$ is the number of layers.

Figure 1: Variational Classifier Ansatz with 4 qubits and 2 layers. U3 denotes a general single-qubit rotation gate.
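
A minimal PennyLane sketch of one possible realisation of this ansatz is given below; the linear CNOT chain is an assumption read off from Fig 1, and qml.Rot plays the role of U3.

```python
import pennylane as qml

def ansatz(params, wires):
    # params has shape (l, N, 3): 3 Euler angles per qubit per layer,
    # matching the 3 x N x l parameter count stated above
    for layer in params:
        for w, angles in zip(wires, layer):
            qml.Rot(*angles, wires=w)      # general single-qubit rotation (U3)
        for a, b in zip(wires[:-1], wires[1:]):
            qml.CNOT(wires=[a, b])         # assumed linear entangling pattern
```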

3.2 Autoencoder Classifier

An autoencoder is a tool to reduce the dimension of data. We use the autoencoder’s quantum analogue [4] to compress the image state into a single-qubit state. To use it as a classifier, we train the autoencoder to compress only the positive class. The autoencoder is trained by maximising the fidelity of the trash qubits with the zero state ($\left|0\right>^{\otimes T}$), where $T$ is the number of trash qubits. Once the autoencoder is trained on the positive class of the training data, we use it to compress the validation data. We then measure the fidelity of the trash qubits with the zero state again and classify the images based on the resulting fidelity: the positive class should have higher fidelity values than the negative class. A single layer of the autoencoder is shown in Fig 2.

Figure 2: Autoencoder with 1 data and 2 trash qubits. ‘Rots’ are general single-qubit rotation gates.
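
A sketch of the training objective: the fidelity of the trash qubits with $\left|0\right>^{\otimes T}$ equals the probability of measuring all zeros on those wires, so the cost can be written as below. The encoder circuit here is a stand-in of our own, not the exact layers of Fig 2.

```python
import pennylane as qml
from pennylane import numpy as pnp

n_qubits = 3                      # 1 data qubit + T = 2 trash qubits
trash_wires = [1, 2]
dev = qml.device("default.qubit", wires=n_qubits)

def encoder(params, wires):
    # Stand-in encoder layer: general rotations plus CNOTs (assumed form)
    for w, angles in zip(wires, params):
        qml.Rot(*angles, wires=w)
    for a, b in zip(wires[:-1], wires[1:]):
        qml.CNOT(wires=[a, b])

@qml.qnode(dev)
def trash_probs(params, image_state):
    qml.StatePrep(image_state, wires=range(n_qubits))  # encoded image state
    encoder(params, wires=list(range(n_qubits)))
    return qml.probs(wires=trash_wires)

def cost(params, states):
    # Maximising trash fidelity with |00> <=> minimising 1 - P(trash = 00)
    return pnp.mean(pnp.stack([1.0 - trash_probs(params, s)[0] for s in states]))
```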

4 Dataset Details

We use two grayscale image datasets and one colour image dataset.

4.1 Bars and Stripes (BAS)

This dataset contains black-and-white images of dimension $2^{n}\times 2^{n}$. Example images are shown in Fig 3 for $n=5$. These images are randomly generated. Horizontal stripes form one class, and vertical bars form the other. This dataset is used for binary classification.

Figure 3: Example of the two classes in the BAS data with $n=5$.
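
A possible generator for such images (our own sketch; the class label convention is an assumption):

```python
import numpy as np

def bas_image(n, rng, vertical):
    # Random on/off pattern of rows (stripes) or columns (bars)
    size = 2 ** n
    pattern = rng.integers(0, 2, size) * 255     # each line black or white
    img = np.tile(pattern[:, None], (1, size))   # horizontal stripes
    return img.T if vertical else img            # transpose for vertical bars

rng = np.random.default_rng(0)
stripes = bas_image(5, rng, vertical=False)      # one class
bars = bas_image(5, rng, vertical=True)          # the other class
```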

4.2 MNIST

Figure 4: MNIST data. Rows 1 and 2 show digits 0 and 1, respectively, for $n\in[1,5]$ from left to right.

This is the famous dataset of handwritten digits [5]. We have used it for both binary (0 and 1) and multi-class (0, 1 and 2) classification. The original images are of dimension $28\times 28$ and first need to be resized to $2^{n}\times 2^{n}$; bilinear interpolation is used to transform the data for different $n$. There also exists a version of this dataset with 15 different corruption variations [6], and we classify these corrupted datasets as well. Fig 4 shows the MNIST images, and Fig 5 shows the corrupted images.

Figure 5: Corrupted MNIST data, $n=4$. Panels: (a) Shot Noise, (b) Impulse Noise, (c) Glass Blur, (d) Motion Blur, (e) Shear, (f) Scale, (g) Rotate, (h) Brightness, (i) Translate, (j) Stripe, (k) Fog, (l) Spatter, (m) Dotted Line, (n) ZigZag, (o) Canny Edges.
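
A sketch of the resizing step, using PIL's bilinear resampling (the paper does not name the library used):

```python
import numpy as np
from PIL import Image

def resize_digit(img28, n):
    # Resize a 28x28 MNIST digit to 2**n x 2**n with bilinear interpolation
    size = 2 ** n
    im = Image.fromarray(img28.astype(np.uint8))
    return np.asarray(im.resize((size, size), resample=Image.BILINEAR))
```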

4.3 $2\times 2$ Color Images

This is randomly generated $2\times 2$ colour image data for classification. Each image has 4 pixels of random colours. For the positive class, we change the value of the $4^{th}$ pixel to $(0,0,0)$, making it black. The classification problem is then to differentiate between images with and without a black pixel. The pixel can also be modified to have a dark shade instead of absolute 0 values. Fig 6 shows example images for different shades, and a generator sketch follows the figure.

Figure 6: $2\times 2$ colour images with different shades of the positive class. Panels: (a) 0 class; (b) 1 class, 0 shade; (c) 1 class, 20 shade; (d) 1 class, 50 shade; (e) 1 class, 100 shade; (f) 1 class, 200 shade.
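
A possible generator for this dataset (our own sketch, not the paper's code):

```python
import numpy as np

def sample_image(rng, positive, shade=0):
    # 2x2 image with random RGB values; the positive class gets its
    # 4th pixel set to (shade, shade, shade), e.g. (0, 0, 0) for black
    img = rng.integers(0, 256, size=(2, 2, 3))
    if positive:
        img[1, 1, :] = shade
    return img

rng = np.random.default_rng(0)
neg = sample_image(rng, positive=False)
pos = sample_image(rng, positive=True, shade=20)
```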

5 Implementation Details

We use PennyLane [7] to simulate the circuits. JAX [8] is combined with PennyLane as a high-performance simulator and to utilise the GPU. The optimisation library optax [9] is used for optimising the classifier. We use the Adam optimiser [10] with a 0.1 step size for 250 epochs unless otherwise mentioned. We use 5 layers of the VQC and 1 layer of the AC. We also use scikit-learn [11] for classical processing. All simulations were run with the PennyLane device's shots set to None, which gives analytic (exact) results. We use different sizes of training datasets and a fixed 1000 data points for the validation data. A condensed sketch of this setup follows.
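
The sketch below illustrates the setup; the loss function, ansatz layout, and dummy data are our own stand-ins, not the paper's exact code.

```python
import jax
import jax.numpy as jnp
import optax
import pennylane as qml

n_qubits, n_layers = 5, 5                 # e.g. FRQI with n = 2: 2n + 1 qubits
dev = qml.device("default.qubit", wires=n_qubits, shots=None)   # analytic

@qml.qnode(dev, interface="jax")
def circuit(params, state):
    qml.StatePrep(state, wires=range(n_qubits))    # pre-encoded image state
    for layer in params:                           # layered VQC ansatz
        for w in range(n_qubits):
            qml.Rot(*layer[w], wires=w)
        for w in range(n_qubits - 1):
            qml.CNOT(wires=[w, w + 1])
    return qml.expval(qml.PauliZ(0))

def loss(params, states, labels):
    # Squared error between ez and the +-1 labels (our choice of loss)
    ez = jnp.stack([circuit(params, s) for s in states])
    return jnp.mean((ez - labels) ** 2)

# Dummy stand-in data: random normalised states with +-1 labels
raw = jax.random.normal(jax.random.PRNGKey(0), (8, 2 ** n_qubits))
states = raw / jnp.linalg.norm(raw, axis=1, keepdims=True)
labels = jnp.array([1.0, -1.0] * 4)

params = 0.01 * jax.random.normal(jax.random.PRNGKey(1), (n_layers, n_qubits, 3))
opt = optax.adam(learning_rate=0.1)                # Adam with 0.1 step size
opt_state = opt.init(params)
for epoch in range(250):
    grads = jax.grad(loss)(params, states, labels)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
```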

6 Results

6.1 BAS

Table 1 shows accuracies on the validation set using different training dataset sizes and $n$ values for the BAS data.

Table 1: Validation Set Accuracy using the VQC and the AC on FRQI representation for BAS data

Training Set Size | Classifier | n=1 | n=2   | n=3   | n=4
100               | VQC        | 1.0 | 1.0   | 0.955 | 0.993
100               | AC         | 1.0 | 0.818 | 0.820 | 0.889
200               | VQC        | 1.0 | 1.0   | 0.997 | 0.990
200               | AC         | 1.0 | 0.929 | 0.881 | 0.732
500               | VQC        | 1.0 | 1.0   | 0.993 | 0.991
500               | AC         | 1.0 | 0.858 | 0.842 | 0.866

6.2 MNIST

Table 2 shows accuracies on the validation set using different training dataset sizes and $n$ values for the MNIST data. For the AC, we used 100 epochs for $n<3$ with MNIST data. Table 3 shows accuracies on the validation set for the MNIST Corruption data for $n=4$. Table 4 shows the results for multi-class classification between the 0, 1 and 2 digit images using the VQC. For this, we split the $ez$ range into 3 parts, as in the sketch below.
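
A possible form of this three-way split (the equal-width thresholds are our assumption; the paper does not state the exact values):

```python
def classify_3(ez, s1=-1.0 / 3, s2=1.0 / 3):
    # Split the [-1, 1] range of ez into three parts, one per digit class
    if ez <= s1:
        return 0
    return 1 if ez <= s2 else 2
```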

Table 2: Validation Set Accuracy using the VQC and the AC on FRQI representation for MNIST data

Training Set Size | Classifier | n=1   | n=2   | n=3   | n=4
100               | VQC        | 0.946 | 0.960 | 0.992 | 0.993
100               | AC         | 0.922 | 0.905 | 0.917 | 0.811
200               | VQC        | 0.938 | 0.960 | 0.995 | 0.996
200               | AC         | 0.904 | 0.877 | 0.793 | 0.691
500               | VQC        | 0.933 | 0.955 | 0.993 | 0.995
500               | AC         | 0.904 | 0.950 | 0.925 | 0.826
Table 3: Validation Set Accuracy using the VQC and the AC on FRQI representation for MNIST Corruption data (Training Set Size = 500)

Corruption    | VQC   | AC
Shot Noise    | 0.996 | 0.918
Impulse Noise | 0.998 | 0.929
Glass Blur    | 0.994 | 0.980
Motion Blur   | 0.990 | 0.918
Shear         | 0.997 | 0.858
Scale         | 0.988 | 0.982
Rotate        | 0.993 | 0.960
Brightness    | 0.993 | 0.977
Translate     | 0.965 | 0.880
Stripe        | 0.993 | 0.973
Fog           | 0.988 | 0.561
Spatter       | 0.995 | 0.904
Dotted Line   | 0.997 | 0.956
ZigZag        | 0.992 | 0.955
Canny Edges   | 0.989 | 0.481
Table 4: Multi-Class Validation Set Accuracy using the VQC on FRQI representation for MNIST data

Training Set Size | n=1   | n=2   | n=3   | n=4
100               | 0.588 | 0.838 | 0.798 | 0.824
200               | 0.597 | 0.857 | 0.831 | 0.860
500               | 0.603 | 0.807 | 0.773 | 0.765

6.3 $2\times 2$ Color Images

Table 5 shows accuracy on the validation set using different training dataset sizes and shade values for the $2\times 2$ colour image data. As expected, the accuracy decreases as the shade increases (see Fig 7). This is because higher shade values imply a smaller difference between the classes, with a value of 255 implying no difference between the two classes.

Table 5: Validation Set Accuracy using the VQC and the AC on MCQI representation for $2\times 2$ Color Image data (parentheses give the Training Set Size)

Shade | VQC (100) | AC (100) | VQC (200) | AC (200) | VQC (500) | AC (500)
0     | 0.970     | 0.845    | 0.975     | 0.839    | 0.995     | 0.854
10    | 0.986     | 0.721    | 0.993     | 0.774    | 0.998     | 0.858
20    | 0.966     | 0.813    | 0.989     | 0.673    | 0.992     | 0.808
50    | 0.951     | 0.656    | 0.965     | 0.702    | 0.969     | 0.737
100   | 0.907     | 0.539    | 0.897     | 0.523    | 0.918     | 0.512
150   | 0.759     | 0.502    | 0.785     | 0.457    | 0.803     | 0.492
200   | 0.647     | 0.484    | 0.677     | 0.516    | 0.635     | 0.479
255   | 0.507     | 0.508    | 0.495     | 0.493    | 0.506     | 0.509
Figure 7: Validation accuracies as a function of shade, with VQC on the left and AC on the right.

7 CONCLUSIONS AND FUTURE WORK

Encouraging results on benchmark datasets have been obtained with both the VQC and the AC for binary and multi-class image classification. The work can be expanded to classify more involved images. The ansatz used for the VQC was a simple layered circuit; more research can be done to find a better ansatz that provides improved performance while using fewer epochs. The effect of noise and shots on performance can also be studied. Similarly, different autoencoder models, for example, a denoising autoencoder, can be studied. One can also look into other image processing tasks, such as filtering, on these representations/encodings.

References

  • [1] Ryan LaRose and Brian Coyle “Robust data encodings for quantum classifiers” In Phys. Rev. A 102 American Physical Society, 2020, pp. 032420 DOI: 10.1103/PhysRevA.102.032420
  • [2] Phuc Q. Le, Fangyan Dong and Kaoru Hirota “A flexible representation of quantum images for polynomial preparation, image compression, and processing operations” In Quantum Information Processing 10.1, 2011, pp. 63–84 DOI: 10.1007/s11128-010-0177-y
  • [3] Bo Sun et al. “An RGB Multi-Channel Representation for Images on Quantum Computers” In J. Adv. Comput. Intell. Intell. Informatics 17, 2013, pp. 404–417
  • [4] Jonathan Romero, Jonathan P Olson and Alan Aspuru-Guzik “Quantum autoencoders for efficient compression of quantum data” In Quantum Science and Technology 2.4 IOP Publishing, 2017, pp. 045001
  • [5] Yann LeCun, Corinna Cortes and CJ Burges “MNIST handwritten digit database” In ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2, 2010
  • [6] Norman Mu and Justin Gilmer “MNIST-C: A Robustness Benchmark for Computer Vision” In arXiv preprint arXiv:1906.02337, 2019
  • [7] Ville Bergholm et al. “PennyLane: Automatic differentiation of hybrid quantum-classical computations” arXiv, 2018 DOI: 10.48550/ARXIV.1811.04968
  • [8] James Bradbury et al. “JAX: composable transformations of Python+NumPy programs”, 2018 URL: http://github.com/google/jax
  • [9] Matteo Hessel et al. “Optax: composable gradient transformation and optimisation, in JAX!”, 2022 URL: http://github.com/deepmind/optax
  • [10] Diederik P. Kingma and Jimmy Ba “Adam: A Method for Stochastic Optimization” arXiv, 2014 DOI: 10.48550/ARXIV.1412.6980
  • [11] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python” In Journal of Machine Learning Research 12, 2011, pp. 2825–2830