
Neural networks for on-the-fly single-shot state classification

Rohit Navarathna1,2 [email protected]    Tyler Jones1,2,3    Tina Moghaddam1,2    Anatoly Kulikov1,2,4    Rohit Beriwal1,2    Markus Jerger1,2,5    Prasanna Pakkiam1,2    Arkady Fedorov1,2
1 ARC Centre of Excellence for Engineered Quantum Systems, St Lucia, Queensland 4072, Australia
2 School of Mathematics and Physics, University of Queensland, St Lucia, Queensland 4072, Australia
3 Max Kelsen, Spring Hill, Queensland 4000, Australia
4 Department of Physics, ETH Zürich, CH-8093 Zürich, Switzerland
5 JARA-FIT Institute for Quantum Information, Forschungszentrum Jülich, 52425 Jülich, Germany
Abstract

Neural networks have proven to be efficient for a number of practical applications ranging from image recognition to identifying phase transitions in quantum physics models. In this paper we investigate the application of neural networks to state classification in a single-shot quantum measurement. We use dispersive readout of a superconducting transmon circuit to demonstrate an increase in assignment fidelity for both two and three state classification. More importantly, our method is ready for on-the-fly data processing without overhead or need for large data transfer to a hard drive. In addition we demonstrate the capacity of neural networks to be trained against experimental imperfections, such as phase drift of a local oscillator in a heterodyne detection scheme.


Machine learning (ML) is ubiquitous in modern computer science, with applications ranging from image and speech recognition to self-driving vehicles and automated medical diagnostic systems. By virtue of its archetypal problem classes – regression and classification – ML algorithms, and neural networks in particular, have recently found a number of applications in quantum computing, helping researchers to tackle such tasks as optimizing gates and pulse sequences [Zahedinejad2016, August2017, Ding2021, Baum2021], identifying phase transitions [Rem2019, Dong2019], correcting imperfections of measurement apparatus [Palmieri2020, Zwolak2020, Durrer2020], classifying states [Hueaav2761, Cimini2020] or evolution [Stenberg2016, Flurin2020, Gentile2021] of a quantum system with little or no a priori knowledge, and even optimizing the fabrication process [Mei2021a].

A proposal to use machine learning to discriminate measurement trajectories and outcomes was one of the first and most natural applications of ML in the field, and has led to improvements in readout assignment fidelity [Magesan2015]. The technique is now being regularly implemented across the community [Dickel2018, Kono2018, Martinez2020]. As these methods continue to see success, neural networks have become a promising technique for incorporation into the readout procedure, due to their generalisability and capacity to extract useful features from dense data. A recent advancement uses neural networks to compensate for system dependent errors due to processes such as cross-talk in multiplexed qubit readout [Lienhard2021]. In this work we also apply neural networks to the readout of a superconducting transmon system. However, our approach works on-the-fly with no data processing overhead and can be trained against experimental parameter drifts, in addition to increasing readout fidelity in two and three state discrimination scenarios.

To deploy our neural-network-based state classification, we use the open-source machine learning library PyTorch [NEURIPS2019_9015]. Geared towards computer vision and natural language processing, it includes the capability to realise deep neural networks and contains built-in functionality for data processing on a graphics processing unit (GPU). GPU integration makes our pipeline fast enough to perform on-the-fly data classification without the need to transfer the raw measured signal to a hard drive. Amongst other advantages, this allows the readout assignment fidelity to be monitored in real time.

With the initial training of the neural network taking on the order of minutes, subsequent retraining of the network weights requires only several seconds and allows the readout assignment fidelity to return to its optimal value. More importantly, the convolutional neural network used in the present work may be designed and trained to be resilient to certain experimental parameter drifts. Specifically, we present a strategy to eliminate the effect on readout assignment fidelity of relative local-oscillator phase drifts induced by the microwave generation equipment.

The measurement setup caters to repeated preparation and measurement of the transmon. We prepare the transmon in one of the two or three basis states, followed by a measurement. We use assignment fidelity to compare the efficacy of different classification methods. We performed the experiments with two samples, each comprising a primitive of the circuit quantum electrodynamics platform: a transmon coupled to a readout cavity. The first run of experiments was used as a test-bed to compare the quality of various ML methods. It was conducted using a single transmon embedded in a three-dimensional copper cavity (Sample A). The second set of experiments demonstrated on-the-fly data processing using a GPU and the methods’ stability against phase drifts; here one of the transmons on a multi-transmon chip (Sample B) was used. The relevant sample parameters are shown in Table 1 for both samples.

After an initialization period to thermalize the qubit to the ground state, we used Gaussian pulses of length 20 ns to prepare the transmon in the three basis states: applying no pulse to keep the transmon in its ground state |g⟩, applying one π-pulse at ω_ge to prepare the transmon in its first excited state |e⟩, and applying two consecutive π-pulses at ω_ge and ω_ef respectively to prepare the transmon in its second excited state |f⟩. These protocols are illustrated in Fig. 1a.
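A Gaussian pulse envelope of the kind used for these π-pulses can be sketched as follows. This is a minimal illustration, not the authors' pulse-generation code; the sampling rate and the σ-to-length ratio are assumptions.

```python
import numpy as np

def gaussian_pulse(length_s=20e-9, fs=1e9, sigma_frac=0.25):
    """Gaussian envelope of a control pulse.

    length_s   : total pulse length (20 ns, as in the text)
    fs         : assumed AWG sample rate (illustrative)
    sigma_frac : assumed standard deviation as a fraction of the length
    """
    n = int(length_s * fs)
    t = np.arange(n)
    mid = (n - 1) / 2            # centre the peak in the window
    sigma = sigma_frac * n
    return np.exp(-0.5 * ((t - mid) / sigma) ** 2)

# 20 ns pulse sampled at 1 GSa/s -> 20 samples, peak in the middle.
env = gaussian_pulse()
```

In the experiment this envelope would modulate a carrier at ω_ge (or ω_ef), with the amplitude calibrated so the integrated area gives a π rotation.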

To achieve a sufficiently high SNR for transmon readout in a single-shot regime, we use a Josephson parametric amplifier (JPA) similar to the one described in Ref. [Eichler2014]. Following amplification through the JPA, the readout signal is further amplified using a high-electron-mobility transistor (HEMT) amplifier and multiple room-temperature amplifiers. The signal is then downconverted to 25 MHz and acquired by a digitizer.

For the on-the-fly experiments with Sample B, we acquired 512 time points per measurement, recorded to the buffer of a 500 MSa/s Spectrum M4i digitizer. Batches of 2048 time traces are transferred from the populated buffer to PC memory (RAM) and then to GPU memory for batch processing. While the data is processed, the digitizer buffer is populated with new waveforms. This parallelization circumvents any overhead due to the data processing.

Within our data processing workflow, each acquired waveform undergoes digital downconversion (DDC) by multiplying the acquired signal with cos(ω_DDC t) (sin(ω_DDC t)), where ω_DDC/2π = 25 MHz, to obtain the in-phase quadrature I(t) (out-of-phase quadrature Q(t)). A finite impulse response (FIR) filter with a window of 40 samples (20 ns) and a cutoff frequency of 20 MHz is applied to the signal to eliminate the signal image at 50 MHz along with the 25 MHz noise (originally a DC offset). After obtaining I(t) and Q(t), the signal undergoes further post-processing. This may include time integration, channel correlation, or being fed through trained PyTorch neural networks.
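The DDC and FIR filtering steps above can be sketched in a few lines. This is an illustrative NumPy/SciPy reconstruction, not the authors' GPU code: the sample rate (500 MSa/s) and trace length (512 points) follow the text, while the synthetic test tone and the Hamming-windowed FIR design are assumptions.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 500e6        # digitizer sample rate, 500 MSa/s
F_DDC = 25e6      # intermediate frequency of the acquired signal
N_SAMPLES = 512   # time points per trace

def ddc(trace, fs=FS, f_ddc=F_DDC):
    """Digitally downconvert a raw trace to the I(t) and Q(t) quadratures."""
    t = np.arange(len(trace)) / fs
    i_t = trace * np.cos(2 * np.pi * f_ddc * t)
    q_t = trace * np.sin(2 * np.pi * f_ddc * t)
    # 40-tap low-pass FIR with a 20 MHz cutoff removes the image at
    # 50 MHz and the 25 MHz component that was originally a DC offset.
    fir = firwin(40, 20e6, fs=fs)
    return lfilter(fir, 1.0, i_t), lfilter(fir, 1.0, q_t)

# Example: a pure 25 MHz tone downconverts to a constant I quadrature
# (0.5, since cos^2 averages to 1/2) and a near-zero Q quadrature.
t = np.arange(N_SAMPLES) / FS
raw = np.cos(2 * np.pi * F_DDC * t)
i_t, q_t = ddc(raw)
```

In the experiment the same multiply-and-filter operation runs batched on the GPU across 2048 traces at a time.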

Due to the large number of cores in the GPU, the data can be processed in parallel, which allowed us to perform real-time data acquisition on-the-fly. Although the results of this paper were obtained with a repetition time of 40 μs, our GPU data processing can run without overhead as fast as 3.2 μs per repetition, acquiring 19 million traces (19 trillion samples) in 1 minute.

Parameter      Transmon A    Transmon B
ω_cav/2π       7.08 GHz      7.63 GHz
ω_ge/2π        6.27 GHz      5.49 GHz
ω_ef/2π        5.95 GHz      5.16 GHz
2χ_ge/2π       8.00 MHz      8.50 MHz
2χ_ef/2π       5.35 MHz      15.57 MHz
κ/2π           1.31 MHz      1.56 MHz
T_1            11.75 μs      4.07 μs
T_2            3.17 μs       4.29 μs
Table 1: Device parameters. Here, ω_cav is the readout resonator frequency, ω_ge (ω_ef) is the frequency of the transition between the ground (first excited) state and the first (second) excited state, χ_ge and χ_ef are the state-dependent dispersive shifts of the resonator frequency, κ is the decay rate of the resonator, and T_1 (T_2) is the relaxation (dephasing) rate of the transmon.

In order to determine the baseline readout fidelity, we first employed the conventional method of state classification. This involves preparing the relevant basis states of the transmon followed by a measurement pulse. The heterodyne measurement signal was integrated in time, giving one complex number for each acquisition with integrated in-phase and out-of-phase quadratures I and Q. By repeating the measurements, we can populate measurement outcomes for every prepared state on the I-Q plane, as shown in Fig. 1. The mean values of the state responses are calculated and stored as calibration data. A particular measurement response can then be classified by selecting the state whose calibrated mean response is closest to the measured point on the I-Q plane. The assignment fidelity can be evaluated as F_a = (1/N) Σ_{i=1}^{N} P(i|i), where P(i|j) is the probability of obtaining outcome "i" given the system was prepared in the j-th state. Here, we use N = 2 (3) and the states {|g⟩, |e⟩} ({|g⟩, |e⟩, |f⟩}) to calculate the qubit (qutrit) assignment fidelity.
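The nearest-mean classification and the assignment fidelity F_a can be sketched as follows. The clusters here are synthetic stand-ins for the integrated (I, Q) responses; cluster centres and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrate(points_per_state):
    """Store the mean complex response I+iQ of each prepared basis state."""
    return np.array([p.mean() for p in points_per_state])

def classify(points, means):
    """Assign each I+iQ point to the state with the nearest calibrated mean."""
    return np.argmin(np.abs(points[:, None] - means[None, :]), axis=1)

def assignment_fidelity(points_per_state, means):
    """F_a = (1/N) * sum_i P(i|i) over the prepared states."""
    probs = [np.mean(classify(p, means) == i)
             for i, p in enumerate(points_per_state)]
    return float(np.mean(probs))

# Three well-separated synthetic clusters standing in for |g>, |e>, |f>.
centers = np.array([0 + 0j, 4 + 0j, 2 + 3j])
data = [c + 0.5 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))
        for c in centers]
means = calibrate(data)
fid = assignment_fidelity(data, means)
```

With well-separated clusters the fidelity approaches one; in the experiment it is limited by overlap of the clusters and by relaxation during the measurement (visible in Fig. 1b).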

Figure 1: a) Transmon control pulses used for preparing the three basis states of the transmon. b) Integrated cavity responses I + iQ for different prepared states. The blue, red and green points correspond to the transmon being prepared in the |g⟩, |e⟩ and |f⟩ states respectively. The relaxation of the f-state to the excited state, and of the excited state to the ground state, is visible. Some points decaying from the f-state to the excited state will be classified as ground state due to their proximity to the ground-state cluster.

Alternatively, one can apply a matched filter to the heterodyne measurement signal prior to integration in the conventional method. Matched filters are calibrated by taking the mean of all acquired signals corresponding to each basis state and storing the complex conjugates of these mean responses. An incoming signal is convolved with each of these filters; the filter which returns the maximum average amplitude determines the classified basis state.
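A minimal sketch of matched-filter classification follows. The trajectories are synthetic, and the overlap is evaluated as a single dot product (the zero-lag point of the convolution) rather than a full convolution; both simplifications are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_filters(trajs_per_state):
    """Matched filter per state: conjugate of that state's mean trajectory."""
    return [np.conj(t.mean(axis=0)) for t in trajs_per_state]

def classify(shot, filters):
    """Pick the state whose filter returns the largest overlap amplitude."""
    overlaps = [np.abs(np.sum(f * shot)) for f in filters]
    return int(np.argmax(overlaps))

# Two synthetic mean trajectories standing in for the |g> and |e> responses.
t = np.linspace(0, 1, 50)
templates = [np.exp(1j * 2 * np.pi * t), np.exp(-1j * 2 * np.pi * t)]
trajs = [np.array([tmpl + 0.3 * (rng.normal(size=50) + 1j * rng.normal(size=50))
                   for _ in range(200)])
         for tmpl in templates]
filters = build_filters(trajs)

# A noisy single shot from the second state is matched to filter 1.
shot = templates[1] + 0.3 * (rng.normal(size=50) + 1j * rng.normal(size=50))
label = classify(shot, filters)
```

Because the filter weights each time point by the calibrated response, it uses the trajectory shape rather than only its time integral, which is what gives matched filtering its advantage over the conventional method in Table 2.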

Figure 2: Trajectories for the three states. The solid lines are the mean of 2048 traces acquired for each prepared state. The dashed lines are examples of single shots, which are the inputs to the machine learning model. The arrows indicate the direction of the single-shot trajectories. The mean trajectories start from (I, Q) = (0, 0). a) Both the conventional method and the CNN correctly classify the shots. b) The CNN correctly classifies all shots, but the conventional method wrongly classifies the |f⟩ shot as |e⟩. This is likely because the transmon decayed from |f⟩ to |e⟩ early in its trajectory.

To identify the best ML algorithm, we first collected data from Sample A. The transmon was prepared in each of the three basis states (|g⟩, |e⟩ and |f⟩) followed by the 2 μs measurement pulse, resulting in 50 time samples for each trace at 100 MHz. In total, we collected 16384 traces corresponding to each state for analysis. 90% of this data was used for training, and the rest was used to test and obtain the classification fidelity.

Model                      2 state fidelity    3 state fidelity
Conventional               0.841               0.711
Matched Filter             0.913               0.747
K Nearest Neighbours       0.902               0.845
Support Vector Machine     0.917               0.851
Random Forest Classifier   0.917               0.874
Vanilla Neural Network     0.912               0.925
LSTM                       0.909               0.904
CNN                        0.919               0.928
Table 2: Assignment fidelities for different machine learning models evaluated on an identical test data set generated with Sample A. All methods show significant improvements over the conventional method (time integration of the readout signal followed by threshold classification). The CNN is the best model for classifying between three levels and was therefore selected for on-the-fly data analysis. Note that the same data was used to extract both the qubit and qutrit fidelities; since the readout parameters were optimised for the qutrit case, the CNN model returns a higher value for the 3 state fidelity.

Before using neural networks, we evaluated the assignment fidelity of other common machine learning methods. The assignment fidelities obtained using each of these algorithms are shown in Table 2.

The k-nearest neighbours algorithm bears the most similarity to the baseline assignment method, using the full trajectory instead of just the integrated point of a trajectory. The long short-term memory (LSTM) network is popular in language processing and was chosen because LSTMs are designed to handle sequences of data and can therefore capture long time correlations between data points.

A convolutional neural network (CNN) is most commonly used for pattern recognition and image classification. It is a neural network in which a hidden layer is a convolution of the input with a kernel (or filter). We feed the time-domain signal to the CNN with the I(t) and Q(t) traces as two input channels, analogous to the red, green and blue channels of a color image.

After selecting CNN as the method with the highest assignment fidelity, we apply this model to state classification on-the-fly. The network consists of the following layers:

  1. 1D convolution: A convolution layer with 2 input channels (corresponding to I and Q), 16 output channels, and a kernel size of 128. The large kernel size filters out higher-frequency noise effectively. The initial kernel weights (or filter coefficients) are set using the He initialisation function [he2015delving]; experimentation demonstrated that CNN performance is sensitive to this weight initialisation.

  2. ReLU activation: The mapping f(x) = max(0, x), used to expedite the learning process [Nair2010].

  3. 1D convolution: A convolution layer with 16 input channels (corresponding to the output of the previous convolution layer), 32 output channels, and a kernel size of 5. This expands the previous 16 features to 32 by taking various linear combinations of the prior layer outputs.

  4. ReLU activation

  5. Max pooling: The maximum of every three neighbouring output values of the previous layer is taken.

  6. Flattening: This step reshapes the data into a one-dimensional array.

  7. Dropout: 50% of the data points are randomly selected to be set to zero (called ‘neuron deactivation’). This step was introduced to avoid overfitting.

  8. Linear: This layer applies a linear transformation to the incoming data, y = Ax + b, where the weights (matrix A) and biases (vector b) are optimized. Here, the output y is half the size of the input x.

  9. ReLU activation

  10. Linear: The output size is two for qubit and three for qutrit classification, corresponding to the possible preparation states.
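The layer stack above can be sketched in PyTorch. Layer sizes and the He initialisation follow the text; the input trace length (512 points), the derived flattened size, and the class name `ReadoutCNN` are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ReadoutCNN(nn.Module):
    """CNN state classifier; input shape (batch, 2, n_samples),
    with I(t) and Q(t) as the two channels."""

    def __init__(self, n_samples=512, n_states=3):
        super().__init__()
        # Large first kernel acts as a learned low-pass filter.
        self.conv1 = nn.Conv1d(2, 16, kernel_size=128)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=5)
        self.pool = nn.MaxPool1d(3)
        self.dropout = nn.Dropout(0.5)
        # He initialisation of the first kernel, as in the text.
        nn.init.kaiming_normal_(self.conv1.weight)
        # Infer the flattened size for the given trace length.
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, 2, n_samples)).shape[1]
        self.fc1 = nn.Linear(n_flat, n_flat // 2)  # output half the input size
        self.fc2 = nn.Linear(n_flat // 2, n_states)

    def _features(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        return torch.flatten(x, start_dim=1)

    def forward(self, x):
        x = self.dropout(self._features(x))
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # softmax is applied outside for classification

model = ReadoutCNN()
logits = model(torch.randn(4, 2, 512))  # one logit per candidate state
```

For qubit classification the same network is built with `n_states=2`.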

The Adam optimizer [kingma2017adam] and a mean squared error (MSE) loss function were used for optimization. The outputs were classified using a softmax function; for the qutrit case, this maps the three outputs to one of the states |g⟩, |e⟩ or |f⟩.

Generation of the labelled data required for model training involved preparing the transmon in each of the three states, probing with a measurement signal and recording the output. For each training cycle, we record 2048 traces of each state and pass these through the model. The loss was calculated and gradient descent was undertaken at a learning rate of 10^-3. This cycle takes ~3 seconds.

We trained the model on new data at every training cycle, acquired in real-time from the sample in the dilution refrigerator. This framework provides intrinsic protection from overfitting; since there is a new dataset each time the loss is calculated, the model cannot learn on any spurious signal features localised to a single dataset. The low learning rate assists in learning patterns that are common to data across training cycles, thereby increasing model stability. After each update of the model weights, a further 2048 acquisitions were made to test the model. The loss and assignment fidelity on test data were stored for monitoring.
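One training cycle of this fresh-data scheme can be sketched as below. The acquisition function is a simulated placeholder (random traces) standing in for the digitizer readout, and the stand-in linear classifier keeps the sketch fast; the batch sizes, MSE-on-softmax loss, Adam optimizer, and learning rate follow the text.

```python
import torch
import torch.nn as nn

def acquire_traces(n_per_state=2048, n_samples=512, n_states=3):
    """Placeholder for acquiring fresh labelled I/Q traces in real time."""
    x = torch.randn(n_per_state * n_states, 2, n_samples)
    y = torch.arange(n_states).repeat_interleave(n_per_state)
    return x, nn.functional.one_hot(y, n_states).float()

def training_cycle(model, optimizer, loss_fn):
    # A brand-new dataset every cycle: the model cannot overfit to
    # spurious features localised to a single acquisition.
    x, targets = acquire_traces()
    optimizer.zero_grad()
    loss = loss_fn(torch.softmax(model(x), dim=1), targets)
    loss.backward()
    optimizer.step()
    # Further fresh acquisitions test the model after the weight update.
    x_test, t_test = acquire_traces(n_per_state=256)
    with torch.no_grad():
        pred = model(x_test).argmax(dim=1)
    fidelity = (pred == t_test.argmax(dim=1)).float().mean().item()
    return loss.item(), fidelity

model = nn.Sequential(nn.Flatten(), nn.Linear(2 * 512, 3))  # stand-in classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_val, fidelity = training_cycle(model, optimizer, nn.MSELoss())
```

The loss and test fidelity returned by each cycle are what is stored for the real-time monitoring described above.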

Figure 3: Training over 100 training cycles and re-training over 20 cycles. The left plot shows the CNN learning to differentiate between the three (or two) states. The model exceeds the baseline fidelity within 30 training cycles, taking approximately one minute. The right plot shows the model being re-trained after the experiment has run for a few hours; retraining takes only 20 training cycles.
Figure 4: Fidelity of different classification methods. Each point represents a fidelity evaluation using 2048 traces for each of the three states gathered on-the-fly, and each plot represents a different training regime. The black dashed lines indicate when the model was retrained. The retrained CNN model consistently does better than the Cal-Baseline measurement, even though it is only trained at discrete intervals. The model without retraining generally has a better fidelity than the Baseline, but not always, due to parameter drifts.

To investigate the robustness of the CNN classification model against system parameter drifts, we performed measurements over 24 hours and monitored the fidelity, shown in Fig. 4. Each plot shows three fidelities obtained from the same data but using different data processing techniques. First, we evaluated the fidelities obtained from the integrated responses according to the conventional method, where the mean responses for the |g⟩, |e⟩ and |f⟩ states are calibrated only once prior to the experiment (“Baseline”). Second, we obtained the fidelity using the mean responses re-calibrated every 2048 repeated measurements (“Cal-Baseline”). Finally, we plot the fidelity obtained by on-the-fly processing with the CNN model.

To acquire additional insight into the behaviour of our CNN model, we show two separate plots corresponding to different training regimes. Each of these scenarios is described in more detail below:

  • Retrain model: The model is trained for an initial 100 training cycles. The fidelity is then repeatedly tested for ~1 to 3.5 hours, before training the model again for another 20 training cycles. This retraining process is akin to recalibration in the “Cal-Baseline” measurement. This model performs better than both the “Baseline” and “Cal-Baseline” measurements, despite being recalibrated significantly less frequently than its “Cal-Baseline” counterpart.

  • No retraining: This model is never retrained after the initial 100 training cycles. It also performs better than the Baseline throughout the 24 hours, but its fidelity enhancement degrades after a few hours.

Another asset of the model is its robustness against some parameter drifts. To enforce this, the training dataset should contain samples which are broadly distributed across the domain of the parameter in question; this effectively removes the parameter from the set of features upon which the model can learn. An example of such a parameter variation is the global phase drift caused by insufficient instrument synchronisation.

To endow the network with global phase robustness, a secondary dataset was created by obtaining 256 traces for each of the three basis states (|g⟩, |e⟩ and |f⟩) at each of 500 different global phases (between 0 and 2π). The global phase was applied by imposing an arbitrary wait time at the beginning of each experiment repetition. The readout signals obtained within each experiment are downconverted to 25 MHz using a local oscillator which is phase-locked to the generator producing the readout signal itself. This phase-locking ensures that if measurements occur at integer multiples of 40 ns within the pulse sequence, phase coherence is conserved. By generating data with uniformly distributed initial wait times between 0 and 40 ns, the global phase is randomised, forcing the model to learn a phase-robust mapping from readout trajectory to qutrit state. We fed this data into a CNN in the same manner as the standard dataset.
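The effect of a global phase on a trace is a rotation of the (I, Q) quadratures. In the experiment the phase is randomised physically via the wait time (one 40 ns period at 25 MHz), but the same augmentation can be emulated in software, as this sketch shows; the example trace is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def apply_global_phase(i_t, q_t, phi):
    """Rotate the (I, Q) quadratures of a trace by a global phase phi."""
    z = (i_t + 1j * q_t) * np.exp(1j * phi)
    return z.real, z.imag

# Augment one synthetic trace with a uniformly random phase in [0, 2*pi).
tau = np.linspace(0, 4 * np.pi, 512)
i_t, q_t = np.cos(tau), np.sin(tau)
phi = rng.uniform(0, 2 * np.pi)
i_rot, q_rot = apply_global_phase(i_t, q_t, phi)
```

The rotation preserves the magnitude |I + iQ| at every time point, which is exactly why a uniformly random phase carries no state information and drops out of the features the model can exploit.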

The design of the neural network remains identical for this experiment, aside from using a kernel size of 10 in the first convolutional layer. In Fig. 5, assignment fidelities are evaluated at 500 intervals, with phase shifts ranging from 0 to 2π over the course of 9 minutes, using both the baseline mean-integration and CNN processing methods (trained/calibrated using data gathered immediately prior).

Figure 5: Assignment fidelities of the baseline mean calibration method and a phase-robust CNN model classifying data with an induced phase drift. Each point represents a fidelity evaluation using 2048 traces for each of the three states. The phase-robust CNN model displays considerably better performance than the baseline method in general, and comparable performance at zero phase shift (where the baseline was calibrated). Neither method undergoes any retraining or recalibration over the course of the experiment.
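The baseline method's sensitivity to a global phase drift can be illustrated with a toy nearest-mean model: rotating the I-Q clusters moves them away from the means calibrated at zero phase. The two-state clusters and noise level here are synthetic assumptions, not the experimental data.

```python
import numpy as np

rng = np.random.default_rng(3)

centers = np.array([1 + 0j, -1 + 0j])  # two synthetic state clusters
means = centers                        # calibrated at zero phase shift

def fidelity(phase, n=500, sigma=0.2):
    """Assignment fidelity of the nearest-mean classifier after the
    clusters have drifted by a global phase."""
    pts = np.concatenate([
        c * np.exp(1j * phase)
        + sigma * (rng.normal(size=n) + 1j * rng.normal(size=n))
        for c in centers])
    labels = np.repeat([0, 1], n)
    assigned = np.argmin(np.abs(pts[:, None] - means[None, :]), axis=1)
    return float(np.mean(assigned == labels))

f0 = fidelity(0.0)       # at the calibration phase: near-perfect
f_pi = fidelity(np.pi)   # clusters swapped: classifier fails
```

A classifier trained on phase-randomised data, by contrast, sees every rotation during training and keeps its fidelity across the full 0 to 2π range, as Fig. 5 shows for the phase-robust CNN.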

In summary, we have investigated a set of machine learning methods for two and three state classification of a transmon. Convolutional neural networks demonstrated the highest performance, consistent with the results of [Lienhard2021]. In particular, CNN methods were not only performant in detecting relaxation events during measurement, but could also be trained against experimental imperfections such as local oscillator phase drift.

Improved assignment fidelity and the ability to directly train the model against system imperfections are the key advantages of performing readout signal processing with neural networks. The open-source, GPU-friendly, and easily implementable nature of PyTorch makes these neural networks an attractive tool for state classification.

We thank Andreas Wallraff, Markus Oppliger, Anton Potočnik, and Mintu Mondal for fabricating the JPA used in the measurements. The authors were supported by the Australian Research Council Centre of Excellence for Engineered Quantum Systems (EQUS, CE170100009) and by Lockheed Martin Corporation via research contract S19 004.