
Neural networks for on-the-fly single-shot state classification

Rohit Navarathna1,2 [email protected]    Tyler Jones1,2,3    Tina Moghaddam1,2    Anatoly Kulikov1,2,4    Rohit Beriwal1,2    Markus Jerger1,2,5    Prasanna Pakkiam1,2    Arkady Fedorov1,2
1 ARC Centre of Excellence for Engineered Quantum Systems, St Lucia, Queensland 4072, Australia
2 School of Mathematics and Physics, University of Queensland, St Lucia, Queensland 4072, Australia
3 Max Kelsen, Spring Hill, Queensland 4000, Australia
4 Department of Physics, ETH Zürich, CH-8093 Zürich, Switzerland
5 JARA-FIT Institute for Quantum Information, Forschungszentrum Jülich, 52425 Jülich, Germany
Abstract

Neural networks have proven to be efficient for a number of practical applications ranging from image recognition to identifying phase transitions in quantum physics models. In this paper we investigate the application of neural networks to state classification in a single-shot quantum measurement. We use dispersive readout of a superconducting transmon circuit to demonstrate an increase in assignment fidelity for both two and three state classification. More importantly, our method is ready for on-the-fly data processing without overhead or need for large data transfer to a hard drive. In addition we demonstrate the capacity of neural networks to be trained against experimental imperfections, such as phase drift of a local oscillator in a heterodyne detection scheme.


Machine learning (ML) is ubiquitous in modern computer science, with applications ranging from image and speech recognition to self-driving vehicles and automated medical diagnostic systems. By virtue of its archetypal problem classes – regression and classification – ML algorithms, and neural networks in particular, have recently found a number of applications in quantum computing, helping researchers to tackle such tasks as optimizing gates and pulse sequences [Zahedinejad2016, August2017, Ding2021, Baum2021], identifying phase transitions [Rem2019, Dong2019], correcting imperfections of measurement apparatus [Palmieri2020, Zwolak2020, Durrer2020], classifying states [Hueaav2761, Cimini2020] or evolution [Stenberg2016, Flurin2020, Gentile2021] of a quantum system with little or no a priori knowledge, and even optimizing the fabrication process [Mei2021a].

A proposal to use machine learning to discriminate measurement trajectories and outcomes was one of the first and most natural applications of ML in the field, and has led to improvements in readout assignment fidelity [Magesan2015]. The technique is now being regularly implemented across the community [Dickel2018, Kono2018, Martinez2020]. As these methods continue to see success, neural networks have become a promising technique for incorporation into the readout procedure, due to their generalisability and capacity to extract useful features from dense data. A recent advancement uses neural networks to compensate for system dependent errors due to processes such as cross-talk in multiplexed qubit readout [Lienhard2021]. In this work we also apply neural networks to the readout of a superconducting transmon system. However, our approach works on-the-fly with no data processing overhead and can be trained against experimental parameter drifts, in addition to increasing readout fidelity in two and three state discrimination scenarios.

To deploy our neural-network-based state classification, we use the open-source machine learning library PyTorch [NEURIPS2019_9015]. Geared towards computer vision and natural language processing, it includes the capability to realise deep neural networks and contains built-in functionality for data processing on a graphics processing unit (GPU). GPU integration makes our pipeline fast enough to perform on-the-fly data classification without the need to transfer the raw measured signal to a hard drive. Amongst other advantages, this allows the readout assignment fidelity to be monitored in real time.

With the initial training of the neural network taking on the order of minutes, subsequent retraining of the network weights requires only several seconds and allows the readout assignment fidelity to return to its optimal value. More importantly, the convolutional neural network used in the present work may be designed and trained to be resilient to certain experimental parameter drifts. Specifically, we present a strategy to eliminate the effect on readout assignment fidelity of relative local-oscillator phase drifts induced by the microwave generation equipment.

The measurement setup caters to repeated preparation and measurement of the transmon. We prepare the transmon in one of the two or three basis states, followed by a measurement. We use assignment fidelity to compare the efficacy of different classification methods. We performed the experiments with two samples, each comprising a primitive of the circuit quantum electrodynamics platform: a transmon coupled to a readout cavity. The first run of experiments was used as a test-bed to compare the quality of various ML methods. It was conducted using a single transmon embedded in a three-dimensional copper cavity (Sample A). The second set of experiments demonstrated on-the-fly data processing using a GPU and the methods’ stability against phase drifts; here one of the transmons on a multi-transmon chip (Sample B) was used. The relevant sample parameters are shown in Table 1 for both samples.

After an initialization period to thermalize the qubit to the ground state, we used Gaussian pulses of length 20 ns to prepare the transmon in the three basis states: applying no pulse to keep the transmon in its ground state |g⟩, applying one π-pulse at ω_ge to prepare the transmon in its first excited state |e⟩, and applying two consecutive π-pulses at ω_ge and ω_ef respectively to prepare the transmon in its second excited state |f⟩. These protocols are illustrated in Fig. 1a.
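A Gaussian pulse envelope of the kind used for these π-pulses can be sketched as follows. This is a minimal illustration, not the authors' pulse-generation code; the sampling rate and the σ-to-length ratio are assumptions.

```python
import numpy as np

def gaussian_pulse(length_s=20e-9, fs=1e9, sigma_frac=0.25):
    """Gaussian envelope of a control pulse.

    length_s   : total pulse length (20 ns, as in the text)
    fs         : assumed AWG sample rate (illustrative)
    sigma_frac : assumed standard deviation as a fraction of the length
    """
    n = int(length_s * fs)
    t = np.arange(n)
    mid = (n - 1) / 2            # centre the peak in the window
    sigma = sigma_frac * n
    return np.exp(-0.5 * ((t - mid) / sigma) ** 2)

# 20 ns pulse sampled at 1 GSa/s -> 20 samples, peak in the middle.
env = gaussian_pulse()
```

In the experiment this envelope would modulate a carrier at ω_ge (or ω_ef), with the amplitude calibrated so the integrated area gives a π rotation.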

To achieve a sufficiently high SNR for transmon readout in a single-shot regime, we use a Josephson parametric amplifier (JPA) similar to the one described in Ref. [Eichler2014]. Following amplification through the JPA, the readout signal is further amplified using a high-electron-mobility transistor (HEMT) amplifier and multiple room-temperature amplifiers. The signal is then downconverted to 25 MHz and acquired by a digitizer.

For the on-the-fly experiments with Sample B, we acquired 512 time points per measurement, recorded to the buffer of a 500 MSa/s Spectrum M4i digitizer. Batches of 2048 time traces are transferred from the populated buffer to PC memory (RAM) and then to GPU memory for batch processing. While the data is processed, the digitizer buffer is populated with new waveforms. This parallelization circumvents any overhead due to the data processing.

Within our data processing workflow, each acquired waveform undergoes digital downconversion (DDC) by multiplying the acquired signal with cos(ω_DDC t) (sin(ω_DDC t)), where ω_DDC/2π = 25 MHz, to obtain the in-phase quadrature I(t) (out-of-phase quadrature Q(t)). A finite impulse response (FIR) filter with a window of 40 samples (20 ns) and a cutoff frequency of 20 MHz is applied to the signal to eliminate the signal image at 50 MHz along with the 25 MHz noise (originally a DC offset). After obtaining I(t) and Q(t), the signal undergoes further post-processing. This may include time integration, channel correlation, or being fed through trained PyTorch neural networks.
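The DDC and FIR filtering steps above can be sketched in a few lines. This is an illustrative NumPy/SciPy reconstruction, not the authors' GPU code: the sample rate (500 MSa/s) and trace length (512 points) follow the text, while the synthetic test tone and the Hamming-windowed FIR design are assumptions.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 500e6        # digitizer sample rate, 500 MSa/s
F_DDC = 25e6      # intermediate frequency of the acquired signal
N_SAMPLES = 512   # time points per trace

def ddc(trace, fs=FS, f_ddc=F_DDC):
    """Digitally downconvert a raw trace to the I(t) and Q(t) quadratures."""
    t = np.arange(len(trace)) / fs
    i_t = trace * np.cos(2 * np.pi * f_ddc * t)
    q_t = trace * np.sin(2 * np.pi * f_ddc * t)
    # 40-tap low-pass FIR with a 20 MHz cutoff removes the image at
    # 50 MHz and the 25 MHz component that was originally a DC offset.
    fir = firwin(40, 20e6, fs=fs)
    return lfilter(fir, 1.0, i_t), lfilter(fir, 1.0, q_t)

# Example: a pure 25 MHz tone downconverts to a constant I quadrature
# (0.5, since cos^2 averages to 1/2) and a near-zero Q quadrature.
t = np.arange(N_SAMPLES) / FS
raw = np.cos(2 * np.pi * F_DDC * t)
i_t, q_t = ddc(raw)
```

In the experiment the same multiply-and-filter operation runs batched on the GPU across 2048 traces at a time.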

Due to the large number of cores in the GPU, the data can be processed in parallel, which allowed us to perform real-time data acquisition on-the-fly. Although the results of this paper were obtained with a repetition time of 40 μs, our GPU data processing can run without overhead as fast as 3.2 μs per repetition, acquiring 19 million traces (19 trillion samples) in 1 minute.

Parameter      Transmon A    Transmon B
ω_cav/2π       7.08 GHz      7.63 GHz
ω_ge/2π        6.27 GHz      5.49 GHz
ω_ef/2π        5.95 GHz      5.16 GHz
2χ_ge/2π       8.00 MHz      8.50 MHz
2χ_ef/2π       5.35 MHz      15.57 MHz
κ/2π           1.31 MHz      1.56 MHz
T_1            11.75 μs      4.07 μs
T_2            3.17 μs       4.29 μs
Table 1: Device parameters. Here, ω_cav is the readout resonator frequency, ω_ge (ω_ef) is the frequency of the transition between the ground (first excited) state and the first (second) excited state, χ_ge and χ_ef are the state-dependent dispersive shifts of the resonator frequency, κ is the decay rate of the resonator, and T_1 (T_2) is the relaxation (dephasing) rate of the transmon.

In order to determine the baseline readout fidelity, we first employed the conventional method of state classification. This involves preparing the relevant basis states of the transmon followed by a measurement pulse. The heterodyne measurement signal was integrated in time, giving one complex number for each acquisition with integrated in-phase and out-of-phase quadratures I and Q. By repeating the measurements, we can populate measurement outcomes for every prepared state on the I-Q plane, as shown in Fig. 1. The mean values of the state responses are calculated and stored as calibration data. A particular measurement response can then be classified by selecting the state whose calibrated mean response is closest to the measured point on the I-Q plane. The assignment fidelity can be evaluated as F_a = (1/N) Σ_{i=1}^{N} P(i|i), where P(i|j) is the probability of obtaining outcome "i" given the system was prepared in the j-th state. Here, we use N = 2 (3) and the states {|g⟩, |e⟩} ({|g⟩, |e⟩, |f⟩}) to calculate the qubit (qutrit) assignment fidelity.
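The nearest-mean classification and the assignment fidelity F_a can be sketched as follows. The clusters here are synthetic stand-ins for the integrated (I, Q) responses; cluster centres and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrate(points_per_state):
    """Store the mean complex response I+iQ of each prepared basis state."""
    return np.array([p.mean() for p in points_per_state])

def classify(points, means):
    """Assign each I+iQ point to the state with the nearest calibrated mean."""
    return np.argmin(np.abs(points[:, None] - means[None, :]), axis=1)

def assignment_fidelity(points_per_state, means):
    """F_a = (1/N) * sum_i P(i|i) over the prepared states."""
    probs = [np.mean(classify(p, means) == i)
             for i, p in enumerate(points_per_state)]
    return float(np.mean(probs))

# Three well-separated synthetic clusters standing in for |g>, |e>, |f>.
centers = np.array([0 + 0j, 4 + 0j, 2 + 3j])
data = [c + 0.5 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))
        for c in centers]
means = calibrate(data)
fid = assignment_fidelity(data, means)
```

With well-separated clusters the fidelity approaches one; in the experiment it is limited by overlap of the clusters and by relaxation during the measurement (visible in Fig. 1b).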

Figure 1: a) Transmon control pulses used for preparing the three basis states of the transmon. b) Integrated cavity responses I + iQ for different prepared states. The blue, red and green points correspond to the transmon being prepared in the |g⟩, |e⟩ and |f⟩ states respectively. The relaxation of the f-state to the excited state, and of the excited state to the ground state, is visible. Some points decaying from the f-state to the excited state will be classified as ground state due to their proximity to the ground-state cluster.

Alternatively, one can apply a matched filter to the heterodyne measurement signal prior to integration in the conventional method. Matched filters are calibrated by taking the mean of all acquired signals corresponding to each basis state and storing the complex conjugates of these mean responses. An incoming signal is convolved with each of these filters; the filter which returns the maximum average amplitude determines the classified basis state.
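A minimal sketch of matched-filter classification follows. The trajectories are synthetic, and the overlap is evaluated as a single dot product (the zero-lag point of the convolution) rather than a full convolution; both simplifications are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_filters(trajs_per_state):
    """Matched filter per state: conjugate of that state's mean trajectory."""
    return [np.conj(t.mean(axis=0)) for t in trajs_per_state]

def classify(shot, filters):
    """Pick the state whose filter returns the largest overlap amplitude."""
    overlaps = [np.abs(np.sum(f * shot)) for f in filters]
    return int(np.argmax(overlaps))

# Two synthetic mean trajectories standing in for the |g> and |e> responses.
t = np.linspace(0, 1, 50)
templates = [np.exp(1j * 2 * np.pi * t), np.exp(-1j * 2 * np.pi * t)]
trajs = [np.array([tmpl + 0.3 * (rng.normal(size=50) + 1j * rng.normal(size=50))
                   for _ in range(200)])
         for tmpl in templates]
filters = build_filters(trajs)

# A noisy single shot from the second state is matched to filter 1.
shot = templates[1] + 0.3 * (rng.normal(size=50) + 1j * rng.normal(size=50))
label = classify(shot, filters)
```

Because the filter weights each time point by the calibrated response, it uses the trajectory shape rather than only its time integral, which is what gives matched filtering its advantage over the conventional method in Table 2.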

Figure 2: Trajectories for the three states. The solid lines are the mean of 2048 traces acquired for each prepared state. The dashed lines are examples of single shots, which are the inputs to the machine learning model. The arrows indicate the direction of the single-shot trajectories. The mean trajectories start from (I, Q) = (0, 0). a) Both the conventional method and the CNN correctly classify the shots. b) The CNN correctly classifies all shots, but the conventional method wrongly classifies the |f⟩ shot as |e⟩. This is likely because the transmon decayed from |f⟩ to |e⟩ early in its trajectory.

To identify the best ML algorithm, we first collected data from Sample A. The transmon was prepared in each of the three basis states (|g⟩, |e⟩ and |f⟩) followed by the 2 μs measurement pulse, resulting in 50 time samples for each trace at 100 MHz. In total, we collected 16384 traces corresponding to each state for analysis. 90% of this data was used for training, and the rest was used to test and obtain the classification fidelity.

Model                      2 state fidelity    3 state fidelity
Conventional               0.841               0.711
Matched Filter             0.913               0.747
K Nearest Neighbours       0.902               0.845
Support Vector Machine     0.917               0.851
Random Forest Classifier   0.917               0.874
Vanilla Neural Network     0.912               0.925
LSTM                       0.909               0.904
CNN                        0.919               0.928
Table 2: Assignment fidelities for different machine learning models evaluated on an identical test data set generated with Sample A. All methods show significant improvements over the conventional method (time integration of the readout signal followed by threshold classification). The CNN is the best model for classifying between three levels and was therefore selected for on-the-fly data analysis. Note that the same data was used to extract both the qubit and qutrit fidelities; since the readout parameters were optimised for the qutrit case, the CNN model returns a higher value for the 3 state fidelity.

Before using neural networks, we evaluated the assignment fidelity of other common machine learning methods. The assignment fidelities obtained using each of these algorithms are shown in Table 2.

The k-nearest neighbours algorithm bears the most similarity to the baseline assignment method, using the full trajectory instead of just the integrated point of a trajectory. The long short-term memory (LSTM) network is popular in language processing and was chosen because LSTMs are designed to handle sequences of data and can therefore capture long time correlations between data points.

A convolutional neural network (CNN) is most commonly used for pattern recognition and image classification. It is a neural network in which a hidden layer is a convolution of the input with a kernel (or filter). We feed the time-domain signal to the CNN with the I(t) and Q(t) traces as two input channels, analogous to the red, green and blue channels of a color image.

After selecting CNN as the method with the highest assignment fidelity, we apply this model to state classification on-the-fly. The network consists of the following layers:

  1. 1D convolution: A convolution layer with 2 input channels (corresponding to I and Q), 16 output channels, and a kernel size of 128. The large kernel size filters out higher-frequency noise effectively. The initial kernel weights (or filter coefficients) are set using the He initialisation function [he2015delving]; experimentation demonstrated that CNN performance is sensitive to this weight initialisation.

  2. ReLU activation: The mapping f(x) = max(0, x), used to expedite the learning process [Nair2010].

  3. 1D convolution: A convolution layer with 16 input channels (corresponding to the output of the previous convolution layer), 32 output channels, and a kernel size of 5. This expands the previous 16 features to 32 by taking various linear combinations of the prior layer outputs.

  4. ReLU activation

  5. Max pooling: The maximum of every three neighbouring output values of the previous layer is taken.

  6. Flattening: This step reshapes the data into a one-dimensional array.

  7. Dropout: 50% of the data points are randomly selected to be set to zero (called ‘neuron deactivation’). This step was introduced to avoid overfitting.

  8. Linear: This layer applies a linear transformation to the incoming data, y = Ax + b, where the weights (matrix A) and biases (vector b) are optimized. Here, the output y is half the size of the input x.

  9. ReLU activation

  10. Linear: The output size is two for qubit and three for qutrit classification, corresponding to the possible preparation states.
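The layer stack above can be sketched in PyTorch. Layer sizes and the He initialisation follow the text; the input trace length (512 points), the derived flattened size, and the class name `ReadoutCNN` are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ReadoutCNN(nn.Module):
    """CNN state classifier; input shape (batch, 2, n_samples),
    with I(t) and Q(t) as the two channels."""

    def __init__(self, n_samples=512, n_states=3):
        super().__init__()
        # Large first kernel acts as a learned low-pass filter.
        self.conv1 = nn.Conv1d(2, 16, kernel_size=128)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=5)
        self.pool = nn.MaxPool1d(3)
        self.dropout = nn.Dropout(0.5)
        # He initialisation of the first kernel, as in the text.
        nn.init.kaiming_normal_(self.conv1.weight)
        # Infer the flattened size for the given trace length.
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, 2, n_samples)).shape[1]
        self.fc1 = nn.Linear(n_flat, n_flat // 2)  # output half the input size
        self.fc2 = nn.Linear(n_flat // 2, n_states)

    def _features(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        return torch.flatten(x, start_dim=1)

    def forward(self, x):
        x = self.dropout(self._features(x))
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # softmax is applied outside for classification

model = ReadoutCNN()
logits = model(torch.randn(4, 2, 512))  # one logit per candidate state
```

For qubit classification the same network is built with `n_states=2`.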

The Adam optimizer [kingma2017adam] and a mean squared error (MSE) loss function were used for optimization. The outputs were classified using a softmax function; for the qutrit case, this maps the three outputs to one of the states |g⟩, |e⟩ or |f⟩.

Generation of the labelled data required for model training involved preparing the transmon in each of the three states, probing with a measurement signal and recording the output. For each training cycle, we record 2048 traces of each state and pass these through the model. The loss was calculated and gradient descent was undertaken at a learning rate of 10^-3. This cycle takes ~3 seconds.

We trained the model on new data at every training cycle, acquired in real-time from the sample in the dilution refrigerator. This framework provides intrinsic protection from overfitting; since there is a new dataset each time the loss is calculated, the model cannot learn on any spurious signal features localised to a single dataset. The low learning rate assists in learning patterns that are common to data across training cycles, thereby increasing model stability. After each update of the model weights, a further 2048 acquisitions were made to test the model. The loss and assignment fidelity on test data were stored for monitoring.
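One training cycle of this fresh-data scheme can be sketched as below. The acquisition function is a simulated placeholder (random traces) standing in for the digitizer readout, and the stand-in linear classifier keeps the sketch fast; the batch sizes, MSE-on-softmax loss, Adam optimizer, and learning rate follow the text.

```python
import torch
import torch.nn as nn

def acquire_traces(n_per_state=2048, n_samples=512, n_states=3):
    """Placeholder for acquiring fresh labelled I/Q traces in real time."""
    x = torch.randn(n_per_state * n_states, 2, n_samples)
    y = torch.arange(n_states).repeat_interleave(n_per_state)
    return x, nn.functional.one_hot(y, n_states).float()

def training_cycle(model, optimizer, loss_fn):
    # A brand-new dataset every cycle: the model cannot overfit to
    # spurious features localised to a single acquisition.
    x, targets = acquire_traces()
    optimizer.zero_grad()
    loss = loss_fn(torch.softmax(model(x), dim=1), targets)
    loss.backward()
    optimizer.step()
    # Further fresh acquisitions test the model after the weight update.
    x_test, t_test = acquire_traces(n_per_state=256)
    with torch.no_grad():
        pred = model(x_test).argmax(dim=1)
    fidelity = (pred == t_test.argmax(dim=1)).float().mean().item()
    return loss.item(), fidelity

model = nn.Sequential(nn.Flatten(), nn.Linear(2 * 512, 3))  # stand-in classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_val, fidelity = training_cycle(model, optimizer, nn.MSELoss())
```

The loss and test fidelity returned by each cycle are what is stored for the real-time monitoring described above.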

Figure 3: Training over 100 training cycles and re-training over 20 cycles. The left plot shows the CNN learning to differentiate between the three (or two) states. The model exceeds the baseline fidelity within 30 training cycles, taking approximately one minute. The right plot shows the model being re-trained after the experiment has run for a few hours; retraining takes only 20 training cycles.
Figure 4: Fidelity of different classification methods. Each point represents a fidelity evaluation using 2048 traces for each of the three states gathered on-the-fly, and each plot represents a different training regime. The black dashed lines indicate when the model was retrained. The retrained CNN model consistently does better than the Cal-Baseline measurement, even though it is only trained at discrete intervals. The model without retraining generally has a better fidelity than the Baseline, but not always, due to parameter drifts.

To investigate the robustness of the CNN classification model against system parameter drifts, we performed measurements over 24 hours and monitored the fidelity, shown in Fig. 4. Each plot shows three fidelities obtained from the same data but using different data processing techniques. First, we evaluated the fidelities obtained from the integrated responses according to the conventional method, where the mean responses for the |g⟩, |e⟩ and |f⟩ states are calibrated only once prior to the experiment (“Baseline”). Second, we obtained the fidelity using the mean responses re-calibrated every 2048 repeated measurements (“Cal-Baseline”). Finally, we plot the fidelity obtained by on-the-fly processing with the CNN model.

To acquire additional insight into the behaviour of our CNN model, we show two separate plots corresponding to different training regimes. Each of these scenarios is described in more detail below:

  • Retrain model: The model is trained for an initial 100 training cycles. The fidelity is then repeatedly tested for ~1 to 3.5 hours, before training the model again for another 20 training cycles. This retraining process is akin to recalibration in the “Cal-Baseline” measurement. This model performs better than both the “Baseline” and “Cal-Baseline” measurements, despite being recalibrated significantly less frequently than its “Cal-Baseline” counterpart.

  • No retraining: This model is never retrained after the initial 100 training cycles. It also performs better than the Baseline throughout the 24 hours, but its fidelity enhancement degrades after a few hours.

Another asset of the model is its robustness against some parameter drifts. To enforce this, the training dataset should contain samples which are broadly distributed across the domain of the parameter in question; this effectively removes the parameter from the set of features upon which the model can learn. An example of such a parameter variation is the global phase drift caused by insufficient instrument synchronisation.

To endow the network with global phase robustness, a secondary dataset was created by obtaining 256 traces for each of the three basis states (|g⟩, |e⟩ and |f⟩) at each of 500 different global phases (between 0 and 2π). The global phase was applied by imposing an arbitrary wait time at the beginning of each experiment repetition. The readout signals obtained within each experiment are downconverted to 25 MHz using a local oscillator which is phase-locked to the generator producing the readout signal itself. This phase-locking ensures that if measurements occur at integer multiples of 40 ns within the pulse sequence, phase coherence is conserved. By generating data with uniformly distributed initial wait times between 0 and 40 ns, the global phase is randomised, forcing the model to learn a phase-robust mapping from readout trajectory to qutrit state. We fed this data into a CNN in the same manner as the standard dataset.
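The effect of a global phase on a trace is a rotation of the (I, Q) quadratures. In the experiment the phase is randomised physically via the wait time (one 40 ns period at 25 MHz), but the same augmentation can be emulated in software, as this sketch shows; the example trace is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def apply_global_phase(i_t, q_t, phi):
    """Rotate the (I, Q) quadratures of a trace by a global phase phi."""
    z = (i_t + 1j * q_t) * np.exp(1j * phi)
    return z.real, z.imag

# Augment one synthetic trace with a uniformly random phase in [0, 2*pi).
tau = np.linspace(0, 4 * np.pi, 512)
i_t, q_t = np.cos(tau), np.sin(tau)
phi = rng.uniform(0, 2 * np.pi)
i_rot, q_rot = apply_global_phase(i_t, q_t, phi)
```

The rotation preserves the magnitude |I + iQ| at every time point, which is exactly why a uniformly random phase carries no state information and drops out of the features the model can exploit.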

The design of the neural network remains identical for this experiment, aside from using a kernel size of 10 in the first convolutional layer. In Fig. 5, assignment fidelities are evaluated at 500 intervals, with phase shifts ranging from 0 to 2π over the course of 9 minutes, using both the baseline mean-integration and CNN processing methods (trained/calibrated using data gathered immediately prior).

Figure 5: Assignment fidelities of the baseline mean calibration method and a phase-robust CNN model classifying data with an induced phase drift. Each point represents a fidelity evaluation using 2048 traces for each of the three states. The phase-robust CNN model displays considerably better performance than the baseline method in general, and comparable performance at zero phase shift (where the baseline was calibrated). Neither method undergoes any retraining or recalibration over the course of the experiment.
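The baseline method's sensitivity to a global phase drift can be illustrated with a toy nearest-mean model: rotating the I-Q clusters moves them away from the means calibrated at zero phase. The two-state clusters and noise level here are synthetic assumptions, not the experimental data.

```python
import numpy as np

rng = np.random.default_rng(3)

centers = np.array([1 + 0j, -1 + 0j])  # two synthetic state clusters
means = centers                        # calibrated at zero phase shift

def fidelity(phase, n=500, sigma=0.2):
    """Assignment fidelity of the nearest-mean classifier after the
    clusters have drifted by a global phase."""
    pts = np.concatenate([
        c * np.exp(1j * phase)
        + sigma * (rng.normal(size=n) + 1j * rng.normal(size=n))
        for c in centers])
    labels = np.repeat([0, 1], n)
    assigned = np.argmin(np.abs(pts[:, None] - means[None, :]), axis=1)
    return float(np.mean(assigned == labels))

f0 = fidelity(0.0)       # at the calibration phase: near-perfect
f_pi = fidelity(np.pi)   # clusters swapped: classifier fails
```

A classifier trained on phase-randomised data, by contrast, sees every rotation during training and keeps its fidelity across the full 0 to 2π range, as Fig. 5 shows for the phase-robust CNN.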

In summary, we have investigated a set of machine learning methods for two and three state classification of a transmon. Convolutional neural networks demonstrated the highest performance, consistent with the results of [Lienhard2021]. In particular, CNN methods were not only performant in detecting relaxation events during measurement, but could also be trained against experimental imperfections such as local oscillator phase drift.

Improved assignment fidelity and the ability to directly train the model against system imperfections are the key advantages of performing readout signal processing with neural networks. The open-source, GPU-friendly, and easily implementable nature of PyTorch makes these neural networks an attractive tool for state classification.

We thank Andreas Wallraff, Markus Oppliger, Anton Potočnik, and Mintu Mondal for fabricating the JPA used in the measurements. The authors were supported by the Australian Research Council Centre of Excellence for Engineered Quantum Systems (EQUS, CE170100009) and by Lockheed Martin Corporation via research contract S19 004.