
Learning Quantum Dissipation by Neural Ordinary Differential Equation

Li Chen^{1,2} ([email protected])    Yadong Wu^{2,3}
^{1}Institute of Theoretical Physics and State Key Laboratory of Quantum Optics and Quantum Optics Devices, Shanxi University, Taiyuan 030006, China
^{2}Institute for Advanced Study, Tsinghua University, Beijing 100084, China
^{3}State Key Laboratory of Surface Physics, Institute of Nanoelectronics and Quantum Computing, and Department of Physics, Fudan University, Shanghai 200433, China
Abstract

Quantum dissipation arises from the unavoidable coupling between a quantum system and its surrounding environment, and is known as a major obstacle to the quantum processing of information. Beyond acknowledging its existence, tracing the dissipation from observational data is a crucial task that may suggest ways to suppress it. In this paper, we propose to learn quantum dissipation from dynamical observations using the neural ordinary differential equation, and demonstrate this method concretely on two open quantum-spin systems: a large-spin system and a spin-1/2 chain. We also investigate the learning efficiency of the dataset, which provides useful guidance for data acquisition in experiments. Our work promisingly facilitates effective modeling and decoherence suppression in open quantum systems.

I Introduction

Quantum dissipation is closely related to phenomena such as decoherence, spectrum broadening and heating, all of which stand as serious obstacles in research areas ranging from quantum computation Ladd2010 ; Nielsen2010 ; Preskill2018 and simulation Georgescu2014 ; Bloch2012 , quantum information storage Lvovsky2009 , to quantum metrology Giovannetti2011 ; Pezze2018 and sensing Degen2017 . The microscopic origin of dissipation is the breakdown of the system's isolation: the system inevitably interacts with its surrounding environment, so that information leaks out once the ambient degrees of freedom are traced out. Dissipation severely impairs the accuracy with which quantum systems can be modeled and manipulated.

Considerable efforts have been made to counteract the negative effects of dissipation. For example, quantum-nondemolition-mediated feedback has been used to suppress the decoherence of a Schrödinger cat state in a cavity Wiseman1993 ; Vitali1997 ; spin-echo Hahn1950 and dynamical decoupling protocols Viola1999 have been widely applied to nitrogen-vacancy centers Lange2010 ; Du2009 ; Abobeih2018 ; Laraoui2013 , cold atoms Almog2010 , trapped ions Wang2017 and superconducting circuits Guo2018 ; Pokharel2018 . These techniques usually require certain prior knowledge of the system-environment interactions. For example, in nitrogen-vacancy centers, strong bias fields were introduced to circumvent the transverse coupling Du2009 or to effectively establish the bath correlations of the carbon nuclei Laraoui2013 . The form of the coupling may also be identified in advance, e.g., whether it is magnetic or electric, linear or quadratic. However, such prior information is generally unavailable, especially for many-body systems interacting with complex environments.

In this paper, we propose a data-driven scheme to reconstruct Markovian open quantum systems, where the dataset comes from discrete observations of the relaxation dynamics under certain probes. Based on this dataset, we adopt the neural ordinary differential equation (NODE), a recently developed machine-learning algorithm Chen2019 , to learn the Liouvillian of the open system by inversely solving the Lindblad (or Gorini-Kossakowski-Sudarshan-Lindblad) master equation Lindblad1976 ; Gorini1976 ; Gardiner2004 . A number of relevant works have been reported on this topic. The operator expansion Franco2009 and the eigensystem realization algorithm (ERA) Zhang2014 ; Sone2017A were first used for the time-trace reconstruction of Hamiltonians; the ERA was later extended to Markovian dissipative systems Zhang2015 and to the estimation of the system size Sone2017B . Several eigenstate- Qi2019 ; Bairey2019 or steady-state-based approaches Bairey2020 were further developed. A generic approach for local Hamiltonian tomography using the initial and final observations of quench dynamics was also reported Li2020 . More recently, several traditional machine-learning architectures, such as the fully connected neural network Xin2019 , long short-term memory Che2021 , and the convolutional neural network Ma2021 , have been applied to this problem.

The novelty and contributions of our work mainly lie in the following points. First, our approach can be applied either to Hamiltonian tomography (closed systems) or to the reconstruction of Markovian dissipation (open systems), without prior information on the specific structure of the Hamiltonian or Liouvillian. Second, we adopt a relatively new machine-learning algorithm (namely the NODE Chen2019 ) to compute the gradients, which avoids the exploding- and vanishing-gradient issues Pascanu2012 encountered by traditional machine-learning algorithms. Furthermore, we study the learning efficiency of datasets to facilitate data acquisition in realistic experiments. We thus expect our work to play an active role in the effective modeling of various open quantum systems and in guiding new system-environment decoupling protocols.

Figure 1: (a) Schematic of an open quantum system. (b) ODE solvers find solutions of the Lindblad equation when the Hamiltonian or the Liouvillian is known in advance, whereas the NODE reconstructs the Hamiltonian or Liouvillian when the solutions ($\rho(t)$ or $\langle O(t)\rangle$) are given. (c) Mechanism of the NODE, where $\rho(t)$ and $a(t)$ flow forward and backward, respectively.

II General Method

We consider a quantum system coupled to an environment, as schematically displayed in Fig. 1(a). Under the Born-Markov approximation, the equation of motion of the system is governed by the Lindblad master equation Lindblad1976 ; Gorini1976 ; Gardiner2004

\dot{\rho}=L[\rho], \qquad (1)

where $\rho$ is the density matrix of the quantum system and $L$ is the Liouvillian superoperator, which takes the form (setting $\hbar=1$)

L[\rho]=-i[H,\rho]+\sum_{k}\left[J_{k}\rho J_{k}^{\dagger}-\frac{1}{2}\{J_{k}^{\dagger}J_{k},\rho\}\right]. \qquad (2)

Here, $H$ denotes the dissipation-free Hamiltonian, $J_{k}$ is the dissipative (or jump) operator in the $k$-th channel, and $\{J_{k}^{\dagger}J_{k},\rho\}=J_{k}^{\dagger}J_{k}\rho+\rho J_{k}^{\dagger}J_{k}$ accounts for the normalization of $\rho$ in the absence of jumps.
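For concreteness, the action of the Liouvillian in Eq. (2) can be sketched in a few lines of NumPy; the function name and array conventions below are our own illustrative choices, not code from the paper:

```python
import numpy as np

def lindblad_rhs(rho, H, jumps):
    """Return L[rho] of Eq. (2) for Hamiltonian H and jump operators J_k (hbar = 1)."""
    drho = -1j * (H @ rho - rho @ H)  # coherent part -i[H, rho]
    for J in jumps:
        JdJ = J.conj().T @ J
        # dissipator: J rho J^dag - (1/2){J^dag J, rho}
        drho += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return drho
```

Trace preservation, $\text{Tr}\,L[\rho]=0$, and Hermiticity of $L[\rho]$ provide quick sanity checks for any such implementation.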

Conventionally, given the Liouvillian $L$ (namely $H$ and $J_{k}$), one obtains the solution $\rho(t)$ by numerically solving the Lindblad equation (1) with ordinary differential equation (ODE) solvers such as the Euler or Runge-Kutta methods Scherer2013 . In contrast, the goal of this paper is to determine the dissipation $J_{k}$, and even $H$, when the dynamics of the density matrix $\rho(t)$ or of certain observables $\langle O(t)\rangle=\text{Tr}[O\rho(t)]$ are given. In other words, we aim to solve an inverse problem: reproducing $L$ from the sequential dataset $\mathcal{D}=\{\rho(t_{n})\ \text{or}\ \langle O(t_{n})\rangle\}$, as illustrated in Fig. 1(b). Additionally, for the sake of experimental applicability, we consider in the following that $\mathcal{D}$ is constructed purely from observations, i.e., $\mathcal{D}=\{\langle O(t_{n})\rangle\}$, since acquiring $\{\langle O(t_{n})\rangle\}$ is experimentally simpler and more feasible than acquiring $\rho(t)$.

We adopt two different probes to generate the dataset $\mathcal{D}$. i) Time-dependent probe: for a fixed $\rho(t_{0})$, one imposes a $t$-dependent control $H_{\text{prob}}(t)$, i.e.,

H=H_{0}+H_{\text{prob}}(t), \qquad (3)

where $H_{\text{prob}}=\mathcal{P}(t)\cdot\mathbf{A}=\sum_{\mu}p_{\mu}(t)A_{\mu}$, with $p_{\mu}(t)$ smooth time series and $A_{\mu}$ the corresponding control operators. ii) Time-independent probe: with fixed $H$ and $J_{k}$, one diversifies $\langle O(t)\rangle$ by varying $\rho(t_{0})$, which suits experimental scenarios in which preparing different initial states is more convenient. For each evolution, we measure $\langle O(t_{n})\rangle$ uniformly at discrete times within the range $t_{n}\in[0,t_{N})$, where $t_{n}=nt_{N}/N_{\text{ts}}$ with $N_{\text{ts}}$ the number of measurement time steps; each such evolution forms a data batch. The entire dataset is constructed from different batches, i.e.,

\mathcal{D}=\{\langle O(\mathcal{P}_{s},t_{n})\rangle\,|\,s\in[1,N_{\text{bs}}];\;n\in[0,N_{\text{ts}}-1]\}, \qquad (4)

where $N_{\text{bs}}$ denotes the batch size, so the total number of data points in $\mathcal{D}$ is $N=N_{\text{bs}}N_{\text{ts}}$. We note that discrete-time observations at equal intervals are chosen for experimental convenience; equal spacing is not a necessary condition for the learning algorithm described below.
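As a sketch of how one batch of Eq. (4) might be generated numerically, the following uses a plain explicit-Euler integrator with substepping between measurements (the paper does not specify the solver; all function names and the substepping scheme are our own assumptions):

```python
import numpy as np

def lindblad_step(rho, H, jumps, dt):
    """One explicit-Euler step of the Lindblad equation (1)."""
    drho = -1j * (H @ rho - rho @ H)
    for J in jumps:
        JdJ = J.conj().T @ J
        drho += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return rho + dt * drho

def measure_batch(rho0, H_of_t, jumps, O, t_N, N_ts, substeps=50):
    """Record <O(t_n)> at the N_ts uniform times t_n = n t_N / N_ts,
    i.e. one data batch of Eq. (4). H_of_t is a callable t -> H, so both
    the t-dependent and t-independent probes fit the same interface."""
    dt = t_N / (N_ts * substeps)
    rho = rho0.astype(complex)
    batch = []
    for n in range(N_ts):
        batch.append(np.trace(O @ rho).real)  # measurement <O(t_n)>
        for s in range(substeps):
            t = (n * substeps + s) * dt
            rho = lindblad_step(rho, H_of_t(t), jumps, dt)
    return np.array(batch)
```

In practice a higher-order solver (e.g. Runge-Kutta) would replace the Euler step; the data layout is what matters here.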

To learn the Liouvillian from $\mathcal{D}$, we adopt a machine-learning algorithm called the NODE Chen2019 . The NODE builds on an ansatz $\dot{\rho}(\rho(t),t,\theta)$ with $\theta=\{\alpha,\gamma\}$, where $\alpha$ and $\gamma$ denote the parameters to be learned in $H$ and $J_{k}$, respectively. The learning process is illustrated in Fig. 1(c) and contains two parts. First, given $\dot{\rho}(\rho(t),t,\theta)$ and the initial state $\rho(t_{0})$, we propagate the Lindblad equation (1) forward using an ODE solver and obtain a series of predictive solutions $\{\rho_{\text{pred}}(t_{n})\}$; the loss function $\mathcal{L}$ is defined as a functional of $\langle O(t_{n})\rangle_{\text{pred}}$ and $\langle O(t_{n})\rangle_{\text{real}}\in\mathcal{D}$, which effectively measures the distance between $\mathcal{D}$ and the NODE predictions. The purpose of learning is to minimize $\mathcal{L}$ by adjusting $\theta$. To this end, the NODE introduces an adjoint field defined by $a(t)=\partial\mathcal{L}(t)/\partial\rho(t)$. By propagating $a(t)$ backward from $t_{N}$ to $t_{0}$, we obtain $d\mathcal{L}/d\theta|_{t=t_{0}}$, which is then used to update the parameters as $\theta\rightarrow\theta-\lambda(d\mathcal{L}/d\theta|_{t=t_{0}})$, with $\lambda$ the learning rate. More details about the NODE algorithm can be found in Appendix A.
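The update rule $\theta\rightarrow\theta-\lambda\,d\mathcal{L}/d\theta$ can be illustrated on a toy inverse problem: recovering a single decay rate $\gamma$ from observations $\langle\sigma_z(t_n)\rangle$ of a decaying qubit, for which $\langle\sigma_z(t)\rangle=2e^{-\gamma t}-1$ in closed form. The NODE obtains the gradient by differentiating through an ODE solver with the adjoint method; purely for brevity, this sketch (our own construction) substitutes a finite-difference gradient:

```python
import numpy as np

# Toy version of theta -> theta - lambda * dL/dtheta for a qubit decaying
# through J = sqrt(gamma) sigma_-, whose signal is <sigma_z(t)> = 2 exp(-gamma t) - 1.

def model(gamma, times):
    return 2.0 * np.exp(-gamma * times) - 1.0

def mse(gamma, times, data):
    return np.mean((model(gamma, times) - data) ** 2)

def learn_gamma(times, data, gamma0=0.5, lr=0.05, epochs=2000, eps=1e-6):
    gamma = gamma0
    for _ in range(epochs):
        # central finite difference stands in for the NODE adjoint gradient
        grad = (mse(gamma + eps, times, data)
                - mse(gamma - eps, times, data)) / (2.0 * eps)
        gamma -= lr * grad  # gradient-descent update with learning rate lr
    return gamma

times = np.linspace(0.0, 5.0, 50)
data = model(0.2, times)            # synthetic "experimental" dataset D
gamma_pred = learn_gamma(times, data)
```

The learned `gamma_pred` converges to the ground-truth value 0.2, mirroring the minimization of Eq. (7) in the full scheme.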

III Examples

We apply this method to two concrete examples. In the first example, we consider a one-body spin system with spin quantum number $S=3/2$. Large-spin systems are actively studied in areas of quantum physics ranging from high-spin quantum dots Klochan2011 ; Doherty2013 and multi-component quantum gases Kawaguchi2012 to unconventional superconductors Wang2018 . In the second example, the system is a many-body spin-1/2 chain. Spin chains are fundamental models in condensed matter physics and quantum computation, closely related to quantum criticality Sachdev2011 , topological phases of matter Chiu2016 ; Wen2017 , etc.

III.1 Spin-3/2 system

The general spin-3/2 system is a four-level system whose spin-vector operators $\{S_{x},S_{y},S_{z}\}$ are the generalized Pauli matrices. Since spin-tensor operators may also be involved, due to, for example, the quadratic Zeeman effect, we expand the Hamiltonian in the SU(4) generators, i.e.,

H_{0}=-\sum_{\mu=1}^{15}\alpha_{\mu}\hat{g}_{\mu}, \qquad (5)

where $\hat{g}_{\mu}$ denote the 15 Hermitian generators satisfying the traceless ($\text{Tr}[\hat{g}_{\mu}]=0$) and orthogonality ($\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$) conditions, and $\alpha_{\mu}$ are the corresponding coefficients. Furthermore, we assume that the system is subject to weak dissipation in these Hermitian channels, namely

J_{\mu}=\sqrt{\gamma_{\mu}}\,\hat{g}_{\mu}, \qquad (6)

with $\gamma_{\mu}$ the dissipative strengths. The weakness is reflected in $\bar{\gamma}\ll\bar{\alpha}$, where $\bar{\alpha}=N_{\alpha}^{-1}\sum_{\mu}|\alpha_{\mu}|$ and $\bar{\gamma}=N_{\gamma}^{-1}\sum_{\mu}\gamma_{\mu}$ respectively denote the mean strengths of $\alpha$ and $\gamma$, and $N_{\alpha,\gamma}$ are the numbers of parameters. Recall that our goal is to learn the parameters $\theta=\{\alpha,\gamma\}$ from the dataset $\mathcal{D}$.
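A standard explicit construction of generators with these properties is the set of generalized Gell-Mann matrices. The sketch below (our own helper, not from the paper) builds all $d^{2}-1$ of them with the stated normalization $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$:

```python
import numpy as np

def su_generators(d=4):
    """Generalized Gell-Mann matrices for SU(d): d^2 - 1 traceless Hermitian
    matrices with Tr[g_mu g_nu] = 2 delta_{mu nu} (15 matrices for d = 4)."""
    gens = []
    # symmetric and antisymmetric off-diagonal generators
    for j in range(d):
        for k in range(j + 1, d):
            S = np.zeros((d, d), dtype=complex); S[j, k] = S[k, j] = 1.0
            A = np.zeros((d, d), dtype=complex); A[j, k] = -1j; A[k, j] = 1j
            gens += [S, A]
    # diagonal generators, normalized so that Tr[D_l^2] = 2
    for l in range(1, d):
        D = np.zeros((d, d), dtype=complex)
        D[:l, :l] = np.eye(l)
        D[l, l] = -l
        gens.append(np.sqrt(2.0 / (l * (l + 1))) * D)
    return gens
```

For $d=2$ this construction reduces to the ordinary Pauli matrices.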

The dataset $\mathcal{D}$ is generated by evolving the Lindblad equation (1) within $t\in[0,20/\bar{\alpha})$ and uniformly measuring the transverse magnetization, i.e., $\langle O\rangle=\langle S_{x}\rangle$. For generality, $\alpha$ and $\gamma$ are randomized within $\alpha\in[-1,1]\bar{\alpha}$ and $\gamma\in[0,2]\bar{\gamma}$, where $\bar{\gamma}=0.02\bar{\alpha}$. As mentioned before, we adopt two different probes. For the $t$-dependent probe [probe i)], we fix $\rho(t_{0})=\left|3/2,3/2\right\rangle\left\langle 3/2,3/2\right|$ to be a state magnetized along $S_{z}$ and introduce a control magnetic field $H_{\text{prob}}=\sum_{\mu=x,y,z}p_{\mu}(t)S_{\mu}$, with $p_{\mu}(t)=2\bar{\alpha}\sum_{m=1}^{10}\sin(2\pi\omega_{m}t)$ and $\omega_{m}$ randomized within $\omega_{m}\in[-1,1]\bar{\alpha}$; for the $t$-independent probe [probe ii)], we take $\rho(t_{0})$ to be random pure states satisfying $\text{Tr}[\rho^{2}(t_{0})]=1$. The total number of data points in $\mathcal{D}$ is $N=10^{3}$, with $N_{\text{bs}}=10$ and $N_{\text{ts}}=10^{2}$.

Figure 2: Learning results of the spin-3/2 model with the $t$-dependent probe. (a) Dependence of the loss function $\mathcal{L}$, the relative error of $\alpha$, and the relative error of $\gamma$ on training epochs. (b) Spin dynamics, where the solid, dashed and dot-dashed lines correspond to $\langle S_{x}\rangle_{\text{real}}$ in the dataset $\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (c) and (d) Real parameters (solid triangles) and predicted parameters (circles with error bars) after training. Color shadings in (b) and error bars in (c) and (d) indicate the predictive fluctuations over $N_{\text{ini}}=10$ different initializations.

In the learning process, we minimize the mean square error (MSE) loss function

\mathcal{L}=\frac{1}{N}\sum_{s,n}\left[\langle S_{x}(\mathcal{P}_{s},t_{n})\rangle_{\text{pred}}-\langle S_{x}(\mathcal{P}_{s},t_{n})\rangle_{\text{real}}\right]^{2}, \qquad (7)

and monitor the averaged relative error with respect to the parameters, i.e.,

\mathcal{E}_{\theta=\{\alpha,\gamma\}}=\overline{\frac{1}{\bar{\theta}}\sqrt{\frac{1}{N_{\theta}}\sum(\theta_{\text{pred}}-\theta_{\text{real}})^{2}}}, \qquad (8)

where the additional overline on the right-hand side denotes a further average over $N_{\text{ini}}$ different initializations of $\theta$. Here, we take $N_{\text{ini}}=10$.
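In code, Eqs. (7) and (8) amount to a few lines; the array-shape conventions below are our own assumptions:

```python
import numpy as np

def mse_loss(pred, real):
    """MSE loss of Eq. (7); pred, real have shape (N_bs, N_ts)."""
    return np.mean((pred - real) ** 2)

def relative_error(theta_pred, theta_real):
    """Averaged relative error of Eq. (8).
    theta_pred: (N_ini, N_theta) predictions from N_ini initializations;
    theta_real: (N_theta,) ground truth; theta_bar = mean |theta_real|."""
    theta_bar = np.mean(np.abs(theta_real))
    rmse = np.sqrt(np.mean((theta_pred - theta_real) ** 2, axis=1))  # per init
    return np.mean(rmse) / theta_bar
```

A perfect prediction gives zero for both quantities, which is a convenient unit test.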

In Fig. 2, we present the learning results for $\mathcal{D}$ generated by the $t$-dependent probe, while leaving those for the $t$-independent probe to Appendix B. Specifically, Fig. 2(a) shows the evolution of $\mathcal{L}$, $\mathcal{E}_{\alpha}$ and $\mathcal{E}_{\gamma}$ with training epochs, from which one can read that the algorithm converges after approximately $5\times10^{3}$ epochs. After convergence, we compare the predicted parameters $\alpha$ and $\gamma$ (circles with error bars) with their real values (solid triangles) in subfigures (c) and (d), respectively, where the error bars indicate the standard deviation over different initializations. Clearly, the algorithm successfully reproduces the parameters, with relative errors $\mathcal{E}_{\alpha}\lesssim5$‰ and $\mathcal{E}_{\gamma}\lesssim2\%$. Fig. 2(b) presents the spin dynamics $\langle S_{x}(t)\rangle$ of a typical batch, where the solid, dashed and dot-dashed lines correspond to the real $\langle S_{x}(t)\rangle_{\text{real}}$, the predicted curve before training, and the predicted curve at the $10^{3}$-th training epoch, respectively, with shadings indicating the predictive fluctuations over initializations. As the training progresses, $\langle S_{x}\rangle_{\text{pred}}$ gradually approaches $\langle S_{x}\rangle_{\text{real}}$, accompanied by a shrinking of the predictive fluctuations.

III.2 Spin-1/2 Chain

Figure 3: Learning results of the spin-1/2 chain with the $t$-independent probe. (a) Dependence of $\mathcal{L}$, $\mathcal{E}_{\alpha}$, and $\mathcal{E}_{\gamma}$ on training epochs. (b) Total spin dynamics of $\langle S_{x}\rangle_{\text{real}}\in\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (c) and (d) Real parameters (solid triangles) and predicted parameters (circles with error bars) after training.

The second example is a spin-1/2 chain with nearest-neighbor interactions, whose Hamiltonian is written as

H_{0}=-\sum_{i,\mu}\alpha_{\mu}^{i}\sigma_{\mu}^{i}-\sum_{i,\mu,\nu}\alpha_{\mu\nu}^{i}\sigma_{\mu}^{i}\sigma_{\nu}^{i+1}. \qquad (9)

The first term accounts for the local fields, with $\sigma_{\mu=x,y,z}^{i}$ the spin-1/2 Pauli operators and $\alpha_{\mu}^{i}$ the corresponding strengths. The second term characterizes the two-body interactions with strengths $\alpha_{\mu\nu}^{i}$. One-body and two-body dissipations are considered in the corresponding channels,

J_{\mu}^{i}=\sqrt{\gamma_{\mu}^{i}}\,\sigma_{\mu}^{i},\qquad J_{\mu\nu}^{i}=\sqrt{\gamma_{\mu\nu}^{i}}\,\sigma_{\mu}^{i}\sigma_{\nu}^{i+1}, \qquad (10)

with $\gamma_{\mu}^{i}$ and $\gamma_{\mu\nu}^{i}$ the corresponding strengths. As a proof-of-principle demonstration, we set the chain length to 5 under periodic boundary conditions, which keeps the numerical complexity within the computational power of a PC with two GPUs Complexity . In this case, there is a total of 120 parameters to be learned, 60 each for $\alpha$ and $\gamma$. Again, we generate two datasets using the $t$-dependent and $t$-independent probes. For the former, we set $\rho(t_{0})=\bigotimes_{i}\left|\uparrow\right\rangle_{i}\left\langle\uparrow\right|_{i}$ and $H_{\text{prob}}=\sum_{\mu}p_{\mu}S_{\mu}$, with $S_{\mu=x,y,z}=\sum_{i}\sigma_{\mu}^{i}$ the total spin operators and $p_{\mu}=2\bar{\alpha}\sum_{m=1}^{10}\sin(2\pi\omega_{m}t)$; for the latter, $\rho(t_{0})$ are chosen to be random product states. The measured observables are the total spins, i.e., $\langle O\rangle=\langle S_{\mu}\rangle$. The remaining settings are: $t\in[0,5/\bar{\alpha}]$, $N=10^{3}$ with $N_{\text{bs}}=20$ and $N_{\text{ts}}=50$, randomized $\alpha\in[0,1]\bar{\alpha}$, $\gamma\in[0,0.02]\bar{\alpha}$, $\omega_{m}\in[-2,2]\bar{\alpha}$, and $N_{\text{ini}}=10$.

In Fig. 3, we display the learning task with the $t$-independent probe, leaving the other task (with the $t$-dependent probe) to Appendix B. The conventions of Fig. 3 are the same as in Fig. 2, except that $\langle S_{x}\rangle$ now denotes the total magnetization. The accurate predictions, with $\mathcal{E}_{\alpha}\lesssim1$‰ and $\mathcal{E}_{\gamma}\lesssim1\%$ as shown in Figs. 3(c) and (d), clearly demonstrate the feasibility of the algorithm for many-body spin systems.

Here, we would like to make several additional comments. First, the systems considered above carry no symmetry, which in principle allows us to learn all the parameters by monitoring a global (or local) observable $\langle O\rangle$. However, if the system intrinsically carries a symmetry, then the operators associated with $O$ may be limited to certain subspaces, which form the accessible set Zhang2014 . In this case, only the parameters related to the accessible set can be determined by the present time-trace approach. Second, there is a case in which the NODE fails to make unique predictions even for systems without intrinsic symmetry. The NODE has access to three pieces of information: the initial state $\rho_{0}$, the probe field $H_{\text{prob}}$, and the measured data $\langle O\rangle\in\mathcal{D}$; hence it cannot make unique predictions if all three share a common "symmetry". For example, in the learning task with the $t$-dependent probe (see Appendix B), $\rho(t_{0})$, $H_{\text{prob}}$ and $\langle S_{x}\rangle$ are all translationally invariant (i.e., independent of the local spins), which leads to unfavorable learning results because of the "symmetry"-induced ambiguity. This problem, however, does not occur for the task with the $t$-independent probe illustrated in Fig. 3, since the initial states $\rho(t_{0})$ employed there explicitly break the translational "symmetry".

IV Learning Efficiency

Above, we illustrated the capability of the algorithm in learning open quantum systems. We now turn to the question of learning efficiency: how should the data points be collected so as to make the learning most efficient? This question matters for data acquisition in realistic experiments, especially when collecting data points is expensive or time-consuming.

To keep this question simple and clear, we focus on the situation in which $\gamma$ are the only unknown parameters to be learned. Moreover, since different batches in $\mathcal{D}$ are independent of each other, we consider a $\mathcal{D}$ containing only one batch, such that the total number of data points is $N=N_{\text{ts}}$. As mentioned before, the NODE adjusts the parameters $\gamma_{\mu}$ according to $d\mathcal{L}/d\gamma_{\mu}$, which motivates us to define

\eta=\frac{1}{NN_{\gamma}}\sum_{n=1}^{N}\sum_{\mu=1}^{N_{\gamma}}\left|\frac{d\mathcal{L}_{n}}{d\gamma_{\mu}}\right|, \qquad (11)

where $\mathcal{L}_{n}$ denotes the local loss function with respect to a single data point $\langle O(t_{n})\rangle$. The physical meaning of $\eta$ is clear: it characterizes the averaged sensitivity of $\mathcal{L}$ to $\gamma$, such that a large $\eta$ speeds up the learning process. To further simplify Eq. (11), we replace the average of the individual derivatives by the derivative with respect to the mean value of $\gamma_{\mu}$, i.e.,

\eta=\frac{1}{N}\sum_{n=1}^{N}\left|\frac{d\mathcal{L}_{n}}{d\bar{\gamma}}\right|. \qquad (12)

For weak dissipation, $\bar{\gamma}\ll\bar{\alpha}$, $\eta$ is closely related to

\chi(t)=\left|\frac{d\langle O(t)\rangle}{d\bar{\gamma}}\right|=\frac{\Delta_{1}te^{-\Delta_{1}t}}{\bar{\gamma}}|f(t)|, \qquad (13)

which is composed of two parts: $|f(t)|$ is a fast-oscillating term characterized by the energy scale $\bar{\alpha}$, while $te^{-\Delta_{1}t}$ is a slowly varying envelope describing the dissipation-induced damping of $\langle O(t)\rangle$, where $\Delta_{1}\propto\bar{\gamma}$ is the real part of the Liouvillian gap. The envelope of $\chi(t)$ is thus a non-monotonic function maximized at $t=t_{\text{dc}}$ [see $\chi(t)$ and $\chi^{2}(t)$ in Fig. 4(a)], with $t_{\text{dc}}=1/\Delta_{1}$ the decoherence time.
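The position of the envelope maximum follows from elementary calculus:

```latex
\frac{d}{dt}\left(te^{-\Delta_{1}t}\right)=(1-\Delta_{1}t)\,e^{-\Delta_{1}t}=0
\quad\Longrightarrow\quad
t_{\text{dc}}=\frac{1}{\Delta_{1}}.
```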

The two most commonly used loss functions, the mean absolute error (MAE) and the MSE, are respectively linearly and quadratically proportional to $\chi(t)$ as $\bar{\gamma}$ approaches its real value $\bar{\gamma}_{\text{real}}$, i.e.,

\eta_{\text{MAE}}=\frac{1}{N}\sum_{n}\chi(t_{n}),\qquad \eta_{\text{MSE}}\propto\frac{1}{N}\sum_{n}\chi^{2}(t_{n}), \qquad (14)

where the MAE is defined by $\mathcal{L}_{n}=|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}|$, while the MSE is defined in Eq. (7). For uniform measurements $t_{n}=nt_{N}/N_{\text{ts}}$, the summations in Eq. (14) can be evaluated analytically in the large-$N$ limit, i.e. (with $t_{N}$ in units of $t_{\text{dc}}$),

\lim_{N\rightarrow\infty}\eta_{\text{MAE}}=\frac{1-e^{-t_{N}}-t_{N}e^{-t_{N}}}{t_{N}},\qquad \lim_{N\rightarrow\infty}\eta_{\text{MSE}}\propto\frac{1-e^{-2t_{N}}-2t_{N}(t_{N}+1)e^{-2t_{N}}}{4t_{N}}, \qquad (15)

where both are non-monotonic with maxima at $t_{N}^{\text{opt}}\approx1.8\,t_{\text{dc}}$ and $t_{N}^{\text{opt}}\approx1.7\,t_{\text{dc}}$, respectively. This indicates that the optimal strategy for uniform data acquisition is to take $t_{N}=t_{N}^{\text{opt}}$, as schematically illustrated by the dots in Fig. 4(a). We benchmark this analysis on the spin-3/2 learning task with the $t$-dependent probe and the MSE loss, for which $t_{\text{dc}}\approx9/\bar{\alpha}$. Practically, we set $N=15$ and plot the variations of $\mathcal{E}_{\gamma}$ and $\eta$ in Figs. 4(b) and (c), respectively, where $\eta$ is numerically calculated using Eq. (11). Clearly, $t_{N}=t_{N}^{\text{opt}}=1.7\,t_{\text{dc}}$ exhibits the largest $\eta$ and the fastest decay of $\mathcal{E}_{\gamma}$ throughout the learning process, in good agreement with the discussion above.
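The quoted optima can be checked numerically by grid-maximizing the two expressions in Eq. (15) (a quick sketch; $t$ is measured in units of $t_{\text{dc}}$ and the MSE expression is taken up to its overall proportionality constant):

```python
import numpy as np

# Grid-maximize the large-N limits of eta_MAE and eta_MSE from Eq. (15).

def eta_mae(t):
    return (1.0 - np.exp(-t) - t * np.exp(-t)) / t

def eta_mse(t):  # up to the overall proportionality constant
    return (1.0 - np.exp(-2.0 * t)
            - 2.0 * t * (t + 1.0) * np.exp(-2.0 * t)) / (4.0 * t)

t = np.linspace(0.01, 10.0, 100000)
t_opt_mae = t[np.argmax(eta_mae(t))]  # close to 1.8
t_opt_mse = t[np.argmax(eta_mse(t))]  # close to 1.7
```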

Figure 4: (a) $\chi$ and $\chi^{2}$ as functions of $t$, where dots visually indicate the optimal data-acquisition strategy with $t_{N}=t_{N}^{\text{opt}}$. (b)-(d) Learning results for the spin-3/2 example using the $t$-dependent probe and the MSE loss function. (b) and (c) respectively show the dependence of $\mathcal{E}_{\gamma}$ and $\eta$ on $t_{N}$ with $N=N_{\gamma}=15$ fixed. (d) shows the dependence of $\mathcal{E}_{\gamma}$ on $N$ with $t_{N}=t_{N}^{\text{opt}}=1.7t_{\text{dc}}$ fixed. All numerical results are averaged over $N_{\text{ini}}=50$ random initializations.

Finally, we briefly discuss the effect of the total number of data points $N$. To this end, we fix $t_{N}=t_{N}^{\text{opt}}$ and show the dependence of $\mathcal{E}_{\gamma}$ on $N$ in Fig. 4(d). An apparent feature is that cases with $N<N_{\gamma}=15$ exhibit poor learning results, which is understandable since at least $N_{\gamma}$ data points are needed to uniquely determine all $\gamma$, just as one needs at least $N_{\gamma}$ equations to determine $N_{\gamma}$ variables. Furthermore, increasing $N$ accelerates the learning process; however, this trend does not continue indefinitely [compare the curves $N=15$, $N=25$ and $N=40$ in Fig. 4(d)]. For fairly large $N$, neighboring data points are barely distinguishable, so adding more data points contributes little to the learning efficiency.

V Summary and Outlook

We proposed a scheme to learn the quantum dissipation of open systems based on the NODE, a machine-learning algorithm able to reproduce the Liouvillian from dynamical observations, and showed that the learning process can be accelerated by optimizing the data-collection strategy. Many follow-up questions remain. The Lindblad master equation relies on the Born-Markov approximation, which limits the current method to weakly dissipative systems with short environmental correlation times. The generalization from Markovian to non-Markovian dynamics is not straightforward, since the memory effect leads to an integro-differential master equation Gardiner2000 ; Vega2017 ; more advanced techniques are needed to handle the gradients associated with the integral kernel. Whether an effective Markovian description can be found for non-Markovian dynamics also remains open. Furthermore, the full-Liouvillian calculation shown above cannot be easily extended to large-scale many-body systems, due to the exponential growth of the Hilbert space. One possible solution is to combine the NODE with quantum-trajectory approaches Gardiner2000 ; Daley2014 , such as the truncated Wigner method Schachenmayer2015 ; Huber2022 and the tDMRG-quantum-trajectory method Daley2009 . In this regard, several recently developed machine-learning-based solvers Carleo2017 ; Nagy2019 ; Hartmann2019 ; Vicentini2019 ; Mazza2021 ; Liu2022 ; Luo2022 may provide valuable insights. Additionally, given a learned dissipative model, can machine-learning algorithms help design the corresponding system-environment decoupling protocols? We expect this work, and these questions, to stimulate more interdisciplinary studies between machine learning and open quantum systems.

Acknowledgements.
L.C. would like to thank Hui Zhai, Ce Wang, Juan Yao, Lei Pan, Yang Shen, and Sen Yang for the fruitful discussion. L.C. acknowledges support from the National Natural Science Foundation of China (Grant Nos. 12174236 and 12147215) and the postdoctoral fellowship offered by Hui Zhai. Part of this work was done during the fellowship at Tsinghua University, Beijing.

Appendix A Neural Ordinary Differential Equation

The NODE Chen2019 inherits the basic idea of the residual network He2016 and is able to reconstruct the differential equations satisfied by given sequential data $\mathcal{D}$. The NODE makes the ansatz $\dot{\boldsymbol{\rho}}=\boldsymbol{f}[\boldsymbol{\rho}(t),\boldsymbol{\theta},t]$. If the specific form of $\boldsymbol{f}$ is not known in advance, a deep neural network can be used to establish the mapping from $\{\boldsymbol{\rho}(t),\boldsymbol{\theta},t\}$ to $\boldsymbol{f}$. However, if prior information is available, e.g., $\boldsymbol{f}$ satisfies the Lindblad equation as in the situation considered in this work, the problem reduces to determining the unknown parameters $\boldsymbol{\theta}$ in $\boldsymbol{f}$. Given an initial state $\boldsymbol{\rho}(t_{0})$, one propagates the NODE forward from $t_{0}$ to $t_{N}$ using an ODE solver and obtains the predictive solutions $\{\boldsymbol{\rho}_{\text{pred}}(t_{n})\}$. The loss function $\mathcal{L}$ is defined as an effective distance between $\{\boldsymbol{\rho}_{\text{pred}}(t_{n})\}$ and the real values $\{\boldsymbol{\rho}_{\text{real}}(t_{n})\}\in\mathcal{D}$. Next, we show how to obtain the derivative $d\mathcal{L}/d\boldsymbol{\theta}$, based on the adjoint field method Chen2019 .

The augmented adjoint state $\boldsymbol{a}_{\text{aug}}$ is defined as the derivative of $\mathcal{L}$ with respect to the augmented state $\boldsymbol{\rho}_{\text{aug}}=\left[\boldsymbol{\rho},\boldsymbol{\theta},t\right]^{T}$, i.e.,

\boldsymbol{a}_{\text{aug}}=\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}_{\text{aug}}}=\left[\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}(t)},\frac{\partial\mathcal{L}}{\partial\boldsymbol{\theta}(t)},\frac{\partial\mathcal{L}}{\partial t}\right]^{T}=\left[\boldsymbol{a},\boldsymbol{a}_{\theta},a_{t}\right]^{T}, \qquad (16)

which satisfies the differential equation

\frac{d\boldsymbol{a}_{\text{aug}}^{T}}{dt}=-\boldsymbol{a}_{\text{aug}}^{T}\cdot\frac{\partial\boldsymbol{f}_{\text{aug}}}{\partial\boldsymbol{\rho}_{\text{aug}}^{T}}, \qquad (17)

where

\begin{equation}
\boldsymbol{f}_{\text{aug}} = \frac{d\boldsymbol{\rho}_{\text{aug}}}{dt} = \left[\dot{\boldsymbol{\rho}}(t),\dot{\boldsymbol{\theta}},\dot{t}\right]^{T} = \left[\boldsymbol{f},\mathbf{0},1\right]^{T} \tag{18}
\end{equation}

is the derivative of the augmented state with respect to $t$, and

\begin{equation}
\frac{\partial\boldsymbol{f}_{\text{aug}}}{\partial\boldsymbol{\rho}_{\text{aug}}^{T}} = \frac{\partial\left[\boldsymbol{f},\mathbf{0},1\right]}{\partial\left[\boldsymbol{\rho},\boldsymbol{\theta},t\right]^{T}} = \begin{bmatrix}\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\rho}} & \frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\theta}} & \frac{\partial\boldsymbol{f}}{\partial t}\\ \mathbf{0} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & 0\end{bmatrix} \tag{19}
\end{equation}

is the Jacobian matrix. Substituting Eqs. (19) and (16) into Eq. (17), we have

\begin{equation}
\left[\dot{\boldsymbol{a}}^{T},\dot{\boldsymbol{a}}_{\theta}^{T},\dot{a}_{t}\right] = -\left[\boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\rho}^{T}},\ \boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\theta}^{T}},\ \boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial t}\right], \tag{20}
\end{equation}

and propagating this equation backward from $t_{N}$ to $t_{0}$ yields the derivative $\boldsymbol{a}_{\theta}(t_{0})=d\mathcal{L}/d\boldsymbol{\theta}|_{t=t_{0}}$. In particular, the boundary conditions for the backward propagation are given by

\begin{align}
\boldsymbol{a}(t_{N}) &= \frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}(t_{N})}, \nonumber\\
\boldsymbol{a}_{\theta}(t_{N}) &= \left.\frac{\partial\mathcal{L}}{\partial\boldsymbol{\theta}}\right|_{t=t_{N}} = \left.\left(\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}^{T}}\cdot\frac{\partial\boldsymbol{\rho}}{\partial t}\right)\frac{\partial t}{\partial\boldsymbol{\theta}}\right|_{t=t_{N}} = \mathbf{0}, \tag{21}\\
a_{t}(t_{N}) &= \left.\frac{\partial\mathcal{L}}{\partial t}\right|_{t=t_{N}} = \left.\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}^{T}}\cdot\frac{\partial\boldsymbol{\rho}}{\partial t}\right|_{t=t_{N}} = \boldsymbol{a}^{T}(t_{N})\cdot\boldsymbol{f}(t_{N}), \nonumber
\end{align}

where $\partial\mathcal{L}/\partial\boldsymbol{\rho}(t_{N})$ can be directly calculated using automatic differentiation toolboxes Baydin2018 ; Paszke2017 .

Note that, for sequential data with intermediate data points at $t_{n}$, the total loss function $\mathcal{L}$ is a summation of individual losses $\mathcal{L}_{n}$, and hence $d\mathcal{L}/d\boldsymbol{\theta}$ is simply the average over all periods of back propagation from $t_{n}$ to $t_{n-1}$, i.e., $d\mathcal{L}/d\boldsymbol{\theta}=N_{\text{ts}}^{-1}\sum_{n=0}^{N_{\text{ts}}-1}\boldsymbol{a}_{\theta}(t_{n})$.
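The forward-backward procedure above can be illustrated on a toy scalar ODE. The sketch below, assuming only numpy, takes $\dot{x}=f(x,\theta)=-\theta x$ with loss $\mathcal{L}=(x(T)-x_{\text{target}})^{2}$; the model and all names are illustrative stand-ins for the vectorized Lindblad dynamics used in the paper. It solves the ODE forward, then integrates the adjoint pair $(\boldsymbol{a},\boldsymbol{a}_{\theta})$ backward per Eqs. (20) and (21), so that $\boldsymbol{a}_{\theta}(t_{0})$ approximates $d\mathcal{L}/d\theta$.

```python
import numpy as np

def rk4_step(f, y, h):
    # One classical Runge-Kutta step for the autonomous ODE y' = f(y).
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def adjoint_gradient(theta, x0, x_target, T=1.0, n_steps=1000):
    """Gradient of L = (x(T) - x_target)^2 w.r.t. theta for x' = -theta*x."""
    h = T / n_steps
    f = lambda x: -theta * x

    # Forward pass: solve the ODE and store the trajectory x(t_n).
    xs = [x0]
    for _ in range(n_steps):
        xs.append(rk4_step(f, xs[-1], h))
    xT = xs[-1]

    # Boundary conditions, Eq. (21): a(T) = dL/dx(T), a_theta(T) = 0.
    a = 2.0 * (xT - x_target)
    a_theta = 0.0

    # Backward pass, Eq. (20): da/dt = -a df/dx and da_theta/dt = -a df/dtheta,
    # here with df/dx = -theta and df/dtheta = -x(t).
    for n in range(n_steps, 0, -1):
        a_theta -= h * a * xs[n]   # accumulate da_theta/dt = a * x backward
        a -= h * theta * a         # Euler step of da/dt = theta * a, t_n -> t_{n-1}
    return xT, a_theta             # a_theta(t_0) approximates dL/dtheta
```

For this linear model the exact gradient is $2(x_{T}-x_{\text{target}})\cdot(-T x_{T})$, which the backward pass reproduces up to the first-order integration error of the adjoint solver.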

Appendix B Complementary Results for the Two Examples

Here, we provide more information on the two examples. The $4\times 4$ Hamiltonian $H$ of the spin-3/2 model can be expanded in terms of the SU(4) Hermitian generators $\hat{g}_{\mu}$ in the fundamental representation. We practically take the matrices as Bertlmann2008

\begin{align}
\hat{g}_{1} &= \begin{pmatrix}0&1&0&0\\ 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{2}=\begin{pmatrix}0&-i&0&0\\ i&0&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{3}=\begin{pmatrix}1&0&0&0\\ 0&-1&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{4}=\begin{pmatrix}0&0&1&0\\ 0&0&0&0\\ 1&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{5}=\begin{pmatrix}0&0&-i&0\\ 0&0&0&0\\ i&0&0&0\\ 0&0&0&0\end{pmatrix}, \tag{22}\\
\hat{g}_{6} &= \begin{pmatrix}0&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{7}=\begin{pmatrix}0&0&0&0\\ 0&0&-i&0\\ 0&i&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{8}=\frac{1}{\sqrt{3}}\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&-2&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{9}=\begin{pmatrix}0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 1&0&0&0\end{pmatrix},\ \hat{g}_{10}=\begin{pmatrix}0&0&0&-i\\ 0&0&0&0\\ 0&0&0&0\\ i&0&0&0\end{pmatrix}, \nonumber\\
\hat{g}_{11} &= \begin{pmatrix}0&0&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&1&0&0\end{pmatrix},\ \hat{g}_{12}=\begin{pmatrix}0&0&0&0\\ 0&0&0&-i\\ 0&0&0&0\\ 0&i&0&0\end{pmatrix},\ \hat{g}_{13}=\begin{pmatrix}0&0&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&1&0\end{pmatrix},\ \hat{g}_{14}=\begin{pmatrix}0&0&0&0\\ 0&0&0&0\\ 0&0&0&-i\\ 0&0&i&0\end{pmatrix},\ \hat{g}_{15}=\frac{1}{\sqrt{6}}\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&-3\end{pmatrix}. \nonumber
\end{align}
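These 15 matrices are the generalized Gell-Mann matrices of SU(4), and their defining trace conditions can be verified numerically. The sketch below, assuming numpy, constructs them programmatically in a conventional pair ordering (which differs from the labeling of Eq. (22) but spans the same set) and is meant to be checked against $\text{Tr}[\hat{g}_{\mu}]=0$ and $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$.

```python
import numpy as np

def su4_generators():
    """The 15 generalized Gell-Mann matrices of SU(4) (ordering is conventional)."""
    g = []
    # Symmetric and antisymmetric off-diagonal generators for each pair (j, k).
    for j in range(4):
        for k in range(j + 1, 4):
            s = np.zeros((4, 4), dtype=complex)
            s[j, k] = s[k, j] = 1.0
            a = np.zeros((4, 4), dtype=complex)
            a[j, k] = -1j
            a[k, j] = 1j
            g += [s, a]
    # Diagonal generators, corresponding to g_3, g_8, g_15 of Eq. (22).
    g.append(np.diag([1.0, -1.0, 0.0, 0.0]).astype(complex))
    g.append(np.diag([1.0, 1.0, -2.0, 0.0]).astype(complex) / np.sqrt(3.0))
    g.append(np.diag([1.0, 1.0, 1.0, -3.0]).astype(complex) / np.sqrt(6.0))
    return g
```

The same loop with `range(d)` in place of `range(4)` generates the analogous basis for any SU($d$).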
Figure 5: First row: learning results for the spin-3/2 model with the $t$-independent probe. (a1) Dependence of $\mathcal{L}$, $\alpha$ and $\gamma$ on training epochs. (a2) Spin dynamics of $\langle S_{x}\rangle_{\text{real}}\in\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (a3) and (a4) Real parameters (solid triangles) and predictive parameters (circles with error bars) after training. Second row: learning results for the spin-1/2 chain with the $t$-dependent probe. (b1) Training processes of $\mathcal{L}$, $\alpha$ and $\gamma$ with the dataset constructed from total spin measurements, i.e., $\mathcal{D}=\{\langle S_{\mu=x,y,z}(\mathcal{P}_{s},t_{n})\rangle\}$. (b2)-(b4) Learning results on the dataset constructed from local spin measurements, i.e., $\mathcal{D}=\{\langle\sigma_{\mu=x,y,z}^{i=3}(\mathcal{P}_{s},t_{n})\rangle\}$.

These generators naturally satisfy the traceless condition $\text{Tr}[\hat{g}_{\mu}]=0$ and the orthogonality condition $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$, which are also properties that the Lindblad dissipative operators $J_{k}$ are supposed to satisfy. One may check that the one-body and two-body dissipative operators of the second example (the spin-1/2 chain) also satisfy these two conditions. Both conditions would be broken if any $J_{k}$ had a finite trace $a_{k}=\text{Tr}[J_{k}]\neq 0$. However, since the Lindblad equation Eq. (2) is invariant under the transformation

\begin{align}
J_{k} &\longrightarrow J_{k}-a_{k}, \tag{23}\\
H &\longrightarrow H+\frac{i}{2}\sum_{k}\left[a_{k}^{*}J_{k}-(a_{k}^{*}J_{k})^{\dagger}\right], \nonumber
\end{align}

one can always recover these conditions by absorbing $a_{k}$ into $H$. Note that the Lindblad equation has another invariance, $J_{k}\rightarrow e^{i\phi_{k}}J_{k}$ with $\phi_{k}$ an arbitrary phase factor, which implies that the phase of $\gamma_{k}$ is physically meaningless; hence we set $\gamma_{k}$ positive throughout this work.
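The invariance of Eq. (23) can be checked numerically. In the sketch below, assuming numpy, $H$, $J$, and $\rho$ are drawn at random (the names and the scalar $a$ are illustrative), and the Lindblad right-hand side is evaluated before and after absorbing the trace into the Hamiltonian; here $J-a$ is understood as $J-a\mathbb{I}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def lindblad_rhs(H, Js, rho):
    """Right-hand side of the Lindblad master equation, Eq. (2)."""
    out = -1j * (H @ rho - rho @ H)
    for J in Js:
        JdJ = J.conj().T @ J
        out += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return out

d = 4
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = A + A.conj().T                        # random Hermitian Hamiltonian
J = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho /= np.trace(rho)                      # random density matrix

# Transformation of Eq. (23): shift J by a scalar and compensate in H.
a = 0.3 - 0.8j
J_new = J - a * np.eye(d)
H_new = H + 0.5j * (np.conj(a) * J - (np.conj(a) * J).conj().T)
```

Both parameterizations generate the same dynamics, which is why the trace of $J_{k}$ is not a learnable quantity and can be fixed to zero without loss of generality.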

The first row of Fig. 5 shows the learning results for the spin-3/2 model using the $t$-independent probe, where the subfigures show the dependence of $\mathcal{L}$, $\mathcal{E}_{\alpha}$ and $\mathcal{E}_{\gamma}$ on training epochs (a1), the spin dynamics before and during training (a2), and the predictive parameters $\alpha$ (a3) and $\gamma$ (a4) after training. The results indicate that the NODE can also accurately reproduce the spin-3/2 model, with predictive errors $\mathcal{E}_{\alpha}\lesssim 1$‰ and $\mathcal{E}_{\gamma}\lesssim 5\%$. The second row of Fig. 5 displays the results for the spin-chain model using the $t$-dependent probe. Specifically, in Fig. 5(b1) the data points were collected from measurements of the total spin. There, the loss function $\mathcal{L}$ decreases, but the predictive errors $\mathcal{E}_{\alpha,\gamma}$ barely decrease. In contrast, if the measurement is performed on local spins, favorable results are obtained, as illustrated in Figs. 5(b2)-(b4). These behaviors verify the "symmetry" remarks discussed in the main text, since the local observations $\langle\sigma_{\mu}^{i}\rangle$ explicitly break the translational "symmetry".

Appendix C General Solution of the Lindblad Equation

We obtain the solution of the Lindblad equation by mapping the Liouvillian operator $L$ into the double Hilbert space, a procedure generally known as the Choi-Jamiolkowski isomorphism Choi1975 ; Jamiolkowski1972 or vectorization, i.e.,

\begin{align}
\tilde{L} = &-i\left[\mathbb{I}\otimes H-H^{T}\otimes\mathbb{I}\right] \tag{24}\\
&+\sum_{k}\left[J_{k}^{*}\otimes J_{k}-\frac{1}{2}\left(\mathbb{I}\otimes J_{k}^{\dagger}J_{k}+(J_{k}^{\dagger}J_{k})^{T}\otimes\mathbb{I}\right)\right], \nonumber
\end{align}

where $\tilde{L}$ is the vectorized Liouvillian operator and $\mathbb{I}$ is an identity operator of the same shape as $H$. Generally, $\tilde{L}$ is non-Hermitian, with the spectral structure $\tilde{L}=\sum_{m}\epsilon_{m}\mathbf{u}_{m}\cdot\mathbf{v}_{m}^{\dagger}$, where $\epsilon_{m}$ denotes the Liouvillian spectrum, and $\mathbf{u}_{m}$ and $\mathbf{v}_{m}$ are the right and left eigenvectors, respectively. In this framework, the general solution of the Lindblad equation can be obtained as

\begin{equation}
\tilde{\boldsymbol{\rho}}(t)=\sum_{m}\lambda_{m}e^{\epsilon_{m}t}\mathbf{u}_{m}, \tag{25}
\end{equation}

where $\tilde{\boldsymbol{\rho}}$ is the vectorized density operator, and

\begin{equation}
\lambda_{m}=\mathbf{v}_{m}^{\dagger}\cdot\tilde{\boldsymbol{\rho}}(t_{0}) \tag{26}
\end{equation}

characterizes the projection coefficients with respect to the initial state $\tilde{\boldsymbol{\rho}}(t_{0})$.
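Eqs. (24)-(26) can be sketched for the simplest nontrivial case, a single decaying qubit. The example below, assuming numpy, takes $H=(\omega/2)\sigma_{z}$ and $J=\sqrt{\gamma}\,\sigma_{-}$ with illustrative parameter values, and uses the column-stacking convention $\text{vec}(A\rho B)=(B^{T}\otimes A)\,\text{vec}(\rho)$, under which Eq. (24) holds as written.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)  # lowers the first (excited) basis state

def vectorized_liouvillian(H, Js):
    """Eq. (24) with column-stacking vectorization."""
    L = -1j * (np.kron(I2, H) - np.kron(H.T, I2))
    for J in Js:
        JdJ = J.conj().T @ J
        L += np.kron(J.conj(), J) - 0.5 * (np.kron(I2, JdJ) + np.kron(JdJ.T, I2))
    return L

omega, gamma = 1.0, 0.2
L = vectorized_liouvillian(0.5 * omega * sz, [np.sqrt(gamma) * sm])

# Spectral decomposition: columns of U are right eigenvectors u_m; the rows
# of U^{-1} play the role of the left eigenvectors v_m^dagger.
eps, U = np.linalg.eig(L)
Uinv = np.linalg.inv(U)

rho0 = np.full((2, 2), 0.5, dtype=complex)   # the coherent state |+><+|
lam = Uinv @ rho0.flatten(order="F")         # lambda_m, Eq. (26)

def rho_t(t):
    # General solution, Eq. (25), reshaped back to a 2x2 density matrix.
    return (U @ (lam * np.exp(eps * t))).reshape(2, 2, order="F")
```

For this model the excited population decays as $\rho_{ee}(t)=\rho_{ee}(0)e^{-\gamma t}$, the trace is preserved, and at long times $\tilde{\boldsymbol{\rho}}$ relaxes to the steady state $\mathbf{u}_{0}$, here the ground-state projector.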

The Liouvillian spectrum $\epsilon_{m}=\epsilon_{m}^{r}+i\epsilon_{m}^{i}$ is generally complex: the imaginary part $\epsilon_{m}^{i}$ characterizes the undamped oscillations, while the real part $\epsilon_{m}^{r}\leq 0$ accounts for the damping of $\mathbf{u}_{m}$ during the evolution. It is well known that the spectrum contains an undamped state $\mathbf{u}_{0}$ with $\epsilon_{0}=0$, which is the steady state to which the system relaxes after a long evolution. The gap $\Delta=\epsilon_{0}-\epsilon_{1}=\Delta_{1}+i\Delta_{2}$ between the slowest damping state $\mathbf{u}_{1}$ and the steady state $\mathbf{u}_{0}$ is the Liouvillian gap. The real part $\Delta_{1}>0$ determines the decoherence time via $t_{\text{dc}}=1/\Delta_{1}$. For weak dissipation $\bar{\gamma}\ll\bar{\alpha}$, $\Delta_{1}$ is linearly proportional to the dissipative strength $\bar{\gamma}$, i.e., $\Delta_{1}=C\bar{\gamma}$, with $C$ a positive non-universal factor depending on the number of dissipation channels and the particular form of the external probes, whereas the imaginary part $\Delta_{2}$ is characterized by the energy scale of $\bar{\alpha}$. Hence, we generally have $\Delta_{1}\ll|\Delta_{2}|$.
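The linear scaling $\Delta_{1}=C\bar{\gamma}$ can be observed directly on the decaying-qubit toy model above, whose exact spectrum is $\{0,\,-\gamma,\,-\gamma/2\pm i\omega\}$, giving $C=1/2$; this value of $C$ is specific to the single-channel example, not universal. A minimal sketch, assuming numpy:

```python
import numpy as np

def qubit_liouvillian(omega, gamma):
    """Eq. (24) for H = (omega/2) sigma_z and J = sqrt(gamma) sigma_minus."""
    I2 = np.eye(2, dtype=complex)
    H = 0.5 * omega * np.diag([1.0, -1.0]).astype(complex)
    J = np.sqrt(gamma) * np.array([[0, 0], [1, 0]], dtype=complex)
    JdJ = J.conj().T @ J
    return (-1j * (np.kron(I2, H) - np.kron(H.T, I2))
            + np.kron(J.conj(), J)
            - 0.5 * (np.kron(I2, JdJ) + np.kron(JdJ.T, I2)))

def liouvillian_gap(omega, gamma):
    """Delta_1: the smallest nonzero damping rate -Re(eps_m)."""
    rates = np.sort(-np.linalg.eigvals(qubit_liouvillian(omega, gamma)).real)
    return rates[rates > 1e-10][0]
```

Doubling $\gamma$ doubles the gap, while $\omega$ (the analogue of $\bar{\alpha}$) only shifts the imaginary parts.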

The dataset $\mathcal{D}$ is constructed from the measurements

\begin{align}
\left\langle O(t)\right\rangle &= \sum_{m}\lambda_{m}e^{\epsilon_{m}t}\,\tilde{O}^{\dagger}\cdot\mathbf{u}_{m} \tag{27}\\
&\approx \lambda_{0}(\tilde{O}^{\dagger}\cdot\mathbf{u}_{0})+\lambda_{1}e^{-\Delta_{1}t}f(t), \nonumber
\end{align}

where $\tilde{O}$ is the vectorized observable $O$ and $f(t)=e^{-i\Delta_{2}t}(\tilde{O}^{\dagger}\cdot\mathbf{u}_{1})+\text{c.c.}$ In the second line, we have neglected the contributions of the faster damping states with $m>1$. Clearly, the first term is a $t$-independent constant, while the second term exhibits a fast oscillation $f(t)$ modulated by a slowly damped envelope $e^{-\Delta_{1}t}$. Taking the derivative of Eq. (27) with respect to $\bar{\gamma}$ leads to $\chi(t)$ [Eq. (13)], based on which one can obtain $\eta$ as $\bar{\gamma}$ approaches the realistic value $\bar{\gamma}_{\text{real}}$. In particular, for the MAE loss we have

\begin{align}
\eta_{\text{MAE}} &= \frac{1}{N}\sum_{n}\left|\frac{d\left|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right|}{d\bar{\gamma}}\right| \tag{28}\\
&= \frac{1}{N}\sum_{n}\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&= \frac{1}{N}\sum_{n}\chi(t_{n}). \nonumber
\end{align}

On the other hand, for the MSE loss function, $\eta_{\text{MSE}}$ can be calculated as

\begin{align}
\eta_{\text{MSE}} &= \frac{1}{N}\sum_{n}\left|\frac{d\left(\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right)^{2}}{d\bar{\gamma}}\right| \tag{29}\\
&= \frac{2}{N}\sum_{n}\left|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right|\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&= \frac{2}{N}\sum_{n}|\delta\bar{\gamma}|\left|\frac{\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}}{\delta\bar{\gamma}}\right|\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&\propto \frac{1}{N}\sum_{n}\chi(t_{n})^{2}, \nonumber
\end{align}

with $\delta\bar{\gamma}=\bar{\gamma}-\bar{\gamma}_{\text{real}}$. Eqs. (28) and (29) reproduce Eq. (14).
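Eqs. (28) and (29) can be checked on a toy signal. The sketch below, assuming numpy, uses $\langle O(t)\rangle_{\text{pred}}=e^{-\bar{\gamma}t}$ as an illustrative stand-in for the damped signal of Eq. (27) and compares finite-difference slopes of the MAE and MSE losses near $\bar{\gamma}_{\text{real}}$ against the time averages of $\chi(t_{n})$ and $\chi(t_{n})^{2}$.

```python
import numpy as np

ts = np.linspace(0.1, 5.0, 50)   # measurement times t_n (illustrative)
gamma_real = 0.4
obs_real = np.exp(-gamma_real * ts)

def mae(gamma):
    return np.mean(np.abs(np.exp(-gamma * ts) - obs_real))

def mse(gamma):
    return np.mean((np.exp(-gamma * ts) - obs_real) ** 2)

# Sensitivity chi(t_n) = |d<O(t_n)>/d gamma| evaluated at gamma_real.
chi = np.abs(-ts * np.exp(-gamma_real * ts))

eps = 1e-6
# Eq. (28): slope of the MAE loss just above gamma_real equals mean(chi).
eta_mae = (mae(gamma_real + 2 * eps) - mae(gamma_real + eps)) / eps

# Eq. (29): at a small offset delta, the MSE slope is ~ 2*delta*mean(chi^2).
delta = 1e-3
eta_mse = (mse(gamma_real + delta + eps) - mse(gamma_real + delta)) / eps
```

The MAE slope is independent of the offset $\delta\bar{\gamma}$, whereas the MSE slope vanishes linearly with it, which is the origin of the $\propto$ sign in Eq. (29).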

References

  • (1) T. D. Ladd, F. Jelezko, R. Laflamme, Y. Nakamura, C. Monroe, and J. L. O’Brien, Quantum Computers, Nature 464, 45 (2010).
  • (2) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, Cambridge, 2010).
  • (3) J. Preskill, Quantum Computing in the NISQ Era and Beyond, Quantum 2, 79 (2018).
  • (4) I. M. Georgescu, S. Ashhab, and F. Nori, Quantum Simulation, Rev. Mod. Phys. 86, 153 (2014).
  • (5) I. Bloch, J. Dalibard, and S. Nascimbène, Quantum Simulations with Ultracold Quantum Gases, Nat. Phys. 8, 267 (2012).
  • (6) A. I. Lvovsky, B. C. Sanders, and W. Tittel, Optical Quantum Memory, Nature Photon 3, 706 (2009).
  • (7) V. Giovannetti, S. Lloyd, and L. Maccone, Advances in Quantum Metrology, Nature Photon 5, 222 (2011).
  • (8) L. Pezzè, A. Smerzi, M. K. Oberthaler, R. Schmied, and P. Treutlein, Quantum Metrology with Nonclassical States of Atomic Ensembles, Rev. Mod. Phys. 90, 035005 (2018).
  • (9) C. L. Degen, F. Reinhard, and P. Cappellaro, Quantum Sensing, Rev. Mod. Phys. 89, 035002 (2017).
  • (10) H. M. Wiseman and G. J. Milburn, Quantum Theory of Optical Feedback via Homodyne Detection, Phys. Rev. Lett. 70, 548 (1993).
  • (11) D. Vitali, P. Tombesi, and G. J. Milburn, Controlling the Decoherence of a “Meter” via Stroboscopic Feedback, Phys. Rev. Lett. 79, 2442 (1997).
  • (12) E. L. Hahn, Spin Echoes, Phys. Rev. 80, 580 (1950).
  • (13) L. Viola, E. Knill, and S. Lloyd, Dynamical Decoupling of Open Quantum Systems, Phys. Rev. Lett. 82, 2417 (1999).
  • (14) G. de Lange, Z. H. Wang, D. Ristè, V. V. Dobrovitski, and R. Hanson, Universal Dynamical Decoupling of a Single Solid-State Spin from a Spin Bath, Science 330, 60 (2010).
  • (15) J. Du, X. Rong, N. Zhao, Y. Wang, J. Yang, and R. B. Liu, Preserving Electron Spin Coherence in Solids by Optimal Dynamical Decoupling, Nature 461, 1265 (2009).
  • (16) M. H. Abobeih, J. Cramer, M. A. Bakker, N. Kalb, M. Markham, D. J. Twitchen, and T. H. Taminiau, One-Second Coherence for a Single Electron Spin Coupled to a Multi-Qubit Nuclear-Spin Environment, Nat Commun 9, 2552 (2018).
  • (17) A. Laraoui, F. Dolde, C. Burk, F. Reinhard, J. Wrachtrup, and C. A. Meriles, High-Resolution Correlation Spectroscopy of 13C Spins near a Nitrogen-Vacancy Centre in Diamond, Nat. Commun. 4, 1651 (2013).
  • (18) Y. Sagi, I. Almog, and N. Davidson, Process Tomography of Dynamical Decoupling in a Dense Cold Atomic Ensemble, Phys. Rev. Lett. 105, 053201 (2010).
  • (19) Y. Wang, M. Um, J. Zhang, S. An, M. Lyu, J.-N. Zhang, L.-M. Duan, D. Yum, and K. Kim, Single-Qubit Quantum Memory Exceeding Ten-Minute Coherence Time, Nature Photon 11, 646 (2017).
  • (20) Q. Guo, S.-B. Zheng, J. Wang, C. Song, P. Zhang, K. Li, W. Liu, H. Deng, K. Huang, D. Zheng, X. Zhu, H. Wang, C.-Y. Lu, and J.-W. Pan, Dephasing-Insensitive Quantum Information Storage and Processing with Superconducting Qubits, Phys. Rev. Lett. 121, 130501 (2018).
  • (21) B. Pokharel, N. Anand, B. Fortman, and D. A. Lidar, Demonstration of Fidelity Improvement Using Dynamical Decoupling with Superconducting Qubits, Phys. Rev. Lett. 121, 220502 (2018).
  • (22) R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, Neural Ordinary Differential Equations, ArXiv:1806.07366 [Cs, Stat] (2018).
  • (23) G. Lindblad, On the Generators of Quantum Dynamical Semigroups, Commun. Math. Phys. 48, 119 (1976).
  • (24) V. Gorini, A. Kossakowski, and E. C. G. Sudarshan, Completely Positive Dynamical Semigroups of N-Level Systems, J. Math. Phys. 17, 821 (1976).
  • (25) C. W. Gardiner and P. Zoller, Quantum Noise: A Handbook of Markovian and Non-Markovian Quantum Stochastic Methods with Applications to Quantum Optics, 3rd ed (Springer, Berlin; New York, 2004).
  • (26) C. Di Franco, M. Paternostro, and M. S. Kim, Hamiltonian Tomography in an Access-Limited Setting without State Initialization, Phys. Rev. Lett. 102, 187203 (2009).
  • (27) J. Zhang and M. Sarovar, Quantum Hamiltonian Identification from Measurement Time Traces, Phys. Rev. Lett. 113, 080401 (2014).
  • (28) A. Sone and P. Cappellaro, Hamiltonian Identifiability Assisted by a Single-Probe Measurement, Phys. Rev. A 95, 022335 (2017).
  • (29) J. Zhang and M. Sarovar, Identification of Open Quantum Systems from Observable Time Traces, Phys. Rev. A 91, 052121 (2015).
  • (30) A. Sone and P. Cappellaro, Exact Dimension Estimation of Interacting Qubit Systems Assisted by a Single Quantum Probe, Phys. Rev. A 96, 062334 (2017).
  • (31) X.-L. Qi and D. Ranard, Determining a Local Hamiltonian from a Single Eigenstate, Quantum 3, 159 (2019).
  • (32) E. Bairey, I. Arad, and N. H. Lindner, Learning a Local Hamiltonian from Local Measurements, Phys. Rev. Lett. 122, 020504 (2019).
  • (33) E. Bairey, C. Guo, D. Poletti, N. H. Lindner, and I. Arad, Learning the Dynamics of Open Quantum Systems from Their Steady States, New J. Phys. 22, 032001 (2020).
  • (34) Z. Li, L. Zou, and T. H. Hsieh, Hamiltonian Tomography via Quantum Quench, Phys. Rev. Lett. 124, 160502 (2020).
  • (35) T. Xin, S. Lu, N. Cao, G. Anikeeva, D. Lu, J. Li, G. Long, and B. Zeng, Local-Measurement-Based Quantum State Tomography via Neural Networks, Npj Quantum Inf 5, 109 (2019).
  • (36) L. Che, C. Wei, Y. Huang, D. Zhao, S. Xue, X. Nie, J. Li, D. Lu, and T. Xin, Learning Quantum Hamiltonians from Single-Qubit Measurements, Phys. Rev. Research 3, 023246 (2021).
  • (37) X. Ma, Z. C. Tu, and S.-J. Ran, Deep Learning Quantum States for Hamiltonian Estimation, Chinese Phys. Lett. 38, 110301 (2021).
  • (38) R. Pascanu, T. Mikolov, and Y. Bengio, On the Difficulty of Training Recurrent Neural Networks, (2012).
  • (39) P. O. J. Scherer, Computational Physics: Simulation of Classical and Quantum Systems (Springer International Publishing, Heidelberg, 2013).
  • (40) M. W. Doherty, N. B. Manson, P. Delaney, F. Jelezko, J. Wrachtrup, and L. C. L. Hollenberg, The Nitrogen-Vacancy Colour Centre in Diamond, Physics Reports 528, 1 (2013).
  • (41) O. Klochan, A. P. Micolich, A. R. Hamilton, K. Trunov, D. Reuter, and A. D. Wieck, Observation of the Kondo Effect in a Spin-$\frac{3}{2}$ Hole Quantum Dot, Phys. Rev. Lett. 107, 076805 (2011).
  • (42) Y. Kawaguchi and M. Ueda, Spinor Bose-Einstein Condensates, Physics Reports 520, 253 (2012).
  • (43) H. Kim, K. Wang, Y. Nakajima, R. Hu, S. Ziemak, P. Syers, L. Wang, H. Hodovanets, J. D. Denlinger, P. M. R. Brydon, D. F. Agterberg, M. A. Tanatar, R. Prozorov, and J. Paglione, Beyond Triplet: Unconventional Superconductivity in a Spin-3/2 Topological Semimetal, Sci. Adv. 4, eaao4513 (2018).
  • (44) S. Sachdev and B. Keimer, Quantum Criticality, Physics Today 64, 29 (2011).
  • (45) C.-K. Chiu, J. C. Y. Teo, A. P. Schnyder, and S. Ryu, Classification of Topological Quantum Matter with Symmetries, Rev. Mod. Phys. 88, 035005 (2016).
  • (46) X.-G. Wen, Colloquium: Zoo of Quantum-Topological Phases of Matter, Rev. Mod. Phys. 89, 041004 (2017).
  • (47) Dealing with an open quantum system of $L$ spins is equivalent to dealing with a closed system of $2L$ spins, since the open system can be viewed as two coupled closed systems under the Choi-Jamiolkowski isomorphism [see Eq. (24) in Appendix C]. The computational complexity is further enlarged by a factor of $N_{\text{bs}}N_{\text{ini}}\sim 10^{2}$ due to batches and initializations. Practically, we rent two Nvidia GTX 1080 Ti graphics cards to speed up the numerics, which keeps the training time of the spin-chain task with $L=5$ at about one day.
  • (48) C. W. Gardiner and P. Zoller, Quantum Noise (Springer Berlin Heidelberg, Berlin, Heidelberg, 2000).
  • (49) I. de Vega and D. Alonso, Dynamics of Non-Markovian Open Quantum Systems, Rev. Mod. Phys. 89, 015001 (2017).
  • (50) A. J. Daley, Quantum Trajectories and Open Many-Body Quantum Systems, Advances in Physics 63, 77 (2014).
  • (51) J. Schachenmayer, A. Pikovski, and A. M. Rey, Many-Body Quantum Spin Dynamics with Monte Carlo Trajectories on a Discrete Phase Space, Phys. Rev. X 5, 011022 (2015).
  • (52) J. Huber, A. M. Rey, and P. Rabl, Realistic Simulations of Spin Squeezing and Cooperative Coupling Effects in Large Ensembles of Interacting Two-Level Systems, Phys. Rev. A 105, 013716 (2022).
  • (53) A. J. Daley, J. M. Taylor, S. Diehl, M. Baranov, and P. Zoller, Atomic Three-Body Loss as a Dynamical Three-Body Interaction, Phys. Rev. Lett. 102, 040402 (2009).
  • (54) G. Carleo and M. Troyer, Solving the Quantum Many-Body Problem with Artificial Neural Networks, Science 355, 602 (2017).
  • (55) A. Nagy and V. Savona, Variational Quantum Monte Carlo Method with a Neural-Network Ansatz for Open Quantum Systems, Phys. Rev. Lett. 122, 250501 (2019).
  • (56) M. J. Hartmann and G. Carleo, Neural-Network Approach to Dissipative Quantum Many-Body Dynamics, Phys. Rev. Lett. 122, 250502 (2019).
  • (57) F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, Variational Neural-Network Ansatz for Steady States in Open Quantum Systems, Phys. Rev. Lett. 122, 250503 (2019).
  • (58) P. P. Mazza, D. Zietlow, F. Carollo, S. Andergassen, G. Martius, and I. Lesanovsky, Machine Learning Time-Local Generators of Open Quantum Dynamics, Phys. Rev. Research 3, 023084 (2021).
  • (59) Z. Liu, L.-M. Duan, and D.-L. Deng, Solving Quantum Master Equations with Deep Quantum Neural Networks, Phys. Rev. Research 4, 013097 (2022).
  • (60) D. Luo, Z. Chen, J. Carrasquilla, and B. K. Clark, Autoregressive Neural Network for Simulating Open Quantum Systems via a Probabilistic Formulation, Phys. Rev. Lett. 128, 090501 (2022).
  • (61) K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Las Vegas, NV, USA, 2016), pp. 770–778.
  • (62) R. A. Bertlmann and P. Krammer, Bloch Vectors for Qudits, J. Phys. A: Math. Theor. 41, 235303 (2008).
  • (63) A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Automatic Differentiation in Machine Learning: A Survey, J. Mach. Learn. Res. 18, 1 (2018).
  • (64) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, Automatic Differentiation in PyTorch, in NIPS 2017 Workshop on Autodiff (Long Beach, California, USA, 2017).
  • (65) M.-D. Choi, Completely Positive Linear Maps on Complex Matrices, Linear Algebra and Its Applications 10, 285 (1975).
  • (66) A. Jamiołkowski, Linear Transformations Which Preserve Trace and Positive Semidefiniteness of Operators, Reports on Mathematical Physics 3, 275 (1972).