
Learning Quantum Dissipation by Neural Ordinary Differential Equation

Li Chen^{1,2} ([email protected])    Yadong Wu^{2,3}
^{1}Institute of Theoretical Physics and State Key Laboratory of Quantum Optics and Quantum Optics Devices, Shanxi University, Taiyuan 030006, China
^{2}Institute for Advanced Study, Tsinghua University, Beijing 100084, China
^{3}State Key Laboratory of Surface Physics, Institute of Nanoelectronics and Quantum Computing, and Department of Physics, Fudan University, Shanghai 200433, China
Abstract

Quantum dissipation arises from the unavoidable coupling between a quantum system and its surrounding environment, and is known as a major obstacle to the quantum processing of information. Beyond acknowledging its existence, tracing the dissipation from observational data is a crucial task that may suggest ways to suppress it. In this paper, we propose to learn quantum dissipation from dynamical observations using the neural ordinary differential equation, and demonstrate this method concretely on two open quantum-spin systems: a large-spin system and a spin-1/2 chain. We also investigate the learning efficiency of the dataset, which provides useful guidance for data acquisition in experiments. Our work promisingly facilitates effective modeling and decoherence suppression in open quantum systems.

I Introduction

Quantum dissipation is closely related to phenomena such as decoherence, spectrum broadening and heating, all of which stand as serious obstacles in research areas ranging from quantum computation Ladd2010 ; Nielsen2010 ; Preskill2018 and simulation Georgescu2014 ; Bloch2012 , quantum information storage Lvovsky2009 , to quantum metrology Giovannetti2011 ; Pezze2018 and sensing Degen2017 . The microscopic origin of dissipation is the breakdown of the system's isolation: the system inevitably interacts with its surrounding environment, so that information leaks out once the ambient degrees of freedom are traced out. Dissipation severely impairs the accuracy with which quantum systems can be modeled and manipulated.

Considerable efforts have been made to counteract the negative effects of dissipation. For example, quantum-nondemolition-mediated feedback has been used to suppress the decoherence of a Schrödinger cat state in a cavity Wiseman1993 ; Vitali1997 ; spin-echo Hahn1950 and dynamical decoupling protocols Viola1999 have been widely applied to nitrogen-vacancy centers Lange2010 ; Du2009 ; Abobeih2018 ; Laraoui2013 , cold atoms Almog2010 , trapped ions Wang2017 and superconducting circuits Guo2018 ; Pokharel2018 . These techniques usually require certain prior knowledge of the system-environment interactions. For example, in nitrogen-vacancy centers, strong bias fields were introduced to circumvent the transverse coupling Du2009 or to effectively establish the bath correlations of the carbon nuclei Laraoui2013 . The form of the coupling may also be identified in advance, e.g., whether it is magnetic or electric, linear or quadratic. However, such prior information is generally unavailable, especially for many-body systems interacting with complex environments.

In this paper, we propose a data-driven scheme to reconstruct Markovian open quantum systems, where the dataset comes from discrete observations of the relaxation dynamics under certain probes. Based on this dataset, we adopt the neural ordinary differential equation (NODE), a recently developed machine-learning algorithm Chen2019 , to learn the Liouvillian of the open system by inversely solving the Lindblad (or Gorini-Kossakowski-Sudarshan-Lindblad) master equation Lindblad1976 ; Gorini1976 ; Gardiner2004 . A number of relevant works have been reported on this topic. The operator expansion Franco2009 and the eigensystem realization algorithm (ERA) Zhang2014 ; Sone2017A were first used for the time-trace reconstruction of Hamiltonians; the ERA was later extended to Markovian dissipative systems Zhang2015 and to the estimation of the system size Sone2017B . Several eigenstate- Qi2019 ; Bairey2019 or steady-state-based approaches Bairey2020 were further developed. A generic approach for local Hamiltonian tomography using the initial and final observations of quench dynamics was also reported Li2020 . More recently, several traditional machine-learning architectures, such as the fully connected neural network Xin2019 , long short-term memory Che2021 , and the convolutional neural network Ma2021 , have been applied to this problem.

The novelty and contributions of our work mainly lie in the following points. First, our approach can be applied either to Hamiltonian tomography (closed systems) or to the reconstruction of Markovian dissipation (open systems), without prior information on the specific structure of the Hamiltonian or Liouvillian. Second, we adopt a relatively new machine-learning algorithm (namely the NODE Chen2019 ) to compute the gradients, which avoids the exploding- and vanishing-gradient issues Pascanu2012 encountered by traditional machine-learning algorithms. Furthermore, we study the learning efficiency of datasets to facilitate data acquisition in realistic experiments. We thus expect our work to play an active role in the effective modeling of various open quantum systems and in guiding new system-environment decoupling protocols.

Figure 1: (a) Schematic of an open quantum system. (b) ODE solvers find solutions of the Lindblad equation when the Hamiltonian or the Liouvillian is known in advance, whereas the NODE reconstructs the Hamiltonian or Liouvillian when the solutions ($\rho(t)$ or $\langle O(t)\rangle$) are given. (c) Mechanism of the NODE, where $\rho(t)$ and $a(t)$ flow forward and backward, respectively.

II General Method

We consider a quantum system coupled to an environment, as schematically displayed in Fig. 1(a). Under the Born-Markov approximation, the equation of motion of the system is governed by the Lindblad master equation Lindblad1976 ; Gorini1976 ; Gardiner2004

\dot{\rho}=L[\rho], \qquad (1)

where $\rho$ is the density matrix of the quantum system and $L$ is the Liouvillian superoperator, which takes the form (setting $\hbar=1$)

L[\rho]=-i[H,\rho]+\sum_{k}\left[J_{k}\rho J_{k}^{\dagger}-\frac{1}{2}\{J_{k}^{\dagger}J_{k},\rho\}\right]. \qquad (2)

Here, $H$ denotes the dissipation-free Hamiltonian, $J_{k}$ is the dissipative (or jump) operator in the $k$-th channel, and $\{J_{k}^{\dagger}J_{k},\rho\}=J_{k}^{\dagger}J_{k}\rho+\rho J_{k}^{\dagger}J_{k}$ accounts for the normalization of $\rho$ in the absence of jumps.
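For concreteness, the action of the Liouvillian in Eq. (2) can be sketched in a few lines of NumPy; the function name and array conventions below are our own illustrative choices, not code from the paper:

```python
import numpy as np

def lindblad_rhs(rho, H, jumps):
    """Return L[rho] of Eq. (2) for Hamiltonian H and jump operators J_k (hbar = 1)."""
    drho = -1j * (H @ rho - rho @ H)  # coherent part -i[H, rho]
    for J in jumps:
        JdJ = J.conj().T @ J
        # dissipator: J rho J^dag - (1/2){J^dag J, rho}
        drho += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return drho
```

Trace preservation, $\text{Tr}\,L[\rho]=0$, and Hermiticity of $L[\rho]$ provide quick sanity checks for any such implementation.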

Conventionally, given the Liouvillian $L$ (namely $H$ and $J_{k}$), one obtains the solution $\rho(t)$ by numerically solving the Lindblad equation (1) with ordinary differential equation (ODE) solvers such as the Euler or Runge-Kutta methods Scherer2013 . In contrast, the goal of this paper is to determine the dissipation $J_{k}$, and even $H$, when the dynamics of the density matrix $\rho(t)$ or of certain observables $\langle O(t)\rangle=\text{Tr}[O\rho(t)]$ are given. In other words, we aim to solve an inverse problem: reproducing $L$ from the sequential dataset $\mathcal{D}=\{\rho(t_{n})\ \text{or}\ \langle O(t_{n})\rangle\}$, as illustrated in Fig. 1(b). Additionally, for the sake of experimental applicability, we consider in the following that $\mathcal{D}$ is constructed purely from observations, i.e., $\mathcal{D}=\{\langle O(t_{n})\rangle\}$, since acquiring $\{\langle O(t_{n})\rangle\}$ is experimentally simpler and more feasible than acquiring $\rho(t)$.

We adopt two different probes to generate the dataset $\mathcal{D}$. i) Time-dependent probe: for a fixed $\rho(t_{0})$, one imposes a $t$-dependent control $H_{\text{prob}}(t)$, i.e.,

H=H_{0}+H_{\text{prob}}(t), \qquad (3)

where $H_{\text{prob}}=\mathcal{P}(t)\cdot\mathbf{A}=\sum_{\mu}p_{\mu}(t)A_{\mu}$, with $p_{\mu}(t)$ smooth time series and $A_{\mu}$ the corresponding control operators. ii) Time-independent probe: with fixed $H$ and $J_{k}$, one diversifies $\langle O(t)\rangle$ by varying $\rho(t_{0})$, which suits experimental scenarios in which preparing different initial states is more convenient. For each evolution, we measure $\langle O(t_{n})\rangle$ uniformly at discrete times within the range $t_{n}\in[0,t_{N})$, where $t_{n}=nt_{N}/N_{\text{ts}}$ with $N_{\text{ts}}$ the number of measurement time steps; each such evolution forms a data batch. The entire dataset is constructed from different batches, i.e.,

\mathcal{D}=\{\langle O(\mathcal{P}_{s},t_{n})\rangle\,|\,s\in[1,N_{\text{bs}}];\;n\in[0,N_{\text{ts}}-1]\}, \qquad (4)

where $N_{\text{bs}}$ denotes the batch size, so the total number of data points in $\mathcal{D}$ is $N=N_{\text{bs}}N_{\text{ts}}$. We note that discrete-time observations at equal intervals are chosen for experimental convenience; equal spacing is not a necessary condition for the learning algorithm described below.
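As a sketch of how one batch of Eq. (4) might be generated numerically, the following uses a plain explicit-Euler integrator with substepping between measurements (the paper does not specify the solver; all function names and the substepping scheme are our own assumptions):

```python
import numpy as np

def lindblad_step(rho, H, jumps, dt):
    """One explicit-Euler step of the Lindblad equation (1)."""
    drho = -1j * (H @ rho - rho @ H)
    for J in jumps:
        JdJ = J.conj().T @ J
        drho += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return rho + dt * drho

def measure_batch(rho0, H_of_t, jumps, O, t_N, N_ts, substeps=50):
    """Record <O(t_n)> at the N_ts uniform times t_n = n t_N / N_ts,
    i.e. one data batch of Eq. (4). H_of_t is a callable t -> H, so both
    the t-dependent and t-independent probes fit the same interface."""
    dt = t_N / (N_ts * substeps)
    rho = rho0.astype(complex)
    batch = []
    for n in range(N_ts):
        batch.append(np.trace(O @ rho).real)  # measurement <O(t_n)>
        for s in range(substeps):
            t = (n * substeps + s) * dt
            rho = lindblad_step(rho, H_of_t(t), jumps, dt)
    return np.array(batch)
```

In practice a higher-order solver (e.g. Runge-Kutta) would replace the Euler step; the data layout is what matters here.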

To learn the Liouvillian from $\mathcal{D}$, we adopt a machine-learning algorithm called the NODE Chen2019 . The NODE builds on an ansatz $\dot{\rho}(\rho(t),t,\theta)$ with $\theta=\{\alpha,\gamma\}$, where $\alpha$ and $\gamma$ denote the parameters to be learned in $H$ and $J_{k}$, respectively. The learning process is illustrated in Fig. 1(c) and contains two parts. First, given $\dot{\rho}(\rho(t),t,\theta)$ and the initial state $\rho(t_{0})$, we propagate the Lindblad equation (1) forward using an ODE solver and obtain a series of predictive solutions $\{\rho_{\text{pred}}(t_{n})\}$; the loss function $\mathcal{L}$ is defined as a functional of $\langle O(t_{n})\rangle_{\text{pred}}$ and $\langle O(t_{n})\rangle_{\text{real}}\in\mathcal{D}$, which effectively measures the distance between $\mathcal{D}$ and the NODE predictions. The purpose of learning is to minimize $\mathcal{L}$ by adjusting $\theta$. To this end, the NODE introduces an adjoint field defined by $a(t)=\partial\mathcal{L}(t)/\partial\rho(t)$. By propagating $a(t)$ backward from $t_{N}$ to $t_{0}$, we obtain $d\mathcal{L}/d\theta|_{t=t_{0}}$, which is then used to update the parameters as $\theta\rightarrow\theta-\lambda(d\mathcal{L}/d\theta|_{t=t_{0}})$, with $\lambda$ the learning rate. More details about the NODE algorithm can be found in Appendix A.
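The update rule $\theta\rightarrow\theta-\lambda\,d\mathcal{L}/d\theta$ can be illustrated on a toy inverse problem: recovering a single decay rate $\gamma$ from observations $\langle\sigma_z(t_n)\rangle$ of a decaying qubit, for which $\langle\sigma_z(t)\rangle=2e^{-\gamma t}-1$ in closed form. The NODE obtains the gradient by differentiating through an ODE solver with the adjoint method; purely for brevity, this sketch (our own construction) substitutes a finite-difference gradient:

```python
import numpy as np

# Toy version of theta -> theta - lambda * dL/dtheta for a qubit decaying
# through J = sqrt(gamma) sigma_-, whose signal is <sigma_z(t)> = 2 exp(-gamma t) - 1.

def model(gamma, times):
    return 2.0 * np.exp(-gamma * times) - 1.0

def mse(gamma, times, data):
    return np.mean((model(gamma, times) - data) ** 2)

def learn_gamma(times, data, gamma0=0.5, lr=0.05, epochs=2000, eps=1e-6):
    gamma = gamma0
    for _ in range(epochs):
        # central finite difference stands in for the NODE adjoint gradient
        grad = (mse(gamma + eps, times, data)
                - mse(gamma - eps, times, data)) / (2.0 * eps)
        gamma -= lr * grad  # gradient-descent update with learning rate lr
    return gamma

times = np.linspace(0.0, 5.0, 50)
data = model(0.2, times)            # synthetic "experimental" dataset D
gamma_pred = learn_gamma(times, data)
```

The learned `gamma_pred` converges to the ground-truth value 0.2, mirroring the minimization of Eq. (7) in the full scheme.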

III Examples

We apply this method to two concrete examples. In the first example, we consider a one-body spin system with spin quantum number $S=3/2$. Large-spin systems are actively studied in areas of quantum physics ranging from high-spin quantum dots Klochan2011 ; Doherty2013 and multi-component quantum gases Kawaguchi2012 to unconventional superconductors Wang2018 . In the second example, the system is a many-body spin-1/2 chain. Spin chains are fundamental models in condensed matter physics and quantum computation, closely related to quantum criticality Sachdev2011 , topological phases of matter Chiu2016 ; Wen2017 , etc.

III.1 Spin-3/2 system

The general spin-3/2 system is a four-level system whose spin-vector operators $\{S_{x},S_{y},S_{z}\}$ are the generalized Pauli matrices. Since spin-tensor operators may also be involved, due to, for example, the quadratic Zeeman effect, we expand the Hamiltonian in the SU(4) generators, i.e.,

H_{0}=-\sum_{\mu=1}^{15}\alpha_{\mu}\hat{g}_{\mu}, \qquad (5)

where $\hat{g}_{\mu}$ denote the 15 Hermitian generators satisfying the traceless ($\text{Tr}[\hat{g}_{\mu}]=0$) and orthogonality ($\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$) conditions, and $\alpha_{\mu}$ are the corresponding coefficients. Furthermore, we assume that the system is subject to weak dissipation in these Hermitian channels, namely

J_{\mu}=\sqrt{\gamma_{\mu}}\,\hat{g}_{\mu}, \qquad (6)

with $\gamma_{\mu}$ the dissipative strengths. The weakness is reflected in $\bar{\gamma}\ll\bar{\alpha}$, where $\bar{\alpha}=N_{\alpha}^{-1}\sum_{\mu}|\alpha_{\mu}|$ and $\bar{\gamma}=N_{\gamma}^{-1}\sum_{\mu}\gamma_{\mu}$ respectively denote the mean strengths of $\alpha$ and $\gamma$, and $N_{\alpha,\gamma}$ are the numbers of parameters. Recall that our goal is to learn the parameters $\theta=\{\alpha,\gamma\}$ from the dataset $\mathcal{D}$.
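A standard explicit construction of generators with these properties is the set of generalized Gell-Mann matrices. The sketch below (our own helper, not from the paper) builds all $d^{2}-1$ of them with the stated normalization $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$:

```python
import numpy as np

def su_generators(d=4):
    """Generalized Gell-Mann matrices for SU(d): d^2 - 1 traceless Hermitian
    matrices with Tr[g_mu g_nu] = 2 delta_{mu nu} (15 matrices for d = 4)."""
    gens = []
    # symmetric and antisymmetric off-diagonal generators
    for j in range(d):
        for k in range(j + 1, d):
            S = np.zeros((d, d), dtype=complex); S[j, k] = S[k, j] = 1.0
            A = np.zeros((d, d), dtype=complex); A[j, k] = -1j; A[k, j] = 1j
            gens += [S, A]
    # diagonal generators, normalized so that Tr[D_l^2] = 2
    for l in range(1, d):
        D = np.zeros((d, d), dtype=complex)
        D[:l, :l] = np.eye(l)
        D[l, l] = -l
        gens.append(np.sqrt(2.0 / (l * (l + 1))) * D)
    return gens
```

For $d=2$ this construction reduces to the ordinary Pauli matrices.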

The dataset $\mathcal{D}$ is generated by evolving the Lindblad equation (1) within $t\in[0,20/\bar{\alpha})$ and uniformly measuring the transverse magnetization, i.e., $\langle O\rangle=\langle S_{x}\rangle$. For generality, $\alpha$ and $\gamma$ are randomized within $\alpha\in[-1,1]\bar{\alpha}$ and $\gamma\in[0,2]\bar{\gamma}$, where $\bar{\gamma}=0.02\bar{\alpha}$. As mentioned before, we adopt two different probes. For the $t$-dependent probe [probe i)], we fix $\rho(t_{0})=\left|3/2,3/2\right\rangle\left\langle 3/2,3/2\right|$ to be a state magnetized along $S_{z}$ and introduce a control magnetic field $H_{\text{prob}}=\sum_{\mu=x,y,z}p_{\mu}(t)S_{\mu}$, with $p_{\mu}(t)=2\bar{\alpha}\sum_{m=1}^{10}\sin(2\pi\omega_{m}t)$ and $\omega_{m}$ randomized within $\omega_{m}\in[-1,1]\bar{\alpha}$; for the $t$-independent probe [probe ii)], we take $\rho(t_{0})$ to be random pure states satisfying $\text{Tr}[\rho^{2}(t_{0})]=1$. The total number of data points in $\mathcal{D}$ is $N=10^{3}$, with $N_{\text{bs}}=10$ and $N_{\text{ts}}=10^{2}$.

Figure 2: Learning results of the spin-3/2 model with the $t$-dependent probe. (a) Dependence of the loss function $\mathcal{L}$, the relative error of $\alpha$, and the relative error of $\gamma$ on training epochs. (b) Spin dynamics, where the solid, dashed and dot-dashed lines correspond to $\langle S_{x}\rangle_{\text{real}}$ in the dataset $\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (c) and (d) Real parameters (solid triangles) and predicted parameters (circles with error bars) after training. Color shadings in (b) and error bars in (c) and (d) indicate the predictive fluctuations over $N_{\text{ini}}=10$ different initializations.

In the learning process, we minimize the mean square error (MSE) loss function

\mathcal{L}=\frac{1}{N}\sum_{s,n}\left[\langle S_{x}(\mathcal{P}_{s},t_{n})\rangle_{\text{pred}}-\langle S_{x}(\mathcal{P}_{s},t_{n})\rangle_{\text{real}}\right]^{2}, \qquad (7)

and monitor the averaged relative error with respect to the parameters, i.e.,

\mathcal{E}_{\theta=\{\alpha,\gamma\}}=\overline{\frac{1}{\bar{\theta}}\sqrt{\frac{1}{N_{\theta}}\sum(\theta_{\text{pred}}-\theta_{\text{real}})^{2}}}, \qquad (8)

where the additional overline on the right-hand side denotes a further average over $N_{\text{ini}}$ different initializations of $\theta$. Here, we take $N_{\text{ini}}=10$.
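In code, Eqs. (7) and (8) amount to a few lines; the array-shape conventions below are our own assumptions:

```python
import numpy as np

def mse_loss(pred, real):
    """MSE loss of Eq. (7); pred, real have shape (N_bs, N_ts)."""
    return np.mean((pred - real) ** 2)

def relative_error(theta_pred, theta_real):
    """Averaged relative error of Eq. (8).
    theta_pred: (N_ini, N_theta) predictions from N_ini initializations;
    theta_real: (N_theta,) ground truth; theta_bar = mean |theta_real|."""
    theta_bar = np.mean(np.abs(theta_real))
    rmse = np.sqrt(np.mean((theta_pred - theta_real) ** 2, axis=1))  # per init
    return np.mean(rmse) / theta_bar
```

A perfect prediction gives zero for both quantities, which is a convenient unit test.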

In Fig. 2, we present the learning results for $\mathcal{D}$ generated by the $t$-dependent probe, while leaving those for the $t$-independent probe to Appendix B. Specifically, Fig. 2(a) shows the evolution of $\mathcal{L}$, $\mathcal{E}_{\alpha}$ and $\mathcal{E}_{\gamma}$ with training epochs, from which one can read that the algorithm converges after approximately $5\times10^{3}$ epochs. After convergence, we compare the predicted parameters $\alpha$ and $\gamma$ (circles with error bars) with their real values (solid triangles) in subfigures (c) and (d), respectively, where the error bars indicate the standard deviation over different initializations. Clearly, the algorithm successfully reproduces the parameters, with relative errors $\mathcal{E}_{\alpha}\lesssim5$‰ and $\mathcal{E}_{\gamma}\lesssim2\%$. Fig. 2(b) presents the spin dynamics $\langle S_{x}(t)\rangle$ of a typical batch, where the solid, dashed and dot-dashed lines correspond to the real $\langle S_{x}(t)\rangle_{\text{real}}$, the predicted curve before training, and the predicted curve at the $10^{3}$-th training epoch, respectively, with shadings indicating the predictive fluctuations over initializations. As the training progresses, $\langle S_{x}\rangle_{\text{pred}}$ gradually approaches $\langle S_{x}\rangle_{\text{real}}$, accompanied by a shrinking of the predictive fluctuations.

III.2 Spin-1/2 Chain

Figure 3: Learning results of the spin-1/2 chain with the $t$-independent probe. (a) Dependence of $\mathcal{L}$, $\mathcal{E}_{\alpha}$, and $\mathcal{E}_{\gamma}$ on training epochs. (b) Total spin dynamics of $\langle S_{x}\rangle_{\text{real}}\in\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (c) and (d) Real parameters (solid triangles) and predicted parameters (circles with error bars) after training.

The second example is a spin-1/2 chain with nearest-neighbor interactions, whose Hamiltonian is written as

H_{0}=-\sum_{i,\mu}\alpha_{\mu}^{i}\sigma_{\mu}^{i}-\sum_{i,\mu,\nu}\alpha_{\mu\nu}^{i}\sigma_{\mu}^{i}\sigma_{\nu}^{i+1}. \qquad (9)

The first term accounts for the local fields, with $\sigma_{\mu=x,y,z}^{i}$ the spin-1/2 Pauli operators and $\alpha_{\mu}^{i}$ the corresponding strengths. The second term characterizes the two-body interactions with strengths $\alpha_{\mu\nu}^{i}$. One-body and two-body dissipations are considered in the corresponding channels,

J_{\mu}^{i}=\sqrt{\gamma_{\mu}^{i}}\,\sigma_{\mu}^{i},\qquad J_{\mu\nu}^{i}=\sqrt{\gamma_{\mu\nu}^{i}}\,\sigma_{\mu}^{i}\sigma_{\nu}^{i+1}, \qquad (10)

with $\gamma_{\mu}^{i}$ and $\gamma_{\mu\nu}^{i}$ the corresponding strengths. As a proof-of-principle demonstration, we set the chain length to 5 under periodic boundary conditions, which keeps the numerical complexity within the computational power of a PC with two GPUs Complexity . In this case, there is a total of 120 parameters to be learned, 60 each for $\alpha$ and $\gamma$. Again, we generate two datasets using the $t$-dependent and $t$-independent probes. For the former, we set $\rho(t_{0})=\bigotimes_{i}\left|\uparrow\right\rangle_{i}\left\langle\uparrow\right|_{i}$ and $H_{\text{prob}}=\sum_{\mu}p_{\mu}S_{\mu}$, with $S_{\mu=x,y,z}=\sum_{i}\sigma_{\mu}^{i}$ the total spin operators and $p_{\mu}=2\bar{\alpha}\sum_{m=1}^{10}\sin(2\pi\omega_{m}t)$; for the latter, $\rho(t_{0})$ are chosen to be random product states. The measured observables are the total spins, i.e., $\langle O\rangle=\langle S_{\mu}\rangle$. The remaining settings are: $t\in[0,5/\bar{\alpha}]$, $N=10^{3}$ with $N_{\text{bs}}=20$ and $N_{\text{ts}}=50$, randomized $\alpha\in[0,1]\bar{\alpha}$, $\gamma\in[0,0.02]\bar{\alpha}$, $\omega_{m}\in[-2,2]\bar{\alpha}$, and $N_{\text{ini}}=10$.

In Fig. 3, we display the learning task with the $t$-independent probe, leaving the other task (with the $t$-dependent probe) to Appendix B. The conventions of Fig. 3 are the same as in Fig. 2, except that $\langle S_{x}\rangle$ now denotes the total magnetization. The accurate predictions, with $\mathcal{E}_{\alpha}\lesssim1$‰ and $\mathcal{E}_{\gamma}\lesssim1\%$ as shown in Figs. 3(c) and (d), clearly demonstrate the feasibility of the algorithm for many-body spin systems.

Here, we would like to make several additional comments. First, the systems considered above carry no symmetry, which in principle allows us to learn all the parameters by monitoring a global (or local) observable $\langle O\rangle$. However, if the system intrinsically carries a symmetry, then the operators associated with $O$ may be limited to certain subspaces, which form the accessible set Zhang2014 . In this case, only the parameters related to the accessible set can be determined by the present time-trace approach. Second, there is a case in which the NODE fails to make unique predictions even for systems without intrinsic symmetry. The NODE has access to three pieces of information: the initial state $\rho_{0}$, the probe field $H_{\text{prob}}$, and the measured data $\langle O\rangle\in\mathcal{D}$; hence it cannot make unique predictions if all three share a common "symmetry". For example, in the learning task with the $t$-dependent probe (see Appendix B), $\rho(t_{0})$, $H_{\text{prob}}$ and $\langle S_{x}\rangle$ are all translationally invariant (i.e., independent of the local spins), which leads to unfavorable learning results because of the "symmetry"-induced ambiguity. This problem, however, does not occur for the task with the $t$-independent probe illustrated in Fig. 3, since the initial states $\rho(t_{0})$ employed there explicitly break the translational "symmetry".

IV Learning Efficiency

Above, we illustrated the capability of the algorithm in learning open quantum systems. We now turn to the question of learning efficiency: how should the data points be collected so as to make the learning most efficient? This question matters for data acquisition in realistic experiments, especially when collecting data points is expensive or time-consuming.

To keep this question simple and clear, we focus on the situation in which $\gamma$ are the only unknown parameters to be learned. Moreover, since different batches in $\mathcal{D}$ are independent of each other, we consider a $\mathcal{D}$ containing only one batch, such that the total number of data points is $N=N_{\text{ts}}$. As mentioned before, the NODE adjusts the parameters $\gamma_{\mu}$ according to $d\mathcal{L}/d\gamma_{\mu}$, which motivates us to define

\eta=\frac{1}{NN_{\gamma}}\sum_{n=1}^{N}\sum_{\mu=1}^{N_{\gamma}}\left|\frac{d\mathcal{L}_{n}}{d\gamma_{\mu}}\right|, \qquad (11)

where $\mathcal{L}_{n}$ denotes the local loss function with respect to a single data point $\langle O(t_{n})\rangle$. The physical meaning of $\eta$ is clear: it characterizes the averaged sensitivity of $\mathcal{L}$ to $\gamma$, such that a large $\eta$ speeds up the learning process. To further simplify Eq. (11), we replace the average of the individual derivatives by the derivative with respect to the mean value of $\gamma_{\mu}$, i.e.,

\eta=\frac{1}{N}\sum_{n=1}^{N}\left|\frac{d\mathcal{L}_{n}}{d\bar{\gamma}}\right|. \qquad (12)

For weak dissipation, $\bar{\gamma}\ll\bar{\alpha}$, $\eta$ is closely related to

\chi(t)=\left|\frac{d\langle O(t)\rangle}{d\bar{\gamma}}\right|=\frac{\Delta_{1}te^{-\Delta_{1}t}}{\bar{\gamma}}|f(t)|, \qquad (13)

which is composed of two parts: $|f(t)|$ is a fast-oscillating term characterized by the energy scale $\bar{\alpha}$, while $te^{-\Delta_{1}t}$ is a slowly varying envelope describing the dissipation-induced damping of $\langle O(t)\rangle$, where $\Delta_{1}\propto\bar{\gamma}$ is the real part of the Liouvillian gap. The envelope of $\chi(t)$ is thus a non-monotonic function maximized at $t=t_{\text{dc}}$ [see $\chi(t)$ and $\chi^{2}(t)$ in Fig. 4(a)], with $t_{\text{dc}}=1/\Delta_{1}$ the decoherence time.
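The position of the envelope maximum follows from elementary calculus:

```latex
\frac{d}{dt}\left(te^{-\Delta_{1}t}\right)=(1-\Delta_{1}t)\,e^{-\Delta_{1}t}=0
\quad\Longrightarrow\quad
t_{\text{dc}}=\frac{1}{\Delta_{1}}.
```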

The two most commonly used loss functions, the mean absolute error (MAE) and the MSE, are respectively linearly and quadratically proportional to $\chi(t)$ as $\bar{\gamma}$ approaches its real value $\bar{\gamma}_{\text{real}}$, i.e.,

\eta_{\text{MAE}}=\frac{1}{N}\sum_{n}\chi(t_{n}),\qquad \eta_{\text{MSE}}\propto\frac{1}{N}\sum_{n}\chi^{2}(t_{n}), \qquad (14)

where the MAE is defined by $\mathcal{L}_{n}=|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}|$, while the MSE is defined in Eq. (7). For uniform measurements $t_{n}=nt_{N}/N_{\text{ts}}$, the summations in Eq. (14) can be evaluated analytically in the large-$N$ limit, i.e. (with $t_{N}$ in units of $t_{\text{dc}}$),

\lim_{N\rightarrow\infty}\eta_{\text{MAE}}=\frac{1-e^{-t_{N}}-t_{N}e^{-t_{N}}}{t_{N}},\qquad \lim_{N\rightarrow\infty}\eta_{\text{MSE}}\propto\frac{1-e^{-2t_{N}}-2t_{N}(t_{N}+1)e^{-2t_{N}}}{4t_{N}}, \qquad (15)

where both are non-monotonic with maxima at $t_{N}^{\text{opt}}\approx1.8\,t_{\text{dc}}$ and $t_{N}^{\text{opt}}\approx1.7\,t_{\text{dc}}$, respectively. This indicates that the optimal strategy for uniform data acquisition is to take $t_{N}=t_{N}^{\text{opt}}$, as schematically illustrated by the dots in Fig. 4(a). We benchmark this analysis on the spin-3/2 learning task with the $t$-dependent probe and the MSE loss, for which $t_{\text{dc}}\approx9/\bar{\alpha}$. Practically, we set $N=15$ and plot the variations of $\mathcal{E}_{\gamma}$ and $\eta$ in Figs. 4(b) and (c), respectively, where $\eta$ is numerically calculated using Eq. (11). Clearly, $t_{N}=t_{N}^{\text{opt}}=1.7\,t_{\text{dc}}$ exhibits the largest $\eta$ and the fastest decay of $\mathcal{E}_{\gamma}$ throughout the learning process, in good agreement with the discussion above.
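The quoted optima can be checked numerically by grid-maximizing the two expressions in Eq. (15) (a quick sketch; $t$ is measured in units of $t_{\text{dc}}$ and the MSE expression is taken up to its overall proportionality constant):

```python
import numpy as np

# Grid-maximize the large-N limits of eta_MAE and eta_MSE from Eq. (15).

def eta_mae(t):
    return (1.0 - np.exp(-t) - t * np.exp(-t)) / t

def eta_mse(t):  # up to the overall proportionality constant
    return (1.0 - np.exp(-2.0 * t)
            - 2.0 * t * (t + 1.0) * np.exp(-2.0 * t)) / (4.0 * t)

t = np.linspace(0.01, 10.0, 100000)
t_opt_mae = t[np.argmax(eta_mae(t))]  # close to 1.8
t_opt_mse = t[np.argmax(eta_mse(t))]  # close to 1.7
```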

Figure 4: (a) $\chi$ and $\chi^{2}$ as functions of $t$, where dots visually indicate the optimal data-acquisition strategy with $t_{N}=t_{N}^{\text{opt}}$. (b)-(d) Learning results for the spin-3/2 example using the $t$-dependent probe and the MSE loss function. (b) and (c) respectively show the dependence of $\mathcal{E}_{\gamma}$ and $\eta$ on $t_{N}$ with $N=N_{\gamma}=15$ fixed. (d) shows the dependence of $\mathcal{E}_{\gamma}$ on $N$ with $t_{N}=t_{N}^{\text{opt}}=1.7t_{\text{dc}}$ fixed. All numerical results are averaged over $N_{\text{ini}}=50$ random initializations.

Finally, we briefly discuss the effect of the total number of data points $N$. To this end, we fix $t_{N}=t_{N}^{\text{opt}}$ and show the dependence of $\mathcal{E}_{\gamma}$ on $N$ in Fig. 4(d). An apparent feature is that cases with $N<N_{\gamma}=15$ exhibit poor learning results, which is understandable since at least $N_{\gamma}$ data points are needed to uniquely determine all $\gamma$, just as one needs at least $N_{\gamma}$ equations to determine $N_{\gamma}$ variables. Furthermore, increasing $N$ accelerates the learning process; however, this trend does not continue indefinitely [compare the curves $N=15$, $N=25$ and $N=40$ in Fig. 4(d)]. For fairly large $N$, neighboring data points are barely distinguishable, so adding more data points contributes little to the learning efficiency.

V Summary and Outlook

We proposed a scheme to learn the quantum dissipation of open systems based on the NODE, a machine-learning algorithm able to reproduce the Liouvillian from dynamical observations, and showed that the learning process can be accelerated by optimizing the data-collection strategy. Many follow-up questions remain. The Lindblad master equation relies on the Born-Markov approximation, which limits the current method to weakly dissipative systems with short environmental correlation times. The generalization from Markovian to non-Markovian dynamics is not straightforward, since the memory effect leads to an integro-differential master equation Gardiner2000 ; Vega2017 ; more advanced techniques are needed to handle the gradients associated with the integral kernel. Whether an effective Markovian description can be found for non-Markovian dynamics also remains open. Furthermore, the full-Liouvillian calculation shown above cannot be easily extended to large-scale many-body systems, due to the exponential growth of the Hilbert space. One possible solution is to combine the NODE with quantum-trajectory approaches Gardiner2000 ; Daley2014 , such as the truncated Wigner method Schachenmayer2015 ; Huber2022 and the tDMRG-quantum-trajectory method Daley2009 . In this regard, several recently developed machine-learning-based solvers Carleo2017 ; Nagy2019 ; Hartmann2019 ; Vicentini2019 ; Mazza2021 ; Liu2022 ; Luo2022 may provide valuable insights. Additionally, given a learned dissipative model, can machine-learning algorithms help design the corresponding system-environment decoupling protocols? We expect this work, and these questions, to stimulate more interdisciplinary studies between machine learning and open quantum systems.

Acknowledgements.
L.C. would like to thank Hui Zhai, Ce Wang, Juan Yao, Lei Pan, Yang Shen, and Sen Yang for the fruitful discussion. L.C. acknowledges support from the National Natural Science Foundation of China (Grant Nos. 12174236 and 12147215) and the postdoctoral fellowship offered by Hui Zhai. Part of this work was done during the fellowship at Tsinghua University, Beijing.

Appendix A Neural Ordinary Differential Equation

The NODE Chen2019 inherits the basic idea of the residual network He2016 and is able to reconstruct the differential equations satisfied by given sequential data $\mathcal{D}$. The NODE makes the ansatz $\dot{\boldsymbol{\rho}}=\boldsymbol{f}[\boldsymbol{\rho}(t),\boldsymbol{\theta},t]$. If the specific form of $\boldsymbol{f}$ is not known in advance, a deep neural network can be used to establish the mapping from $\{\boldsymbol{\rho}(t),\boldsymbol{\theta},t\}$ to $\boldsymbol{f}$. However, if prior information is available, e.g., $\boldsymbol{f}$ satisfies the Lindblad equation as in the situation considered in this work, the problem reduces to determining the unknown parameters $\boldsymbol{\theta}$ in $\boldsymbol{f}$. Given an initial state $\boldsymbol{\rho}(t_{0})$, one propagates the NODE forward from $t_{0}$ to $t_{N}$ using an ODE solver and obtains the predictive solutions $\{\boldsymbol{\rho}_{\text{pred}}(t_{n})\}$. The loss function $\mathcal{L}$ is defined as an effective distance between $\{\boldsymbol{\rho}_{\text{pred}}(t_{n})\}$ and the real values $\{\boldsymbol{\rho}_{\text{real}}(t_{n})\}\in\mathcal{D}$. Next, we show how to obtain the derivative $d\mathcal{L}/d\boldsymbol{\theta}$, based on the adjoint field method Chen2019 .

The augmented adjoint state $\boldsymbol{a}_{\text{aug}}$ is defined as the derivative of $\mathcal{L}$ with respect to the augmented state $\boldsymbol{\rho}_{\text{aug}}=\left[\boldsymbol{\rho},\boldsymbol{\theta},t\right]^{T}$, i.e.,

\boldsymbol{a}_{\text{aug}}=\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}_{\text{aug}}}=\left[\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}(t)},\frac{\partial\mathcal{L}}{\partial\boldsymbol{\theta}(t)},\frac{\partial\mathcal{L}}{\partial t}\right]^{T}=\left[\boldsymbol{a},\boldsymbol{a}_{\theta},a_{t}\right]^{T}, \qquad (16)

which satisfies the differential equation

\frac{d\boldsymbol{a}_{\text{aug}}^{T}}{dt}=-\boldsymbol{a}_{\text{aug}}^{T}\cdot\frac{\partial\boldsymbol{f}_{\text{aug}}}{\partial\boldsymbol{\rho}_{\text{aug}}^{T}}, \qquad (17)

where

\begin{equation}
\boldsymbol{f}_{\text{aug}} = \frac{d\boldsymbol{\rho}_{\text{aug}}}{dt} = \left[\dot{\boldsymbol{\rho}}(t),\dot{\boldsymbol{\theta}},\dot{t}\right]^{T} = \left[\boldsymbol{f},\mathbf{0},1\right]^{T} \tag{18}
\end{equation}

is the derivative of the augmented state with respect to $t$, and

\begin{equation}
\frac{\partial\boldsymbol{f}_{\text{aug}}}{\partial\boldsymbol{\rho}_{\text{aug}}^{T}} = \frac{\partial\left[\boldsymbol{f},\mathbf{0},1\right]}{\partial\left[\boldsymbol{\rho},\boldsymbol{\theta},t\right]^{T}} = \begin{bmatrix}\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\rho}} & \frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\theta}} & \frac{\partial\boldsymbol{f}}{\partial t}\\ \mathbf{0} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & 0\end{bmatrix} \tag{19}
\end{equation}

is the Jacobian matrix. Substituting Eqs. (19) and (16) into Eq. (17), we have

\begin{equation}
\left[\dot{\boldsymbol{a}}^{T},\dot{\boldsymbol{a}}_{\theta}^{T},\dot{a}_{t}\right] = -\left[\boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\rho}^{T}},\ \boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{\theta}^{T}},\ \boldsymbol{a}^{T}\cdot\frac{\partial\boldsymbol{f}}{\partial t}\right], \tag{20}
\end{equation}

and propagating this equation backward from $t_{N}$ to $t_{0}$ yields the derivative $\boldsymbol{a}_{\theta}(t_{0})=d\mathcal{L}/d\boldsymbol{\theta}|_{t=t_{0}}$. In particular, the boundary conditions for the backward propagation are given by

\begin{align}
\boldsymbol{a}(t_{N}) &= \frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}(t_{N})}, \nonumber\\
\boldsymbol{a}_{\theta}(t_{N}) &= \left.\frac{\partial\mathcal{L}}{\partial\boldsymbol{\theta}}\right|_{t=t_{N}} = \left.\left(\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}^{T}}\cdot\frac{\partial\boldsymbol{\rho}}{\partial t}\right)\frac{\partial t}{\partial\boldsymbol{\theta}}\right|_{t=t_{N}} = \mathbf{0}, \tag{21}\\
a_{t}(t_{N}) &= \left.\frac{\partial\mathcal{L}}{\partial t}\right|_{t=t_{N}} = \left.\frac{\partial\mathcal{L}}{\partial\boldsymbol{\rho}^{T}}\cdot\frac{\partial\boldsymbol{\rho}}{\partial t}\right|_{t=t_{N}} = \boldsymbol{a}^{T}(t_{N})\cdot\boldsymbol{f}(t_{N}), \nonumber
\end{align}

where $\partial\mathcal{L}/\partial\boldsymbol{\rho}(t_{N})$ can be directly calculated using automatic differentiation toolboxes Baydin2018 ; Paszke2017 .

Note that, for sequential data with intermediate data points at $t_{n}$, the total loss function $\mathcal{L}$ is a summation of individual losses $\mathcal{L}_{n}$, and hence $d\mathcal{L}/d\boldsymbol{\theta}$ is simply the average over all periods of back propagation from $t_{n}$ to $t_{n-1}$, i.e., $d\mathcal{L}/d\boldsymbol{\theta}=N_{\text{ts}}^{-1}\sum_{n=0}^{N_{\text{ts}}-1}\boldsymbol{a}_{\theta}(t_{n})$.
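The forward-backward procedure above can be illustrated on a toy scalar ODE. The sketch below, assuming only numpy, takes $\dot{x}=f(x,\theta)=-\theta x$ with loss $\mathcal{L}=(x(T)-x_{\text{target}})^{2}$; the model and all names are illustrative stand-ins for the vectorized Lindblad dynamics used in the paper. It solves the ODE forward, then integrates the adjoint pair $(\boldsymbol{a},\boldsymbol{a}_{\theta})$ backward per Eqs. (20) and (21), so that $\boldsymbol{a}_{\theta}(t_{0})$ approximates $d\mathcal{L}/d\theta$.

```python
import numpy as np

def rk4_step(f, y, h):
    # One classical Runge-Kutta step for the autonomous ODE y' = f(y).
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def adjoint_gradient(theta, x0, x_target, T=1.0, n_steps=1000):
    """Gradient of L = (x(T) - x_target)^2 w.r.t. theta for x' = -theta*x."""
    h = T / n_steps
    f = lambda x: -theta * x

    # Forward pass: solve the ODE and store the trajectory x(t_n).
    xs = [x0]
    for _ in range(n_steps):
        xs.append(rk4_step(f, xs[-1], h))
    xT = xs[-1]

    # Boundary conditions, Eq. (21): a(T) = dL/dx(T), a_theta(T) = 0.
    a = 2.0 * (xT - x_target)
    a_theta = 0.0

    # Backward pass, Eq. (20): da/dt = -a df/dx and da_theta/dt = -a df/dtheta,
    # here with df/dx = -theta and df/dtheta = -x(t).
    for n in range(n_steps, 0, -1):
        a_theta -= h * a * xs[n]   # accumulate da_theta/dt = a * x backward
        a -= h * theta * a         # Euler step of da/dt = theta * a, t_n -> t_{n-1}
    return xT, a_theta             # a_theta(t_0) approximates dL/dtheta
```

For this linear model the exact gradient is $2(x_{T}-x_{\text{target}})\cdot(-T x_{T})$, which the backward pass reproduces up to the first-order integration error of the adjoint solver.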

Appendix B Complementary Results for the Two Examples

Here, we provide more information on the two examples. The $4\times 4$ Hamiltonian $H$ of the spin-3/2 model can be expanded in terms of the SU(4) Hermitian generators $\hat{g}_{\mu}$ in the fundamental representation. We practically take the matrices as Bertlmann2008

\begin{align}
\hat{g}_{1} &= \begin{pmatrix}0&1&0&0\\ 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{2}=\begin{pmatrix}0&-i&0&0\\ i&0&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{3}=\begin{pmatrix}1&0&0&0\\ 0&-1&0&0\\ 0&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{4}=\begin{pmatrix}0&0&1&0\\ 0&0&0&0\\ 1&0&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{5}=\begin{pmatrix}0&0&-i&0\\ 0&0&0&0\\ i&0&0&0\\ 0&0&0&0\end{pmatrix}, \tag{22}\\
\hat{g}_{6} &= \begin{pmatrix}0&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{7}=\begin{pmatrix}0&0&0&0\\ 0&0&-i&0\\ 0&i&0&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{8}=\frac{1}{\sqrt{3}}\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&-2&0\\ 0&0&0&0\end{pmatrix},\ \hat{g}_{9}=\begin{pmatrix}0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 1&0&0&0\end{pmatrix},\ \hat{g}_{10}=\begin{pmatrix}0&0&0&-i\\ 0&0&0&0\\ 0&0&0&0\\ i&0&0&0\end{pmatrix}, \nonumber\\
\hat{g}_{11} &= \begin{pmatrix}0&0&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&1&0&0\end{pmatrix},\ \hat{g}_{12}=\begin{pmatrix}0&0&0&0\\ 0&0&0&-i\\ 0&0&0&0\\ 0&i&0&0\end{pmatrix},\ \hat{g}_{13}=\begin{pmatrix}0&0&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&1&0\end{pmatrix},\ \hat{g}_{14}=\begin{pmatrix}0&0&0&0\\ 0&0&0&0\\ 0&0&0&-i\\ 0&0&i&0\end{pmatrix},\ \hat{g}_{15}=\frac{1}{\sqrt{6}}\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&-3\end{pmatrix}. \nonumber
\end{align}
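These 15 matrices are the generalized Gell-Mann matrices of SU(4), and their defining trace conditions can be verified numerically. The sketch below, assuming numpy, constructs them programmatically in a conventional pair ordering (which differs from the labeling of Eq. (22) but spans the same set) and is meant to be checked against $\text{Tr}[\hat{g}_{\mu}]=0$ and $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$.

```python
import numpy as np

def su4_generators():
    """The 15 generalized Gell-Mann matrices of SU(4) (ordering is conventional)."""
    g = []
    # Symmetric and antisymmetric off-diagonal generators for each pair (j, k).
    for j in range(4):
        for k in range(j + 1, 4):
            s = np.zeros((4, 4), dtype=complex)
            s[j, k] = s[k, j] = 1.0
            a = np.zeros((4, 4), dtype=complex)
            a[j, k] = -1j
            a[k, j] = 1j
            g += [s, a]
    # Diagonal generators, corresponding to g_3, g_8, g_15 of Eq. (22).
    g.append(np.diag([1.0, -1.0, 0.0, 0.0]).astype(complex))
    g.append(np.diag([1.0, 1.0, -2.0, 0.0]).astype(complex) / np.sqrt(3.0))
    g.append(np.diag([1.0, 1.0, 1.0, -3.0]).astype(complex) / np.sqrt(6.0))
    return g
```

The same loop with `range(d)` in place of `range(4)` generates the analogous basis for any SU($d$).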
Figure 5: First row: learning results for the spin-3/2 model with the $t$-independent probe. (a1) Dependence of $\mathcal{L}$, $\alpha$ and $\gamma$ on training epochs. (a2) Spin dynamics of $\langle S_{x}\rangle_{\text{real}}\in\mathcal{D}$, $\langle S_{x}\rangle_{\text{pred}}$ before training, and $\langle S_{x}\rangle_{\text{pred}}$ after $10^{3}$ epochs of training. (a3) and (a4) Real parameters (solid triangles) and predictive parameters (circles with error bars) after training. Second row: learning results for the spin-1/2 chain with the $t$-dependent probe. (b1) Training processes of $\mathcal{L}$, $\alpha$ and $\gamma$ with the dataset constructed from total spin measurements, i.e., $\mathcal{D}=\{\langle S_{\mu=x,y,z}(\mathcal{P}_{s},t_{n})\rangle\}$. (b2)-(b4) Learning results on the dataset constructed from local spin measurements, i.e., $\mathcal{D}=\{\langle\sigma_{\mu=x,y,z}^{i=3}(\mathcal{P}_{s},t_{n})\rangle\}$.

These generators naturally satisfy the traceless condition $\text{Tr}[\hat{g}_{\mu}]=0$ and the orthogonality condition $\text{Tr}[\hat{g}_{\mu}\hat{g}_{\nu}]=2\delta_{\mu\nu}$, which are also properties that the Lindblad dissipative operators $J_{k}$ are supposed to satisfy. One may check that the one-body and two-body dissipative operators of the second example (the spin-1/2 chain) also satisfy these two conditions. Both conditions would be broken if any $J_{k}$ had a finite trace $a_{k}=\text{Tr}[J_{k}]\neq 0$. However, since the Lindblad equation Eq. (2) is invariant under the transformation

\begin{align}
J_{k} &\longrightarrow J_{k}-a_{k}, \tag{23}\\
H &\longrightarrow H+\frac{i}{2}\sum_{k}\left[a_{k}^{*}J_{k}-(a_{k}^{*}J_{k})^{\dagger}\right], \nonumber
\end{align}

one can always recover these conditions by absorbing $a_{k}$ into $H$. Note that the Lindblad equation has another invariance, $J_{k}\rightarrow e^{i\phi_{k}}J_{k}$ with $\phi_{k}$ an arbitrary phase factor, which implies that the phase of $\gamma_{k}$ is physically meaningless; hence we set $\gamma_{k}$ positive throughout this work.
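The invariance of Eq. (23) can be checked numerically. In the sketch below, assuming numpy, $H$, $J$, and $\rho$ are drawn at random (the names and the scalar $a$ are illustrative), and the Lindblad right-hand side is evaluated before and after absorbing the trace into the Hamiltonian; here $J-a$ is understood as $J-a\mathbb{I}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def lindblad_rhs(H, Js, rho):
    """Right-hand side of the Lindblad master equation, Eq. (2)."""
    out = -1j * (H @ rho - rho @ H)
    for J in Js:
        JdJ = J.conj().T @ J
        out += J @ rho @ J.conj().T - 0.5 * (JdJ @ rho + rho @ JdJ)
    return out

d = 4
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = A + A.conj().T                        # random Hermitian Hamiltonian
J = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho /= np.trace(rho)                      # random density matrix

# Transformation of Eq. (23): shift J by a scalar and compensate in H.
a = 0.3 - 0.8j
J_new = J - a * np.eye(d)
H_new = H + 0.5j * (np.conj(a) * J - (np.conj(a) * J).conj().T)
```

Both parameterizations generate the same dynamics, which is why the trace of $J_{k}$ is not a learnable quantity and can be fixed to zero without loss of generality.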

The first row of Fig. 5 shows the learning results for the spin-3/2 model using the $t$-independent probe, where the subfigures show the dependence of $\mathcal{L}$, $\mathcal{E}_{\alpha}$ and $\mathcal{E}_{\gamma}$ on training epochs (a1), the spin dynamics before and during training (a2), and the predictive parameters $\alpha$ (a3) and $\gamma$ (a4) after training. The results indicate that the NODE can also accurately reproduce the spin-3/2 model, with predictive errors $\mathcal{E}_{\alpha}\lesssim 1$‰ and $\mathcal{E}_{\gamma}\lesssim 5\%$. The second row of Fig. 5 displays the results for the spin-chain model using the $t$-dependent probe. Specifically, in Fig. 5(b1) the data points were collected from measurements of the total spin. There, the loss function $\mathcal{L}$ decreases, but the predictive errors $\mathcal{E}_{\alpha,\gamma}$ barely decrease. In contrast, if the measurement is performed on local spins, favorable results are obtained, as illustrated in Figs. 5(b2)-(b4). These behaviors verify the "symmetry" remarks discussed in the main text, since the local observations $\langle\sigma_{\mu}^{i}\rangle$ explicitly break the translational "symmetry".

Appendix C General Solution of the Lindblad Equation

We obtain the solution of the Lindblad equation by mapping the Liouvillian operator $L$ into the double Hilbert space, a procedure generally known as the Choi-Jamiolkowski isomorphism Choi1975 ; Jamiolkowski1972 or vectorization, i.e.,

\begin{align}
\tilde{L} = &-i\left[\mathbb{I}\otimes H-H^{T}\otimes\mathbb{I}\right] \tag{24}\\
&+\sum_{k}\left[J_{k}^{*}\otimes J_{k}-\frac{1}{2}\left(\mathbb{I}\otimes J_{k}^{\dagger}J_{k}+(J_{k}^{\dagger}J_{k})^{T}\otimes\mathbb{I}\right)\right], \nonumber
\end{align}

where $\tilde{L}$ is the vectorized Liouvillian operator and $\mathbb{I}$ is an identity operator of the same shape as $H$. Generally, $\tilde{L}$ is non-Hermitian, with the spectral structure $\tilde{L}=\sum_{m}\epsilon_{m}\mathbf{u}_{m}\cdot\mathbf{v}_{m}^{\dagger}$, where $\epsilon_{m}$ denotes the Liouvillian spectrum, and $\mathbf{u}_{m}$ and $\mathbf{v}_{m}$ are the right and left eigenvectors, respectively. In this framework, the general solution of the Lindblad equation can be obtained as

\begin{equation}
\tilde{\boldsymbol{\rho}}(t)=\sum_{m}\lambda_{m}e^{\epsilon_{m}t}\mathbf{u}_{m}, \tag{25}
\end{equation}

where $\tilde{\boldsymbol{\rho}}$ is the vectorized density operator, and

\begin{equation}
\lambda_{m}=\mathbf{v}_{m}^{\dagger}\cdot\tilde{\boldsymbol{\rho}}(t_{0}) \tag{26}
\end{equation}

characterizes the projection coefficients with respect to the initial state $\tilde{\boldsymbol{\rho}}(t_{0})$.
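Eqs. (24)-(26) can be sketched for the simplest nontrivial case, a single decaying qubit. The example below, assuming numpy, takes $H=(\omega/2)\sigma_{z}$ and $J=\sqrt{\gamma}\,\sigma_{-}$ with illustrative parameter values, and uses the column-stacking convention $\text{vec}(A\rho B)=(B^{T}\otimes A)\,\text{vec}(\rho)$, under which Eq. (24) holds as written.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)  # lowers the first (excited) basis state

def vectorized_liouvillian(H, Js):
    """Eq. (24) with column-stacking vectorization."""
    L = -1j * (np.kron(I2, H) - np.kron(H.T, I2))
    for J in Js:
        JdJ = J.conj().T @ J
        L += np.kron(J.conj(), J) - 0.5 * (np.kron(I2, JdJ) + np.kron(JdJ.T, I2))
    return L

omega, gamma = 1.0, 0.2
L = vectorized_liouvillian(0.5 * omega * sz, [np.sqrt(gamma) * sm])

# Spectral decomposition: columns of U are right eigenvectors u_m; the rows
# of U^{-1} play the role of the left eigenvectors v_m^dagger.
eps, U = np.linalg.eig(L)
Uinv = np.linalg.inv(U)

rho0 = np.full((2, 2), 0.5, dtype=complex)   # the coherent state |+><+|
lam = Uinv @ rho0.flatten(order="F")         # lambda_m, Eq. (26)

def rho_t(t):
    # General solution, Eq. (25), reshaped back to a 2x2 density matrix.
    return (U @ (lam * np.exp(eps * t))).reshape(2, 2, order="F")
```

For this model the excited population decays as $\rho_{ee}(t)=\rho_{ee}(0)e^{-\gamma t}$, the trace is preserved, and at long times $\tilde{\boldsymbol{\rho}}$ relaxes to the steady state $\mathbf{u}_{0}$, here the ground-state projector.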

The Liouvillian spectrum $\epsilon_{m}=\epsilon_{m}^{r}+i\epsilon_{m}^{i}$ is generally complex: the imaginary part $\epsilon_{m}^{i}$ characterizes the undamped oscillations, while the real part $\epsilon_{m}^{r}\leq 0$ accounts for the damping of $\mathbf{u}_{m}$ during the evolution. It is well known that the spectrum contains an undamped state $\mathbf{u}_{0}$ with $\epsilon_{0}=0$, which is the steady state to which the system relaxes after a long evolution. The gap $\Delta=\epsilon_{0}-\epsilon_{1}=\Delta_{1}+i\Delta_{2}$ between the slowest damping state $\mathbf{u}_{1}$ and the steady state $\mathbf{u}_{0}$ is the Liouvillian gap. The real part $\Delta_{1}>0$ determines the decoherence time via $t_{\text{dc}}=1/\Delta_{1}$. For weak dissipation $\bar{\gamma}\ll\bar{\alpha}$, $\Delta_{1}$ is linearly proportional to the dissipative strength $\bar{\gamma}$, i.e., $\Delta_{1}=C\bar{\gamma}$, with $C$ a positive non-universal factor depending on the number of dissipation channels and the particular form of the external probes, whereas the imaginary part $\Delta_{2}$ is characterized by the energy scale of $\bar{\alpha}$. Hence, we generally have $\Delta_{1}\ll|\Delta_{2}|$.
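The linear scaling $\Delta_{1}=C\bar{\gamma}$ can be observed directly on the decaying-qubit toy model above, whose exact spectrum is $\{0,\,-\gamma,\,-\gamma/2\pm i\omega\}$, giving $C=1/2$; this value of $C$ is specific to the single-channel example, not universal. A minimal sketch, assuming numpy:

```python
import numpy as np

def qubit_liouvillian(omega, gamma):
    """Eq. (24) for H = (omega/2) sigma_z and J = sqrt(gamma) sigma_minus."""
    I2 = np.eye(2, dtype=complex)
    H = 0.5 * omega * np.diag([1.0, -1.0]).astype(complex)
    J = np.sqrt(gamma) * np.array([[0, 0], [1, 0]], dtype=complex)
    JdJ = J.conj().T @ J
    return (-1j * (np.kron(I2, H) - np.kron(H.T, I2))
            + np.kron(J.conj(), J)
            - 0.5 * (np.kron(I2, JdJ) + np.kron(JdJ.T, I2)))

def liouvillian_gap(omega, gamma):
    """Delta_1: the smallest nonzero damping rate -Re(eps_m)."""
    rates = np.sort(-np.linalg.eigvals(qubit_liouvillian(omega, gamma)).real)
    return rates[rates > 1e-10][0]
```

Doubling $\gamma$ doubles the gap, while $\omega$ (the analogue of $\bar{\alpha}$) only shifts the imaginary parts.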

The dataset $\mathcal{D}$ is constructed from the measurements

\begin{align}
\left\langle O(t)\right\rangle &= \sum_{m}\lambda_{m}e^{\epsilon_{m}t}\,\tilde{O}^{\dagger}\cdot\mathbf{u}_{m} \tag{27}\\
&\approx \lambda_{0}(\tilde{O}^{\dagger}\cdot\mathbf{u}_{0})+\lambda_{1}e^{-\Delta_{1}t}f(t), \nonumber
\end{align}

where $\tilde{O}$ is the vectorized observable $O$ and $f(t)=e^{-i\Delta_{2}t}(\tilde{O}^{\dagger}\cdot\mathbf{u}_{1})+\text{c.c.}$ In the second line, we have neglected the contributions of the faster damping states with $m>1$. Clearly, the first term is a $t$-independent constant, while the second term exhibits a fast oscillation $f(t)$ modulated by a slowly damped envelope $e^{-\Delta_{1}t}$. Taking the derivative of Eq. (27) with respect to $\bar{\gamma}$ leads to $\chi(t)$ [Eq. (13)], based on which one can obtain $\eta$ as $\bar{\gamma}$ approaches the realistic value $\bar{\gamma}_{\text{real}}$. In particular, for the MAE loss we have

\begin{align}
\eta_{\text{MAE}} &= \frac{1}{N}\sum_{n}\left|\frac{d\left|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right|}{d\bar{\gamma}}\right| \tag{28}\\
&= \frac{1}{N}\sum_{n}\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&= \frac{1}{N}\sum_{n}\chi(t_{n}). \nonumber
\end{align}

On the other hand, for the MSE loss function, $\eta_{\text{MSE}}$ can be calculated as

\begin{align}
\eta_{\text{MSE}} &= \frac{1}{N}\sum_{n}\left|\frac{d\left(\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right)^{2}}{d\bar{\gamma}}\right| \tag{29}\\
&= \frac{2}{N}\sum_{n}\left|\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}\right|\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&= \frac{2}{N}\sum_{n}|\delta\bar{\gamma}|\left|\frac{\langle O(t_{n})\rangle_{\text{pred}}-\langle O(t_{n})\rangle_{\text{real}}}{\delta\bar{\gamma}}\right|\left|\frac{d\langle O(t_{n})\rangle_{\text{pred}}}{d\bar{\gamma}}\right| \nonumber\\
&\propto \frac{1}{N}\sum_{n}\chi(t_{n})^{2}, \nonumber
\end{align}

with $\delta\bar{\gamma}=\bar{\gamma}-\bar{\gamma}_{\text{real}}$. Eqs. (28) and (29) reproduce Eq. (14).
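Eqs. (28) and (29) can be checked on a toy signal. The sketch below, assuming numpy, uses $\langle O(t)\rangle_{\text{pred}}=e^{-\bar{\gamma}t}$ as an illustrative stand-in for the damped signal of Eq. (27) and compares finite-difference slopes of the MAE and MSE losses near $\bar{\gamma}_{\text{real}}$ against the time averages of $\chi(t_{n})$ and $\chi(t_{n})^{2}$.

```python
import numpy as np

ts = np.linspace(0.1, 5.0, 50)   # measurement times t_n (illustrative)
gamma_real = 0.4
obs_real = np.exp(-gamma_real * ts)

def mae(gamma):
    return np.mean(np.abs(np.exp(-gamma * ts) - obs_real))

def mse(gamma):
    return np.mean((np.exp(-gamma * ts) - obs_real) ** 2)

# Sensitivity chi(t_n) = |d<O(t_n)>/d gamma| evaluated at gamma_real.
chi = np.abs(-ts * np.exp(-gamma_real * ts))

eps = 1e-6
# Eq. (28): slope of the MAE loss just above gamma_real equals mean(chi).
eta_mae = (mae(gamma_real + 2 * eps) - mae(gamma_real + eps)) / eps

# Eq. (29): at a small offset delta, the MSE slope is ~ 2*delta*mean(chi^2).
delta = 1e-3
eta_mse = (mse(gamma_real + delta + eps) - mse(gamma_real + delta)) / eps
```

The MAE slope is independent of the offset $\delta\bar{\gamma}$, whereas the MSE slope vanishes linearly with it, which is the origin of the $\propto$ sign in Eq. (29).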

References

  • (1) T. D. Ladd, F. Jelezko, R. Laflamme, Y. Nakamura, C. Monroe, and J. L. O’Brien, Quantum Computers, Nature 464, 45 (2010).
  • (2) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, Cambridge, 2010).
  • (3) J. Preskill, Quantum Computing in the NISQ Era and Beyond, Quantum 2, 79 (2018).
  • (4) I. M. Georgescu, S. Ashhab, and F. Nori, Quantum Simulation, Rev. Mod. Phys. 86, 153 (2014).
  • (5) I. Bloch, J. Dalibard, and S. Nascimbène, Quantum Simulations with Ultracold Quantum Gases, Nat. Phys. 8, 267 (2012).
  • (6) A. I. Lvovsky, B. C. Sanders, and W. Tittel, Optical Quantum Memory, Nature Photon 3, 706 (2009).
  • (7) V. Giovannetti, S. Lloyd, and L. Maccone, Advances in Quantum Metrology, Nature Photon 5, 222 (2011).
  • (8) L. Pezzè, A. Smerzi, M. K. Oberthaler, R. Schmied, and P. Treutlein, Quantum Metrology with Nonclassical States of Atomic Ensembles, Rev. Mod. Phys. 90, 035005 (2018).
  • (9) C. L. Degen, F. Reinhard, and P. Cappellaro, Quantum Sensing, Rev. Mod. Phys. 89, 035002 (2017).
  • (10) H. M. Wiseman and G. J. Milburn, Quantum Theory of Optical Feedback via Homodyne Detection, Phys. Rev. Lett. 70, 548 (1993).
  • (11) D. Vitali, P. Tombesi, and G. J. Milburn, Controlling the Decoherence of a “Meter” via Stroboscopic Feedback, Phys. Rev. Lett. 79, 2442 (1997).
  • (12) E. L. Hahn, Spin Echoes, Phys. Rev. 80, 580 (1950).
  • (13) L. Viola, E. Knill, and S. Lloyd, Dynamical Decoupling of Open Quantum Systems, Phys. Rev. Lett. 82, 2417 (1999).
  • (14) G. de Lange, Z. H. Wang, D. Ristè, V. V. Dobrovitski, and R. Hanson, Universal Dynamical Decoupling of a Single Solid-State Spin from a Spin Bath, Science 330, 60 (2010).
  • (15) J. Du, X. Rong, N. Zhao, Y. Wang, J. Yang, and R. B. Liu, Preserving Electron Spin Coherence in Solids by Optimal Dynamical Decoupling, Nature 461, 1265 (2009).
  • (16) M. H. Abobeih, J. Cramer, M. A. Bakker, N. Kalb, M. Markham, D. J. Twitchen, and T. H. Taminiau, One-Second Coherence for a Single Electron Spin Coupled to a Multi-Qubit Nuclear-Spin Environment, Nat Commun 9, 2552 (2018).
  • (17) A. Laraoui, F. Dolde, C. Burk, F. Reinhard, J. Wrachtrup, and C. A. Meriles, High-Resolution Correlation Spectroscopy of 13C Spins near a Nitrogen-Vacancy Centre in Diamond, Nat. Commun. 4, 1651 (2013).
  • (18) Y. Sagi, I. Almog, and N. Davidson, Process Tomography of Dynamical Decoupling in a Dense Cold Atomic Ensemble, Phys. Rev. Lett. 105, 053201 (2010).
  • (19) Y. Wang, M. Um, J. Zhang, S. An, M. Lyu, J.-N. Zhang, L.-M. Duan, D. Yum, and K. Kim, Single-Qubit Quantum Memory Exceeding Ten-Minute Coherence Time, Nature Photon 11, 646 (2017).
  • (20) Q. Guo, S.-B. Zheng, J. Wang, C. Song, P. Zhang, K. Li, W. Liu, H. Deng, K. Huang, D. Zheng, X. Zhu, H. Wang, C.-Y. Lu, and J.-W. Pan, Dephasing-Insensitive Quantum Information Storage and Processing with Superconducting Qubits, Phys. Rev. Lett. 121, 130501 (2018).
  • (21) B. Pokharel, N. Anand, B. Fortman, and D. A. Lidar, Demonstration of Fidelity Improvement Using Dynamical Decoupling with Superconducting Qubits, Phys. Rev. Lett. 121, 220502 (2018).
  • (22) R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, Neural Ordinary Differential Equations, ArXiv:1806.07366 [Cs, Stat] (2018).
  • (23) G. Lindblad, On the Generators of Quantum Dynamical Semigroups, Commun. Math. Phys. 48, 119 (1976).
  • (24) V. Gorini, A. Kossakowski, and E. C. G. Sudarshan, Completely Positive Dynamical Semigroups of N-Level Systems, J. Math. Phys. 17, 821 (1976).
  • (25) C. W. Gardiner and P. Zoller, Quantum Noise: A Handbook of Markovian and Non-Markovian Quantum Stochastic Methods with Applications to Quantum Optics, 3rd ed (Springer, Berlin; New York, 2004).
  • (26) C. Di Franco, M. Paternostro, and M. S. Kim, Hamiltonian Tomography in an Access-Limited Setting without State Initialization, Phys. Rev. Lett. 102, 187203 (2009).
  • (27) J. Zhang and M. Sarovar, Quantum Hamiltonian Identification from Measurement Time Traces, Phys. Rev. Lett. 113, 080401 (2014).
  • (28) A. Sone and P. Cappellaro, Hamiltonian Identifiability Assisted by a Single-Probe Measurement, Phys. Rev. A 95, 022335 (2017).
  • (29) J. Zhang and M. Sarovar, Identification of Open Quantum Systems from Observable Time Traces, Phys. Rev. A 91, 052121 (2015).
  • (30) A. Sone and P. Cappellaro, Exact Dimension Estimation of Interacting Qubit Systems Assisted by a Single Quantum Probe, Phys. Rev. A 96, 062334 (2017).
  • (31) X.-L. Qi and D. Ranard, Determining a Local Hamiltonian from a Single Eigenstate, Quantum 3, 159 (2019).
  • (32) E. Bairey, I. Arad, and N. H. Lindner, Learning a Local Hamiltonian from Local Measurements, Phys. Rev. Lett. 122, 020504 (2019).
  • (33) E. Bairey, C. Guo, D. Poletti, N. H. Lindner, and I. Arad, Learning the Dynamics of Open Quantum Systems from Their Steady States, New J. Phys. 22, 032001 (2020).
  • (34) Z. Li, L. Zou, and T. H. Hsieh, Hamiltonian Tomography via Quantum Quench, Phys. Rev. Lett. 124, 160502 (2020).
  • (35) T. Xin, S. Lu, N. Cao, G. Anikeeva, D. Lu, J. Li, G. Long, and B. Zeng, Local-Measurement-Based Quantum State Tomography via Neural Networks, Npj Quantum Inf 5, 109 (2019).
  • (36) L. Che, C. Wei, Y. Huang, D. Zhao, S. Xue, X. Nie, J. Li, D. Lu, and T. Xin, Learning Quantum Hamiltonians from Single-Qubit Measurements, Phys. Rev. Research 3, 023246 (2021).
  • (37) X. Ma, Z. C. Tu, and S.-J. Ran, Deep Learning Quantum States for Hamiltonian Estimation, Chinese Phys. Lett. 38, 110301 (2021).
  • (38) R. Pascanu, T. Mikolov, and Y. Bengio, On the Difficulty of Training Recurrent Neural Networks, (2012).
  • (39) P. O. J. Scherer, Computational Physics: Simulation of Classical and Quantum Systems (Springer International Publishing, Heidelberg, 2013).
  • (40) M. W. Doherty, N. B. Manson, P. Delaney, F. Jelezko, J. Wrachtrup, and L. C. L. Hollenberg, The Nitrogen-Vacancy Colour Centre in Diamond, Physics Reports 528, 1 (2013).
  • (41) O. Klochan, A. P. Micolich, A. R. Hamilton, K. Trunov, D. Reuter, and A. D. Wieck, Observation of the Kondo Effect in a Spin-$\frac{3}{2}$ Hole Quantum Dot, Phys. Rev. Lett. 107, 076805 (2011).
  • (42) Y. Kawaguchi and M. Ueda, Spinor Bose-Einstein Condensates, Physics Reports 520, 253 (2012).
  • (43) H. Kim, K. Wang, Y. Nakajima, R. Hu, S. Ziemak, P. Syers, L. Wang, H. Hodovanets, J. D. Denlinger, P. M. R. Brydon, D. F. Agterberg, M. A. Tanatar, R. Prozorov, and J. Paglione, Beyond Triplet: Unconventional Superconductivity in a Spin-3/2 Topological Semimetal, Sci. Adv. 4, eaao4513 (2018).
  • (44) S. Sachdev and B. Keimer, Quantum Criticality, Physics Today 64, 29 (2011).
  • (45) C.-K. Chiu, J. C. Y. Teo, A. P. Schnyder, and S. Ryu, Classification of Topological Quantum Matter with Symmetries, Rev. Mod. Phys. 88, 035005 (2016).
  • (46) X.-G. Wen, Colloquium: Zoo of Quantum-Topological Phases of Matter, Rev. Mod. Phys. 89, 041004 (2017).
  • (47) Dealing with an open quantum system of $L$ spins is equivalent to dealing with a closed system of $2L$ spins, since the open system can be viewed as two coupled closed systems under the Choi-Jamiolkowski isomorphism [see Eq. (24) in Appendix C]. The computational complexity is further enlarged by a factor of $N_{\text{bs}}N_{\text{ini}}\sim 10^{2}$ due to batches and initializations. Practically, we rent two Nvidia GTX 1080 Ti graphics cards to speed up the numerics, which keeps the training time of the spin-chain task with $L=5$ at about one day.
  • (48) C. W. Gardiner and P. Zoller, Quantum Noise (Springer Berlin Heidelberg, Berlin, Heidelberg, 2000).
  • (49) I. de Vega and D. Alonso, Dynamics of Non-Markovian Open Quantum Systems, Rev. Mod. Phys. 89, 015001 (2017).
  • (50) A. J. Daley, Quantum Trajectories and Open Many-Body Quantum Systems, Advances in Physics 63, 77 (2014).
  • (51) J. Schachenmayer, A. Pikovski, and A. M. Rey, Many-Body Quantum Spin Dynamics with Monte Carlo Trajectories on a Discrete Phase Space, Phys. Rev. X 5, 011022 (2015).
  • (52) J. Huber, A. M. Rey, and P. Rabl, Realistic Simulations of Spin Squeezing and Cooperative Coupling Effects in Large Ensembles of Interacting Two-Level Systems, Phys. Rev. A 105, 013716 (2022).
  • (53) A. J. Daley, J. M. Taylor, S. Diehl, M. Baranov, and P. Zoller, Atomic Three-Body Loss as a Dynamical Three-Body Interaction, Phys. Rev. Lett. 102, 040402 (2009).
  • (54) G. Carleo and M. Troyer, Solving the Quantum Many-Body Problem with Artificial Neural Networks, Science 355, 602 (2017).
  • (55) A. Nagy and V. Savona, Variational Quantum Monte Carlo Method with a Neural-Network Ansatz for Open Quantum Systems, Phys. Rev. Lett. 122, 250501 (2019).
  • (56) M. J. Hartmann and G. Carleo, Neural-Network Approach to Dissipative Quantum Many-Body Dynamics, Phys. Rev. Lett. 122, 250502 (2019).
  • (57) F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, Variational Neural-Network Ansatz for Steady States in Open Quantum Systems, Phys. Rev. Lett. 122, 250503 (2019).
  • (58) P. P. Mazza, D. Zietlow, F. Carollo, S. Andergassen, G. Martius, and I. Lesanovsky, Machine Learning Time-Local Generators of Open Quantum Dynamics, Phys. Rev. Research 3, 023084 (2021).
  • (59) Z. Liu, L.-M. Duan, and D.-L. Deng, Solving Quantum Master Equations with Deep Quantum Neural Networks, Phys. Rev. Research 4, 013097 (2022).
  • (60) D. Luo, Z. Chen, J. Carrasquilla, and B. K. Clark, Autoregressive Neural Network for Simulating Open Quantum Systems via a Probabilistic Formulation, Phys. Rev. Lett. 128, 090501 (2022).
  • (61) K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Las Vegas, NV, USA, 2016), pp. 770–778.
  • (62) R. A. Bertlmann and P. Krammer, Bloch Vectors for Qudits, J. Phys. A: Math. Theor. 41, 235303 (2008).
  • (63) A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Automatic Differentiation in Machine Learning: A Survey, J. Mach. Learn. Res. 18, 1 (2018).
  • (64) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, Automatic Differentiation in PyTorch, in NIPS 2017 Workshop on Autodiff (Long Beach, California, USA, 2017).
  • (65) M.-D. Choi, Completely Positive Linear Maps on Complex Matrices, Linear Algebra and Its Applications 10, 285 (1975).
  • (66) A. Jamiołkowski, Linear Transformations Which Preserve Trace and Positive Semidefiniteness of Operators, Reports on Mathematical Physics 3, 275 (1972).