Preparation of Metrological States in Dipolar-Interacting Spin Systems

Tian-Xing Zheng¹,², Anran Li¹, Jude Rosen¹, Sisi Zhou¹,³, Martin Koppenhöfer¹, Ziqi Ma⁴,⁵, Frederic T. Chong⁴, Aashish A. Clerk¹, Liang Jiang¹, Peter C. Maurer¹
¹Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, USA
²Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
³Institute for Quantum Information and Matter, California Institute of Technology, Pasadena, California 91125, USA
⁴Department of Computer Science, University of Chicago, Chicago, Illinois 60637, USA
⁵Microsoft, Redmond, Washington 98052, USA

Abstract

Spin systems are an attractive candidate for quantum-enhanced metrology. Here we develop a variational method to generate metrological states in small dipolar-interacting ensembles with limited qubit controls and unknown spin locations. The generated states enable sensing beyond the standard quantum limit (SQL) and approaching the Heisenberg limit (HL). Depending on the circuit depth and the level of readout noise, the resulting states resemble Greenberger-Horne-Zeilinger (GHZ) states or Spin Squeezed States (SSS). Sensing beyond the SQL holds in the presence of finite spin polarization and a non-Markovian noise environment.


Introduction.

Spin systems have emerged as a promising platform for quantum sensing [Degen et al., 2017; Barry et al., 2020; Schirhagl et al., 2014; Tetienne, 2021], with applications ranging from tests of fundamental physics [Hensen et al., 2015; Marti et al., 2018] to mapping fields and temperature profiles in condensed matter systems and the life sciences [Schirhagl et al., 2014]. Improving the sensitivity of these qubit sensors has so far largely relied on increasing the number of sensing spins and extending spin coherence through material engineering and coherent control. However, with increasing spin density, dipolar interactions between individual sensor spins cause single-qubit dephasing [Yan et al., 2013; Childress et al., 2006] and, in the absence of advanced dynamical decoupling [Waugh et al., 1968; Mehring, 2012; Choi et al., 2020], set a limit to the sensitivity.

Although dipolar interactions in dense spin ensembles lead to complex evolution, they can provide a resource for the creation of metrological states that enable sensing beyond the SQL. Current approaches to create such states (i.e., GHZ states and SSS) require either all-to-all interactions [Pedrozo-Peñafiel et al., 2020; Kitagawa and Ueda, 1993; Bilitewski et al., 2021] or single-qubit addressability [Chen et al., 2020; Neumann et al., 2008], both of which are challenging to implement experimentally. An alternative approach that relies on adiabatic state preparation requires less control but results in preparation times that increase exponentially with system size [Cappellaro and Lukin, 2009; Choi et al., 2017a], leaving this method susceptible to dephasing.

Variational methods provide a powerful tool for controlling many-body quantum systems [Cerezo et al., 2021; Kokail et al., 2019; Li and Benjamin, 2017; Zhou et al., 2020a]. Such methods have been proposed for Rydberg-interacting atomic systems [Kaubruegger et al., 2019; Kaubruegger et al., 2021] and demonstrated in trapped ions [Marciniak et al., 2021]. However, these techniques rely on effective all-to-all interactions (e.g., the almost constant interaction strength inside the Rydberg radius [Kaubruegger et al., 2019; Borish et al., 2020; Bernien et al., 2017]), which are generally absent in solid-state spin ensembles. In this work, we develop a variational algorithm that drives dipolar-interacting spin systems [Fig. 1(a)] into highly entangled states. The resulting states can subsequently be used for Ramsey-interferometry-based single-parameter estimation [Degen et al., 2017]. The required system control relies solely on uniform single-qubit rotations and free evolution under dipolar interactions. The optimization can be performed directly on an experimental device using only its measurement outcomes, without the need to know the spatial distribution of the spins (hereafter referred to as the 'spin configuration'). Potential experimental platforms include dipolar-interacting ensembles of NV centers, nitrogen defects in diamond (P1), rare-earth-doped crystals, and ultracold molecules.

Figure 1: (a) Schematic of a dipolar-interacting spin ensemble in a 3D-random configuration. (b) The quantum circuit consists of three parts: a sequence for generating entanglement (entangler), phase accumulation (Ramsey), and single-qubit readout in the $S_{z}$ basis. Dipolar interactions during Ramsey interference are eliminated by dynamical decoupling [Yan et al., 2013; Waugh et al., 1968; Zhou et al., 2020b]. The measurement outcome is processed on a classical computer and used to determine the next generation of $\bm{\theta}$. (c) Gate sequence of each variational layer and the Wigner distributions for a 5-spin state after each gate. (d) Illustration of an optimization process on a 3-spin system with $m=1$. The contour plots show the 2D projection of the multidimensional $\bm{\theta}$ space for fixed $\vartheta_{1}$. The orange points mark the sampling positions in the parameter space. Convergence to the global maximum is reached in the 63rd generation.

Variational Ansatz.

As shown in Fig. 1(b), the variational circuit $\mathcal{S}(\bm{\theta})=\mathcal{U}_{m}\cdots\mathcal{U}_{2}\mathcal{U}_{1}$ is constructed from $m$ layers of unitary operations. Each $\mathcal{U}_{i}$ consists of the parameterized control gates

$$\mathcal{U}_{i}=R_{y}\left(\frac{\pi}{2}\right)D\left(\tau^{\prime}_{i}\right)R_{y}\left(-\frac{\pi}{2}\right)R_{x}\left(\vartheta_{i}\right)D\left(\tau_{i}\right),\tag{1}$$

where $R_{\mu}(\vartheta)=\exp(-i\vartheta\sum_{j=1}^{N}S_{j}^{\mu})$ are global single-qubit rotations and $S_{j}^{\mu}$ $(\mu\in\{x,y,z\})$ is the $\mu$ component of the $j$-th spin operator. $D(\tau)=\exp(-i\tau H_{\text{dd}}/\hbar)$ is the time evolution operator of the spin ensemble under the dipolar interaction Hamiltonian $H_{\text{dd}}=\sum_{i<j}V_{ij}(2S^{z}_{i}S^{z}_{j}-S^{x}_{i}S^{x}_{j}-S^{y}_{i}S^{y}_{j})$. The coupling strength between two spins at positions $\bm{r}_{i}$ and $\bm{r}_{j}$ is

$$V_{ij}=\frac{\mu_{0}}{4\pi}\frac{\gamma_{i}\gamma_{j}\hbar^{2}}{\left|\bm{r}_{i}-\bm{r}_{j}\right|^{3}}\frac{1-3\cos^{2}\beta_{ij}}{2},\tag{2}$$

with $\mu_{0}$ the vacuum permeability, $\hbar$ the reduced Planck constant, $\gamma_{i}$ the gyromagnetic ratio of spin $i$, and $\beta_{ij}$ the angle between the line segment connecting $\bm{r}_{i}$ and $\bm{r}_{j}$ and the direction of the bias magnetic field. An evolutionary algorithm [Hansen, 2016; SM] is applied to the $m$-layer circuit, which contains $3m$ free parameters constituting the vector $\bm{\theta}=(\tau_{1},\vartheta_{1},\tau^{\prime}_{1},\ldots,\tau_{i},\vartheta_{i},\tau^{\prime}_{i},\ldots,\tau_{m},\vartheta_{m},\tau^{\prime}_{m})$. Each $\tau_{i}$ is restricted to $\tau_{i}\in[0,1/\bar{f}_{\text{dd}}]$, where $\bar{f}_{\text{dd}}$ is the average nearest-neighbor interaction strength for the considered spin configuration. The Ansatz in Eq. (1) is the most general set of global single-qubit gates that preserves the initial collective spin direction $\langle\sum_{i}\bm{S}_{i}\rangle/|\langle\sum_{i}\bm{S}_{i}\rangle|$, here chosen to be the $x$-direction [Kaubruegger et al., 2019; SM]. Although this Ansatz does not enable universal system control [Schirmer et al., 2001; Albertini and D’Alessandro, 2002, 2021a; SM], we show that with increasing circuit depth, sensing near the HL can be achieved.
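The construction of one variational layer can be made concrete for a few spins. The following is a minimal numerical sketch, not the authors' implementation: it assumes $\hbar=1$, sets the prefactor $\mu_{0}\gamma_{i}\gamma_{j}\hbar^{2}/4\pi$ to 1, uses the standard secular angular factor $(1-3\cos^{2}\beta_{ij})/2$, and evaluates matrix exponentials by dense eigendecomposition. All function names (`dipolar_couplings`, `h_dd`, `layer`, and the helpers) are hypothetical.

```python
import numpy as np

# Spin-1/2 operators with hbar = 1 (assumption of this sketch).
SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2

def site_op(op, j, n):
    """Embed a single-spin operator at site j of an n-spin register."""
    out = np.eye(1, dtype=complex)
    for k in range(n):
        out = np.kron(out, op if k == j else np.eye(2))
    return out

def collective(op, n):
    """Sum of a single-spin operator over all n sites."""
    return sum(site_op(op, j, n) for j in range(n))

def dipolar_couplings(positions, b_axis=(0.0, 0.0, 1.0)):
    """Geometric part of the coupling: (1 - 3 cos^2 beta_ij) / (2 r_ij^3)."""
    pos = np.asarray(positions, dtype=float)
    b = np.asarray(b_axis, dtype=float)
    b = b / np.linalg.norm(b)
    n = len(pos)
    v = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            r = np.linalg.norm(d)
            cos_beta = (d @ b) / r
            v[i, j] = v[j, i] = (1 - 3 * cos_beta**2) / (2 * r**3)
    return v

def h_dd(v):
    """H_dd = sum_{i<j} V_ij (2 SzSz - SxSx - SySy)."""
    n = len(v)
    h = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n):
        for j in range(i + 1, n):
            h += v[i, j] * (2 * site_op(SZ, i, n) @ site_op(SZ, j, n)
                            - site_op(SX, i, n) @ site_op(SX, j, n)
                            - site_op(SY, i, n) @ site_op(SY, j, n))
    return h

def expm_herm(h, t):
    """exp(-i t h) for a Hermitian matrix h via eigendecomposition."""
    w, u = np.linalg.eigh(h)
    return (u * np.exp(-1j * t * w)) @ u.conj().T

def layer(h, n, tau, theta, tau_p):
    """One layer: R_y(pi/2) D(tau') R_y(-pi/2) R_x(theta) D(tau)."""
    jx, jy = collective(SX, n), collective(SY, n)
    return (expm_herm(jy, np.pi / 2) @ expm_herm(h, tau_p)
            @ expm_herm(jy, -np.pi / 2) @ expm_herm(jx, theta)
            @ expm_herm(h, tau))

# Usage: three spins at arbitrary positions, initial state polarized along +x.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.8, 0.5)]
n = len(positions)
u = layer(h_dd(dipolar_couplings(positions)), n, tau=0.4, theta=0.7, tau_p=0.9)
psi = u @ (np.ones(2**n, dtype=complex) / np.sqrt(2**n))
mean_spin = [np.vdot(psi, collective(s, n) @ psi).real for s in (SX, SY, SZ)]
print(mean_spin)  # y and z components vanish: the mean spin stays along x
```

In this toy example the transverse components of the collective spin vanish after the layer, which is the direction-preserving property stated above. An $m$-layer circuit would be the ordered product of $m$ such matrices with independent parameters.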

Metrological cost function.

The Ramsey protocol shown in Fig. 1(b) encodes the quantity of interest in the accumulated phase $\phi=\omega t_{\text{R}}$, with $\omega$ the detuning frequency and $t_{\text{R}}$ the Ramsey sensing time. The Classical Fisher Information (CFI) [Degen et al., 2017; SM] quantifies how precisely one can estimate an unknown parameter $\phi$ in a given measurement basis. Our variational approach treats the spin system as a black box for which the algorithm finds a control sequence that maximizes the CFI associated with the parameter estimation problem,

$$\mathrm{CFI}_{\phi}=\sum_{z}\Tr[P_{z}\rho_{\phi}]\left(\frac{\partial\log\Tr[P_{z}\rho_{\phi}]}{\partial\phi}\right)^{2}.\tag{3}$$

The sum runs over the $2^{N}$ basis states $\ket{z}=\otimes_{i=1}^{N}\ket{s_{i}^{z}}$, where $s_{i}^{z}$ are the eigenvalues of $S_{i}^{z}$. $P_{z}\equiv\ket{z}\bra{z}$ denotes the corresponding measurement operator and $\rho_{\phi}$ the density matrix. The CFI is chosen as the cost function because it is a measure of the maximal achievable sensitivity for a given measurement basis [Degen et al., 2017; Kobayashi et al., 2011]. Likewise, $P_{z}$ provides the maximal information that can be gained from single-qubit measurements. However, measurement operators such as parity or total spin polarization result in a smaller outcome space and are therefore more efficient in experimental implementations. While this study optimizes the measurement for $P_{z}$, the obtained results likewise hold for parity and total spin polarization [SM].
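To illustrate how Eq. (3) separates the SQL from the HL, the sketch below evaluates the CFI numerically for two textbook outcome distributions. These distributions are assumptions made for illustration and are not outputs of the variational circuit: $N$ independent spins with single-spin fringe $p_{\uparrow}=(1+\sin\phi)/2$ (a coherent spin state), and an ideal GHZ state read out through a two-outcome parity fringe $p_{\pm}=(1\pm\cos N\phi)/2$. The derivative is approximated by central differences, and all function names are hypothetical.

```python
import math

def classical_fisher_information(prob_fn, phi, eps=1e-6):
    """Eq. (3) for a discrete distribution p_z(phi): sum_z (dp_z/dphi)^2 / p_z."""
    p = prob_fn(phi)
    p_plus = prob_fn(phi + eps)
    p_minus = prob_fn(phi - eps)
    cfi = 0.0
    for pz, pp, pm in zip(p, p_plus, p_minus):
        if pz > 1e-12:  # outcomes with vanishing probability are skipped
            dp = (pp - pm) / (2 * eps)  # central-difference derivative
            cfi += dp * dp / pz
    return cfi

def css_distribution(n):
    """n independent spins: binomial distribution of the number of 'up' outcomes."""
    def prob(phi):
        q = (1 + math.sin(phi)) / 2
        return [math.comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)]
    return prob

def ghz_parity_distribution(n):
    """Ideal n-spin GHZ state measured via parity: fringe oscillating at n * phi."""
    def prob(phi):
        c = math.cos(n * phi)
        return [(1 + c) / 2, (1 - c) / 2]
    return prob

n_spins, phi0 = 5, 0.3
cfi_css = classical_fisher_information(css_distribution(n_spins), phi0)
cfi_ghz = classical_fisher_information(ghz_parity_distribution(n_spins), phi0)
print(cfi_css, cfi_ghz)  # close to N (SQL) and N^2 (HL), respectively
```

For the uncorrelated state the CFI equals $N$, whereas the GHZ fringe gives $N^{2}$; optimized states from the variational circuit are expected to fall between these two values.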

Numerical results for regular and disordered spin configurations.

We start by testing our approach for three distinct regular spin configurations. Figure 2(a) shows the CFI after optimization for spins arranged on a linear chain (blue), a two-dimensional (2D) square lattice (orange), and a circle (green). All three configurations result in states with a CFI above the SQL. When multiple circuit layers are added, the CFI improves further. Next, we simulate the case of disordered three-dimensional (3D) spin configurations (hereafter referred to as 3D-random). In our simulations the spins are randomly located in a box of side length $L\propto N^{1/3}$ (constant spin density). Compared to the regular spin arrays, the disordered case shows a noticeable saturation of the CFI as a function of $N$. With increased circuit depth, sensing precision beyond the SQL is maintained. The required circuit depth increases drastically with $N$ [Fig. 2(c)].
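A 3D-random configuration at constant density, together with an estimate of the average nearest-neighbor coupling $\bar{f}_{\text{dd}}$, can be generated as follows. This is a sketch under stated assumptions: uniform sampling in a cube of side $L=(N/\text{density})^{1/3}$, a bias field along $z$, the angular factor $(1-3\cos^{2}\beta_{ij})/2$ with the physical prefactor set to 1, and $\bar{f}_{\text{dd}}$ taken as the mean over spins of the coupling magnitude to the geometrically nearest neighbor. The paper does not spell out these details here, and the function names are hypothetical.

```python
import math
import random

def random_configuration(n, density=1.0, seed=None):
    """Sample n spin positions uniformly in a cube with side L = (n / density)^(1/3)."""
    rng = random.Random(seed)
    side = (n / density) ** (1 / 3)
    return [tuple(rng.uniform(0, side) for _ in range(3)) for _ in range(n)]

def coupling(ri, rj, b_axis=(0.0, 0.0, 1.0)):
    """Dimensionless dipolar coupling (1 - 3 cos^2 beta) / (2 r^3); b_axis must be a unit vector."""
    d = [a - b for a, b in zip(ri, rj)]
    r = math.sqrt(sum(x * x for x in d))
    cos_beta = sum(x * y for x, y in zip(d, b_axis)) / r
    return (1 - 3 * cos_beta**2) / (2 * r**3)

def mean_nearest_neighbor_coupling(positions):
    """Average over spins of |V| to the geometrically nearest neighbor."""
    total = 0.0
    for i, ri in enumerate(positions):
        others = [rj for j, rj in enumerate(positions) if j != i]
        nearest = min(others, key=lambda rj: math.dist(ri, rj))
        total += abs(coupling(ri, nearest))
    return total / len(positions)

config = random_configuration(8, density=1.0, seed=1)
f_dd = mean_nearest_neighbor_coupling(config)
print(f_dd, 1 / f_dd)  # coupling scale and the corresponding upper bound on tau_i
```

Because the angular factor can change sign or vanish near the magic angle, individual nearest-neighbor couplings in a random configuration vary widely, which is one way to see why disordered ensembles are harder to entangle uniformly than regular arrays.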

Characterization of entanglement.

We investigate the $N$-qubit entangled states created by our variational method. Figure 3(a) shows the corresponding Wigner distributions [Hillery et al., 1984; Dowling et al., 1994; Koczor et al., 2020] for a regular 2D spin array (top) and the average Wigner distributions for 50 different 3D-random spin configurations (bottom). In both cases, the optimized states resemble GHZ states when $N$ is small and $m$ is large. For large $N$ and small $m$, the states are close to SSS. Non-Gaussian states that provide a sensitivity above that of SSS but below that of GHZ states are also generated. Our algorithm tends to drive the system into a GHZ state, as it has the unique property of attaining the HL in Ramsey spectroscopy [Giovannetti et al., 2006].

To analyze the buildup of entanglement quantitatively, we use the von Neumann entanglement entropy $E_{\text{vN}}=-\Tr(\rho_{\text{s}}\log_{2}\rho_{\text{s}})$ [Nielsen and Chuang, 2010] as a measure of the degree of entanglement between a spin subsystem and the remaining system. Here $\rho_{\text{s}}=\Tr_{\bar{\text{s}}}\rho_{\text{tot}}$ is the reduced density matrix of the subsystem, obtained by tracing over all spins outside it. As an example, we explore one 3D-random configuration of 9 spins. Figure 3(b) shows the von Neumann entropy of each spin after employing a 2-layer circuit (left) and a 7-layer circuit (right). In the case of $m=2$, the achieved degree of entanglement is modest, with spin No. 6, for example, showing no substantial entanglement with the remaining spins. When the circuit depth is increased to 7, all spins display substantial entanglement. While the single-particle entropy detects spins that are unentangled with the remaining system, it does not determine whether all spins are entangled with each other or whether the entanglement is local. We distinguish these two scenarios by identifying the smallest clusters of spins whose entanglement entropy with the rest of the system satisfies $E_{\text{vN}}\leq 0.4$. For $m=2$, the spin ensemble segments into 5 clusters [Fig. 3(b)], while for $m=7$ only 2 clusters are found. The results verify that multiple layers are required to overcome the anisotropy of the dipolar interaction [Eq. (2)] when building up entanglement over the entire system. Finally, in Fig. 3(c) we analyze the size of the largest cluster for each of the 50 spin configurations and observe an overall increase of the largest cluster size and a decrease of the variance.
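The distinction between single-spin entropy and cluster entropy can be illustrated on small pure states. The sketch below is an illustration only, using hand-built states rather than circuit outputs: it computes the entanglement entropy of a chosen subsystem and compares a 4-spin GHZ state with a product of two Bell pairs. The function names are hypothetical, and qubit 0 is taken as the most significant index of the state vector.

```python
import numpy as np

def reduced_density_matrix(psi, keep, n):
    """Trace out every spin not listed in `keep` from an n-spin pure state."""
    psi = np.asarray(psi, dtype=complex).reshape([2] * n)
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    mat = np.transpose(psi, keep + rest).reshape(2 ** len(keep), -1)
    return mat @ mat.conj().T

def von_neumann_entropy(rho):
    """E_vN = -Tr(rho log2 rho), evaluated from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # 0 * log(0) is taken as 0
    return float(-np.sum(evals * np.log2(evals)))

def subsystem_entropy(psi, keep, n):
    return von_neumann_entropy(reduced_density_matrix(psi, keep, n))

n = 4
ghz = np.zeros(2**n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)        # (|0000> + |1111>) / sqrt(2)
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
two_pairs = np.kron(bell, bell)           # Bell pair on spins (0,1) times Bell pair on (2,3)

# Every single spin looks maximally entangled in both states ...
print(subsystem_entropy(ghz, [0], n), subsystem_entropy(two_pairs, [0], n))
# ... but the pair (0,1) is entangled with the rest only in the GHZ state.
print(subsystem_entropy(ghz, [0, 1], n), subsystem_entropy(two_pairs, [0, 1], n))
```

In the two-pair state the cluster {0, 1} has vanishing entropy with the rest of the register, so a threshold on the cluster entropy flags it as a separate cluster even though each individual spin carries one full bit of entropy.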

State preparation time.

Minimizing the preparation time is central in practical applications, as it increases bandwidth, reduces decoherence, and enables more measurement repetitions [Degen et al., 2017]. Figure 3(d) shows the average state preparation time for 8 spins in 50 different 3D-random configurations as a function of layer number. The preparation time increases with the layer number and is inversely proportional to the average nearest-neighbor dipolar coupling strength $\bar{f}_{\text{dd}}$. Compared to adiabatic methods [Cappellaro and Lukin, 2009], our approach results in an $11\times$ reduction of the preparation time needed to reach the same CFI for identical spin number and density [SM].

State preparation under decoherence, initialization and readout errors.

Until now our analysis assumed full coherence and perfect spin initialization and readout. However, dephasing, initialization, and readout errors will be limiting factors in experimental implementations. We next examine the impact of such imperfections on state preparation and sensing. Figure 4(a) shows the CFI in the presence of imperfect initialization and finite readout fidelity for spins on a 2D square lattice. For $N\leq 10$, beyond-SQL precision is reached with 90% initialization and 95% readout fidelity, respectively.

Next, we investigate the resulting states when readout errors are included in the optimizer. Figure 4(b) indicates that without readout errors, the Wigner distribution of the resulting state is close to that of a GHZ state. However, with a finite readout error rate, our algorithm drives the system into a state resembling an SSS. When the readout noise is increased further, the SSS transforms into a coherent spin state (CSS). The results agree with the fact that GHZ states are sensitive to single-spin readout errors while SSS are more robust [Davis et al., 2017].

Table 1: Relevant parameters of candidate experimental platforms

System	$\bar{f}_{\text{dd}}$	$T_{2}^{\text{(DD)}}$	$\bar{f}_{\text{dd}}T_{2}^{\text{(DD)}}$	$P_{\text{ini}}$	$F_{\text{readout}}$	$\nu$
NV ensemble	$35$ kHz [Zhou et al., 2020b]	$7.9(2)$ $\mu$s [Zhou et al., 2020b]	0.28	$97.5\%$ [Shields et al., 2015]	$97.5\%$ [Shields et al., 2015]	$2$ to $4$ [Childress et al., 2006]
P1 centers	$0.92$ MHz [Zu et al., 2021]	$4.4$ $\mu$s [Zu et al., 2021]	4.0	$95\%$ [Degen et al., 2021]	$95\%$ [Degen et al., 2021]	?
Rare-earth crystals	$1.96$ MHz [Merkel et al., 2021]	$2.5$ $\mu$s [Merkel et al., 2021]	4.9	$97\%$ [Chen et al., 2020]	$94.6\%$ [Raha et al., 2020]	$2.4\pm 0.1$ [Le Dantec et al., 2021]
Cold molecules	$52$ Hz [Yan et al., 2013]	$80$ ms [Yan et al., 2013]	4.16	$97\%$ [Cheuk et al., 2020]	$97\%$ [Cheuk et al., 2020]	?
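The fourth column of the table is the dimensionless product of the coupling strength and the coherence time, i.e., roughly how many interaction periods fit into one coherence time. As a quick consistency check, the snippet below recomputes it from the second and third columns; the dictionary layout and variable names are illustrative assumptions.

```python
# (f_dd in Hz, T2 in s), copied from the coupling and coherence-time columns of the table.
platforms = {
    "NV ensemble": (35e3, 7.9e-6),
    "P1 centers": (0.92e6, 4.4e-6),
    "Rare-earth crystals": (1.96e6, 2.5e-6),
    "Cold molecules": (52.0, 80e-3),
}

# Dimensionless figure of merit f_dd * T2 for each platform.
figure_of_merit = {name: f_dd * t2 for name, (f_dd, t2) in platforms.items()}

for name, value in figure_of_merit.items():
    print(f"{name}: f_dd * T2 = {value:.2f}")
```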

During state preparation, decoherence ($T_{2}$) reduces entanglement. We assume independent, Markovian dephasing of each spin, as described by a Lindblad master equation [Nielsen and Chuang, 2010]. Figure 4(c) shows the CFI for various $T_{2}$ times using the previously optimized gate parameters for the 2D square lattice. While a finite $T_{2}$ decreases the CFI, coherence times exceeding $5/f_{\text{dd}}$ result in states with beyond-SQL sensitivity. Here, $f_{\text{dd}}$ denotes the nearest-neighbor interaction strength for the 2D square lattice. Since performing the optimization with imperfections is numerically expensive, the results in Fig. 4(a), (c), and (d) are obtained by optimizing the parameters in the absence of imperfections and using those parameters to compute the CFI under imperfect conditions. Thus, better results are expected if the optimization is run directly on an experiment.

Sensitivity in a non-Markovian environment.

In addition to its impact on state preparation, dephasing affects the performance in Ramsey interferometry. In the presence of spatially uncorrelated Markovian noise, entanglement does not lead to a beyond-SQL scaling [Demkowicz-Dobrzański et al., 2012; Escher et al., 2011]. In a non-Markovian environment, such as that of a solid-state spin system, this limitation does not hold [Chin et al., 2012; Smirne et al., 2016]. We examine the performance of our optimized states in a non-Markovian noise environment. We adopt a noise model [Chin et al., 2012] in which the amplitude of the single-spin coherence decays according to

$$\rho_{01}(t)=\rho_{01}(0)e^{-\left(\frac{t}{T_{2}}\right)^{\nu}},\tag{4}$$

where $\nu$ is the stretch factor set by the noise properties. The time evolution under Ramsey propagation is simulated with a generalized Lindblad master equation [SM; Smirne et al., 2016]. The sensing performance of the optimized states is characterized by the squared signal-to-noise ratio, $\text{SNR}^{2}\propto\text{CFI}_{\omega}/t_{\text{R}}$ [SM]. Figure 4(d) compares their performance with that of the CSS and the GHZ states for noise exponents $\nu=2$ and $\nu=4$ [Childress et al., 2006]. The created entangled states provide an advantage over uncorrelated states. For small spin numbers, the SNR follows the HL scaling [Chin et al., 2012].
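The trade-off between phase accumulation and the stretched-exponential decay of Eq. (4) can be sketched with an idealized fringe model. This is not the generalized Lindblad simulation used for Fig. 4(d). The assumptions are: an $n$-spin GHZ-type fringe with phase $n\omega t$ and contrast $e^{-n(t/T_{2})^{\nu}}$ (independent dephasing of each spin), evaluated at mid-fringe, so that $\text{CFI}_{\omega}=(n\,t\,\text{contrast})^{2}$; setting `n_spins=1` gives the single-spin case. Under these assumptions, maximizing $\text{CFI}_{\omega}/t$ gives $t_{\text{opt}}=T_{2}(2n\nu)^{-1/\nu}$. All function names are hypothetical.

```python
import math

def coherence(t, t2, nu):
    """Eq. (4): remaining fraction of single-spin coherence after time t."""
    return math.exp(-((t / t2) ** nu))

def snr_squared(t, t2, nu, n_spins=1):
    """CFI_omega / t for the idealized fringe model (proportional to SNR^2)."""
    contrast = coherence(t, t2, nu) ** n_spins   # independent dephasing of n spins
    cfi_omega = (n_spins * t * contrast) ** 2    # mid-fringe working point
    return cfi_omega / t

def optimal_ramsey_time(t2, nu, n_spins=1):
    """Stationary point of snr_squared with respect to t."""
    return t2 * (2 * n_spins * nu) ** (-1 / nu)

t2 = 1.0
for nu in (2, 4):
    for n in (1, 4):
        t_opt = optimal_ramsey_time(t2, nu, n)
        print(nu, n, t_opt, snr_squared(t_opt, t2, nu, n))
```

In this toy model a larger stretch factor $\nu$ keeps the coherence close to one for $t<T_{2}$, so the optimal sensing time shrinks more slowly with the number of entangled spins than it would for exponential decay ($\nu=1$).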

Proposed experimental platforms.

Candidate systems for realizing the proposed variational approach need to possess a long $T_{2}$ coherence time, a strong dipolar interaction, and high initialization and readout fidelity. Recent developments in solid-state spin systems and ultracold molecules have demonstrated coherence times that exceed the dipolar coupling time ($1/\bar{f}_{\text{dd}}$) as well as high-fidelity spin initialization and readout. Table 1 lists the experimentally observed parameters for different candidate systems, including nitrogen-vacancy (NV) ensembles, P1 centers in diamond, rare-earth-doped crystals, and ultracold molecule tweezer systems [SM].

Conclusion and Outlook.

This work introduces a variational circuit that generates entangled metrological states in a dipolar-interacting spin system without requiring knowledge of the actual spin configuration. The required system parameters are within the reach of several experimental platforms. While this study remains limited to small system sizes ($N\leq 10$, limited by computational resources), our results are of immediate interest to nanoscale quantum sensing, where spatial resolution is paramount and the finite sensor size limits the number of spins that can be utilized. Extending our results to $N>10$ can be achieved either by exploiting symmetries in regular arrays or by testing our optimization algorithm directly on an experimental platform. The developed method is also potentially applicable to preparing other relevant highly entangled states in quantum computing and quantum communication.

We thank D. DeMille, D. Freedmann, A. Bleszynski Jayich, S. Kolkowitz, T. Li, Z. Li, R. Kaubruegger, Y. Huang, Q. Xuan, Z. Zhang, S. von Kugelgen, C-J. Yu, Y. Bao, and P. Gokhale for helpful discussions. T-X.Z., M.K., F.T.C., A.C., L.J. and P.M. acknowledge support by Q-NEXT Grant No. DOE 1F-60579. T-X.Z. and P.M. acknowledge support by National Science Foundation (NSF) Grant No. OMA-1936118 and OIA-2040520, and NSF QuBBE QLCI (NSF OMA-2121044). S.Z. acknowledges funding provided by the Institute for Quantum Information and Matter, an NSF Physics Frontiers Center (NSF Grant PHY-1733907). Z.M. and F.T.C. acknowledge support by EPiQC, an NSF Expedition in Computing, under grants CCF-1730082/1730449; in part by STAQ under grant NSF Phy-1818914; in part by the US DOE Office of Advanced Scientific Computing Research, Accelerated Research for Quantum Computing Program; and in part by NSF OMA-2016136. The authors are also grateful for the support of the University of Chicago Research Computing Center for assistance with the numerical simulations carried out in this work.

Disclosure: F.T.C. is the Chief Scientist for Super.tech and an advisor to QCI.

References

Degen et al. (2017) C. L. Degen, F. Reinhard, and P. Cappellaro, Reviews of Modern Physics 89, 035002 (2017).
Barry et al. (2020) J. F. Barry, J. M. Schloss, E. Bauch, M. J. Turner, C. A. Hart, L. M. Pham, and R. L. Walsworth, Reviews of Modern Physics 92, 015004 (2020).
Schirhagl et al. (2014) R. Schirhagl, K. Chang, M. Loretz, and C. L. Degen, Annual Review of Physical Chemistry 65, 83 (2014).
Tetienne (2021) J.-P. Tetienne, Nature Physics 17, 1074 (2021).
Hensen et al. (2015) B. Hensen, H. Bernien, A. E. Dréau, A. Reiserer, N. Kalb, M. S. Blok, J. Ruitenberg, R. F. Vermeulen, R. N. Schouten, C. Abellán, et al., Nature 526, 682 (2015).
Marti et al. (2018) G. E. Marti, R. B. Hutson, A. Goban, S. L. Campbell, N. Poli, and J. Ye, Physical Review Letters 120, 103201 (2018).
Yan et al. (2013) B. Yan, S. A. Moses, B. Gadway, J. P. Covey, K. R. Hazzard, A. M. Rey, D. S. Jin, and J. Ye, Nature 501, 521 (2013).
Childress et al. (2006) L. Childress, M. G. Dutt, J. Taylor, A. Zibrov, F. Jelezko, J. Wrachtrup, P. Hemmer, and M. Lukin, Science 314, 281 (2006).
Waugh et al. (1968) J. S. Waugh, L. M. Huber, and U. Haeberlen, Physical Review Letters 20, 180 (1968).
Mehring (2012) M. Mehring, Principles of high resolution NMR in solids (Springer Science & Business Media, 2012).
Choi et al. (2020) J. Choi, H. Zhou, H. S. Knowles, R. Landig, S. Choi, and M. D. Lukin, Physical Review X 10, 031002 (2020).
Pedrozo-Peñafiel et al. (2020) E. Pedrozo-Peñafiel, S. Colombo, C. Shu, A. F. Adiyatullin, Z. Li, E. Mendez, B. Braverman, A. Kawasaki, D. Akamatsu, Y. Xiao, and V. Vuletić, Nature 588, 414 (2020).
Kitagawa and Ueda (1993) M. Kitagawa and M. Ueda, Physical Review A 47, 5138 (1993).
Bilitewski et al. (2021) T. Bilitewski, L. De Marco, J.-R. Li, K. Matsuda, W. G. Tobias, G. Valtolina, J. Ye, and A. M. Rey, Physical Review Letters 126, 113401 (2021).
Chen et al. (2020) S. Chen, M. Raha, C. M. Phenicie, S. Ourari, and J. D. Thompson, Science 370, 592 (2020).
Neumann et al. (2008) P. Neumann, N. Mizuochi, F. Rempp, P. Hemmer, H. Watanabe, S. Yamasaki, V. Jacques, T. Gaebel, F. Jelezko, and J. Wrachtrup, Science 320, 1326 (2008).
Cappellaro and Lukin (2009) P. Cappellaro and M. D. Lukin, Physical Review A 80, 032311 (2009).
Choi et al. (2017a) S. Choi, N. Y. Yao, and M. D. Lukin, arXiv:1801.00042 (2017a).
Cerezo et al. (2021) M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., Nature Reviews Physics 3, 625 (2021).
Kokail et al. (2019) C. Kokail, C. Maier, R. van Bijnen, T. Brydges, M. K. Joshi, P. Jurcevic, C. A. Muschik, P. Silvi, R. Blatt, C. F. Roos, et al., Nature 569, 355 (2019).
Li and Benjamin (2017) Y. Li and S. C. Benjamin, Physical Review X 7, 021050 (2017).
Zhou et al. (2020a) L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, Physical Review X 10, 021067 (2020a).
Kaubruegger et al. (2019) R. Kaubruegger, P. Silvi, C. Kokail, R. van Bijnen, A. M. Rey, J. Ye, A. M. Kaufman, and P. Zoller, Physical Review Letters 123, 260505 (2019).
Kaubruegger et al. (2021) R. Kaubruegger, D. V. Vasilyev, M. Schulte, K. Hammerer, and P. Zoller, Physical Review X 11, 041045 (2021).
Marciniak et al. (2021) C. D. Marciniak, T. Feldker, I. Pogorelov, R. Kaubruegger, D. V. Vasilyev, R. van Bijnen, P. Schindler, P. Zoller, R. Blatt, and T. Monz, arXiv:2107.01860 (2021).
Borish et al. (2020) V. Borish, O. Marković, J. A. Hines, S. V. Rajagopal, and M. Schleier-Smith, Physical Review Letters 124, 063601 (2020).
Bernien et al. (2017) H. Bernien, S. Schwartz, A. Keesling, H. Levine, A. Omran, H. Pichler, S. Choi, A. S. Zibrov, M. Endres, M. Greiner, et al., Nature 551, 579 (2017).
Zhou et al. (2020b) H. Zhou, J. Choi, S. Choi, R. Landig, A. M. Douglas, J. Isoya, F. Jelezko, S. Onoda, H. Sumiya, P. Cappellaro, H. S. Knowles, H. Park, and M. D. Lukin, Physical Review X 10, 031003 (2020b).
Hansen (2016) N. Hansen, arXiv:1604.00772 (2016).
SM: See Supplemental Material for additional details.
Schirmer et al. (2001) S. G. Schirmer, H. Fu, and A. I. Solomon, Physical Review A 63, 063410 (2001).
Albertini and D’Alessandro (2002) F. Albertini and D. D’Alessandro, Linear Algebra and its Applications 350, 213 (2002).
Albertini and D’Alessandro (2021a) F. Albertini and D. D’Alessandro, Systems & Control Letters 151, 104913 (2021a).
Kobayashi et al. (2011) H. Kobayashi, B. L. Mark, and W. Turin, Probability, random processes, and statistical analysis (Cambridge University Press, 2011).
Hillery et al. (1984) M. Hillery, R. F. O’Connell, M. O. Scully, and E. P. Wigner, Physics Reports 106, 121 (1984).
Dowling et al. (1994) J. P. Dowling, G. S. Agarwal, and W. P. Schleich, Physical Review A 49, 4101 (1994).
Koczor et al. (2020) B. Koczor, R. Zeier, and S. J. Glaser, Physical Review A 102, 062421 (2020).
Giovannetti et al. (2006) V. Giovannetti, S. Lloyd, and L. Maccone, Physical Review Letters 96, 010401 (2006).
Nielsen and Chuang (2010) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010).
Davis et al. (2017) E. Davis, G. Bentsen, T. Li, and M. Schleier-Smith, Advances in Photonics of Quantum Computing, Memory, and Communication X 10118, 101180Z (2017).
Shields et al. (2015) B. J. Shields, Q. P. Unterreithmeier, N. P. de Leon, H. Park, and M. D. Lukin, Physical Review Letters 114, 136402 (2015).
Zu et al. (2021) C. Zu, F. Machado, B. Ye, S. Choi, B. Kobrin, T. Mittiga, S. Hsieh, P. Bhattacharyya, M. Markham, D. Twitchen, A. Jarmola, D. Budker, C. R. Laumann, J. E. Moore, and N. Y. Yao, Nature 597, 45 (2021).
Degen et al. (2021) M. Degen, S. Loenen, H. Bartling, C. Bradley, A. Meinsma, M. Markham, D. Twitchen, and T. Taminiau, Nature Communications 12, 3470 (2021).
Merkel et al. (2021) B. Merkel, P. C. Fariña, and A. Reiserer, Physical Review Letters 127, 030501 (2021).
Raha et al. (2020) M. Raha, S. Chen, C. M. Phenicie, S. Ourari, A. M. Dibos, and J. D. Thompson, Nature Communications 11, 1605 (2020).
Le Dantec et al. (2021) M. Le Dantec, M. Rančić, S. Lin, E. Billaud, V. Ranjan, D. Flanigan, S. Bertaina, T. Chanelière, P. Goldner, A. Erb, et al., Science Advances 7, eabj9786 (2021).
Cheuk et al. (2020) L. W. Cheuk, L. Anderegg, Y. Bao, S. Burchesky, S. S. Yu, W. Ketterle, K.-K. Ni, and J. M. Doyle, Physical Review Letters 125, 043401 (2020).
Demkowicz-Dobrzański et al. (2012) R. Demkowicz-Dobrzański, J. Kołodyński, and M. Guţă, Nature Communications 3, 1063 (2012).
Escher et al. (2011) B. Escher, R. de Matos Filho, and L. Davidovich, Nature Physics 7, 406 (2011).
Chin et al. (2012) A. W. Chin, S. F. Huelga, and M. B. Plenio, Physical Review Letters 109, 233601 (2012).
Smirne et al. (2016) A. Smirne, J. Kołodyński, S. F. Huelga, and R. Demkowicz-Dobrzański, Physical Review Letters 116, 120801 (2016).
Kucsko et al. (2018) G. Kucsko, S. Choi, J. Choi, P. C. Maurer, H. Zhou, R. Landig, H. Sumiya, S. Onoda, J. Isoya, F. Jelezko, et al., Physical Review Letters 121, 023601 (2018).
(53) D. Barskiy, “Dipole-dipole interactions in NMR: explained.” (unpublished).
Choi et al. (2017b) S. Choi, J. Choi, R. Landig, G. Kucsko, H. Zhou, J. Isoya, F. Jelezko, S. Onoda, H. Sumiya, V. Khemani, et al., Nature 543, 221 (2017b).
Sutton and Barto (2018) R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction (MIT press, 2018).
Silver et al. (2017) D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., Nature 550, 354 (2017).
Peng et al. (2021) P. Peng, X. Huang, C. Yin, L. Joseph, C. Ramanathan, and P. Cappellaro, arXiv:2102.13161 (2021).
Braunstein and Caves (1994) S. L. Braunstein and C. M. Caves, Physical Review Letters 72, 3439 (1994).
Strobel et al. (2014) H. Strobel, W. Muessel, D. Linnemann, T. Zibold, D. B. Hume, L. Pezzè, A. Smerzi, and M. K. Oberthaler, Science 345, 424 (2014).
Meyer et al. (2021) J. J. Meyer, J. Borregaard, and J. Eisert, NPJ Quantum Information 7, 1 (2021).
Schuld et al. (2019) M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, Physical Review A 99, 032331 (2019).
(62) D. Panchenko, “Properties of MLE: consistency, asymptotic normality. Fisher information.” (unpublished).
Holevo (1993) A. S. Holevo, Reports on Mathematical Physics 32, 211 (1993).
Johansson et al. (2012) J. R. Johansson, P. D. Nation, and F. Nori, Computer Physics Communications 183, 1760 (2012).
Wineland et al. (1994) D. J. Wineland, J. J. Bollinger, W. M. Itano, and D. Heinzen, Physical Review A 50, 67 (1994).
Pezzé and Smerzi (2009) L. Pezzé and A. Smerzi, Physical Review Letters 102, 100401 (2009).
Hyllus et al. (2010) P. Hyllus, O. Gühne, and A. Smerzi, Physical Review A 82, 012337 (2010).
Pezze et al. (2018) L. Pezze, A. Smerzi, M. K. Oberthaler, R. Schmied, and P. Treutlein, Reviews of Modern Physics 90, 035005 (2018).
d’Alessandro (2021) D. d’Alessandro, Introduction to quantum control and dynamics (Chapman and Hall/CRC, 2021).
Wang et al. (2016) X. Wang, D. Burgarth, and S. Schirmer, Physical Review A 94, 052319 (2016).
Ramakrishna and Rabitz (1996) V. Ramakrishna and H. Rabitz, Physical Review A 54, 1715 (1996).
Polack et al. (2009) T. Polack, H. Suchowski, and D. J. Tannor, Physical Review A 79, 053403 (2009).
Chen et al. (2017) J. Chen, H. Zhou, C. Duan, and X. Peng, Physical Review A 95, 032340 (2017).
Albertini and D’Alessandro (2018) F. Albertini and D. D’Alessandro, Journal of Mathematical Physics 59, 052102 (2018).
Albertini and D’Alessandro (2021b) F. Albertini and D. D’Alessandro, Systems & Control Letters 151, 104913 (2021b).
Gao et al. (2013) Y. Gao, H. Zhou, D. Zou, X. Peng, and J. Du, Physical Review A 87, 032335 (2013).

Supplemental Material for: Preparation of Metrological States in Dipolar-Interacting Spin Systems

Designing the variational circuit

In this section, we discuss how to choose the experimentally realizable elementary gates of the variational sequence in the entangler, given limited quantum resources [Kaubruegger et al., 2019; Kaubruegger et al., 2021].

Entanglement generation gates from two-body interaction Hamiltonian and global rotations

Consider a two-body interaction Hamiltonian:

$$H_{\text{int}}=\sum_{i<j}V_{ij}\left(J^{\text{I}}S_{zi}S_{zj}+J^{\text{S}}\bm{S}_{i}\cdot\bm{S}_{j}\right).\tag{S1}$$

In this Hamiltonian, $\bm{S}=(S_{x},S_{y},S_{z})$ is the vector of spin-1/2 operators, $V_{ij}$ is the interaction strength between spins $i$ and $j$, which depends on their locations, and $J^{\text{I}}$ $(\neq 0)$ and $J^{\text{S}}$ are the Ising and symmetric coupling constants, respectively.

The elementary gates in each layer of the variational circuit for preparing metrological states [Fig. 1(c) of the main text] include two free evolutions under the interaction Hamiltonian, $D(\tau)$ and $D(\tau^{\prime})$, one global rotation about the $x$-axis, $R_{x}(\vartheta)$, and two fixed $\pi/2$ rotations about the $y$-axis, $R_{y}(-\frac{\pi}{2})$ and $R_{y}(\frac{\pi}{2})$. We define the interaction gate in the $z$-direction as

$$D_{z}(\tau)\equiv\exp(-i\tau H_{\text{int}}/\hbar)=\exp\Big[-i\tau\sum_{i<j}V_{ij}\left(J^{\text{I}}S_{zi}S_{zj}+J^{\text{S}}\bm{S}_{i}\cdot\bm{S}_{j}\right)/\hbar\Big].\tag{S2}$$

The interaction gates in other directions can be obtained by $\pi/2$ rotations:

$$\begin{aligned}D_{x,y}(\tau)&=R_{y,x}(\pi/2)D_{z}(\tau)R_{y,x}(-\pi/2)\\&=\exp\Big[-i\tau\sum_{i<j}V_{ij}\left(J^{\text{I}}S_{x,yi}S_{x,yj}+J^{\text{S}}\bm{S}_{i}\cdot\bm{S}_{j}\right)/\hbar\Big].\end{aligned}\tag{S3}$$

In Eqs. (S2) and (S3), the symmetric interaction term stays unchanged because the inner product is conserved under global rotations, so the ‘direction of interaction’ is determined only by the Ising term. Using these definitions, we simplify the gate set in each layer as

	$\displaystyle\mathcal{U}_{i}$	$\displaystyle=R_{y}\left(\frac{\pi}{2}\right)D\left(\tau^{\prime}_{i}\right)R_{y}\left(-\frac{\pi}{2}\right)R_{x}\left(\vartheta_{i}\right)D\left(\tau_{i}\right)$
		$\displaystyle=D_{x}\left(\tau^{\prime}_{i}\right)R_{x}\left(\vartheta_{i}\right)D_{z}\left(\tau_{i}\right).$		(S4)

In the next two subsections, it will be shown that the sequence in Eq. (S4) is the most general gate set that uses only global rotations and preserves the collective spin direction along the $x$ -direction.
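The single-spin conjugation rules behind Eqs. (S3) and (S4) can be checked numerically. The following is a minimal sketch, assuming the convention $R_{\mu}(\theta)=\exp(-i\theta S_{\mu})$ with $\hbar=1$; the helper names (`rotation`, `conjugate`, `close`, `scaled`) are hypothetical and chosen only for illustration.

```python
import math

# Spin-1/2 operators S = sigma / 2 as nested lists (hbar = 1).
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]
ID = [[1, 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation(s, theta):
    """R(theta) = exp(-i theta S) = cos(theta/2) I - 2i sin(theta/2) S (spin-1/2)."""
    c, sn = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * ID[i][j] - 2j * sn * s[i][j] for j in range(2)]
            for i in range(2)]

def conjugate(rot_axis, theta, op):
    """Return R(theta) op R(-theta) for a rotation generated by rot_axis."""
    return matmul(matmul(rotation(rot_axis, theta), op),
                  rotation(rot_axis, -theta))

def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

def scaled(a, factor):
    return [[factor * a[i][j] for j in range(2)] for i in range(2)]

# R_y(pi/2) S_z R_y(-pi/2) = S_x: the Ising axis of D_z is mapped onto x.
sz_to_x = conjugate(SY, math.pi / 2, SZ)
# R_x(pi/2) S_z R_x(-pi/2) = -S_y: the sign cancels in the product S_y S_y.
sz_to_y = conjugate(SX, math.pi / 2, SZ)

print(close(sz_to_x, SX), close(sz_to_y, scaled(SY, -1)))
```

Because the Ising term is a product of two spin operators along the same axis, the sign picked up under the $R_{x}$ conjugation does not affect $D_{y}$.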

Preservation of the collective spin direction

Define the x-parity operator $P_{x}\equiv\Pi_{i}^{N}\sigma_{xi}=P_{x}^{\dagger}$ , with $P_{x}^{2}=I$ . This operator describes the parity of a state in the $x$ -direction and equals the global $\pi$ rotation about the $x$ -axis up to a constant phase, $R_{x}(\pi)=\exp(-i\pi\sum_{i}S_{xi})=(-i)^{N}\Pi_{i}^{N}\sigma_{xi}$ . Conjugating a single-spin operator with the x-parity operator gives $P_{x}S_{\mu j}P_{x}=(\sigma_{x}S_{\mu}\sigma_{x})_{j}=\pm S_{\mu j}$ , with the plus sign for $\mu=x$ and the minus sign for $\mu=y,z$ . Every term of the interaction Hamiltonian is bilinear in the same spin component, so these signs cancel pairwise and the interaction gates along the $x$ - and $z$ -directions conserve the x-parity, $P_{x}D_{x,z}P_{x}=D_{x,z}$ . Similarly, the only global rotation that conserves x-parity for arbitrary angles is $R_{x}(\vartheta)$ . Then, based on Eq.(1) in the main text, the unitary operator of the whole control sequence conserves the x-parity

$\displaystyle P_{x}\mathcal{S}(\bm{\theta})P_{x}$	$\displaystyle=P_{x}\mathcal{U}_{m}...\mathcal{U}_{2}\mathcal{U}_{1}P_{x}$
	$\displaystyle=P_{x}\Pi_{i}[D_{x}\left(\tau^{\prime}_{i}\right)R_{x}\left(\vartheta_{i}\right)D_{z}\left(\tau_{i}\right)]P_{x}$
	$\displaystyle=\mathcal{S}(\bm{\theta}).$	(S5)

The initial spin state pointing to the $+x$ -direction is an eigenstate of $P_{x}$ : $P_{x}\ket{\uparrow_{x}}^{\otimes N}=\ket{\uparrow_{x}}^{\otimes N}$ . Thus, any state produced by this variational circuit remains an eigenstate of $P_{x}$ :

$\displaystyle P_{x}\ket{\Psi(\bm{\theta})}$	$\displaystyle=P_{x}\mathcal{S}(\bm{\theta})\ket{\uparrow_{x}}^{\otimes N}$
	$\displaystyle=P_{x}\mathcal{S}(\bm{\theta})P_{x}P_{x}\ket{\uparrow_{x}}^{\otimes N}$
	$\displaystyle=\ket{\Psi(\bm{\theta})}.$	(S6)

Now consider the expectation value of the total spin angular momentum operator $J_{\mu}\equiv\sum_{i}S_{\mu i}$ $(\mu\in\{x,y,z\})$ :

$\displaystyle\langle J_{y,z}\rangle$	$\displaystyle=\bra{\Psi(\bm{\theta})}J_{y,z}\ket{\Psi(\bm{\theta})}$
	$\displaystyle=\bra{\Psi(\bm{\theta})}P_{x}P_{x}J_{y,z}P_{x}P_{x}\ket{\Psi(\bm{\theta})}$
	$\displaystyle=-\langle J_{y,z}\rangle=0.$	(S7)

Thus, the collective spin direction $\langle\bm{J}\rangle/|\langle\bm{J}\rangle|$ always points along the $x$ -direction.
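The parity argument can be illustrated on the generators of the gates for $N=2$ spins. The sketch below is an illustration under stated assumptions rather than the simulation code used for the figures: it checks that the Ising and Heisenberg terms commute with $P_{x}$ while $J_{y}$ and $J_{z}$ change sign. All helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def negated(op):
    return [[-x for x in row] for row in op]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(n) for j in range(n))

ID = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SPIN = {"x": [[0, 0.5], [0.5, 0]],
        "y": [[0, -0.5j], [0.5j, 0]],
        "z": [[0.5, 0], [0, -0.5]]}

# Two-spin operators (N = 2).
PX = kron(SIGMA_X, SIGMA_X)                                  # x-parity operator
J = {mu: add(kron(s, ID), kron(ID, s)) for mu, s in SPIN.items()}  # J_mu
ISING = {mu: kron(s, s) for mu, s in SPIN.items()}           # S_{mu,1} S_{mu,2}
HEISENBERG = add(add(ISING["x"], ISING["y"]), ISING["z"])    # S_1 . S_2

def parity_conjugate(op):
    """Return P_x op P_x."""
    return matmul(matmul(PX, op), PX)

for mu in ("x", "z"):
    print("Ising", mu, "invariant:", close(parity_conjugate(ISING[mu]), ISING[mu]))
for mu in ("y", "z"):
    print("J", mu, "flips sign:", close(parity_conjugate(J[mu]), negated(J[mu])))
```

Since $P_{x}\exp(-i\tau H)P_{x}=\exp(-i\tau P_{x}HP_{x})$, invariance of the generators implies invariance of the corresponding gates.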

Choosing the most general gate set

To preserve the collective spin direction along $x$ -axis, the global rotation and interaction gates that can be chosen are $R_{x}$ , $D_{x}$ , $D_{\perp}$ where $D_{\perp}$ stands for the interaction gates along any direction perpendicular to the $x$ -direction. Combining $R_{x}$ and $D_{z}$ can generate any $D_{\perp}$ , thus the simplest gate set fulfilling all the requirements is $D_{x}R_{x}D_{z}$ , as described by Eq.(1) in the main text.

The derivations and results in this section about selecting the variational sequence agree with ref.[Kaubruegger et al., 2019]. However, the interaction Hamiltonian we discuss here is more general. In Eq. (S1), when $J^{\text{I}}=1,J^{\text{S}}=0$ , the interaction becomes Ising type interaction which is equivalent to the Rydberg interaction in ref.[Kaubruegger et al., 2019,Kaubruegger et al., 2021]. The Ising interaction can also describe spin systems with large local disorder. The optimization results are shown in the next section. When $J^{\text{I}}=3,J^{\text{S}}=-1$ , Eq. (S1) becomes the dipolar interaction Hamiltonian between spin-1/2 particles. When $J^{\text{I}}=2,J^{\text{S}}=-1$ , it becomes the dipolar interaction Hamiltonian between spin-1 particles (such as NV centers). The simulation results for this case are shown in the next section. When $J^{\text{I}}=1,J^{\text{S}}=-1$ , the interaction can describe the dipolar interaction between cold molecules [Yan et al., 2013].
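As a worked example of these special cases, regrouping the terms of Eq. (S1) gives the coefficients of $S_{z}S_{z}$, $S_{x}S_{x}$ and $S_{y}S_{y}$ as $(J^{\text{I}}+J^{\text{S}},J^{\text{S}},J^{\text{S}})$. The short sketch below tabulates them; the function name and the case labels are illustrative only.

```python
def anisotropy_coefficients(j_ising, j_sym):
    """Coefficients of (S_z S_z, S_x S_x, S_y S_y) in
    J^I S_z S_z + J^S (S_x S_x + S_y S_y + S_z S_z)."""
    return (j_ising + j_sym, j_sym, j_sym)

# (J^I, J^S) pairs quoted in the text.
CASES = {
    "Ising": (1, 0),
    "dipolar, spin-1/2": (3, -1),
    "dipolar, spin-1 (NV two-level subspace)": (2, -1),
    "dipolar, cold molecules": (1, -1),
}

for label, (j_i, j_s) in CASES.items():
    print(label, anisotropy_coefficients(j_i, j_s))
```

For instance, $(J^{\text{I}},J^{\text{S}})=(3,-1)$ yields the coefficients $(2,-1,-1)$, which is the form $2S_{z}S_{z}-S_{x}S_{x}-S_{y}S_{y}$ of the secular dipolar Hamiltonian.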

Optimization algorithm: Covariance Matrix Adaptation Evolution Strategy

The optimization in the $3m$ -dimensional parameter space is highly non-convex (Fig.1(d) in main text) due to the large inhomogeneity of the interaction strengths. In our setting, the previously used Dividing Rectangles algorithm [Kaubruegger et al., 2019,Kokail et al., 2019] fails to converge to a beyond-SQL result even after a large number of iterations. We address this challenge by using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as our optimization algorithm [Hansen, 2016]. CMA-ES balances exploration and exploitation when searching the parameter space, so that convergence is reached after fewer than approximately 2,000 generations for $N,m\leq 10$ . This corresponds to about $10^{8}$ repetitions of the Ramsey experiment, which can be further reduced if collective observables are measured.

We reduce the complexity of the optimization by restricting $\tau_{i}$ to $[0,1/\bar{f}_{\text{dd}}]$ , where $\bar{f}_{\text{dd}}$ is the average nearest-neighbor interaction strength for the considered spin configuration. A larger search range for the interaction times $\tau_{i}$ would make it more likely that the global CFI maximum lies inside the parameter space. However, when the upper bound of $\tau_{i}$ is much larger than $1/\bar{f}_{\text{dd}}$ , strongly coupled neighboring spin pairs evolve through many oscillation periods as $\tau_{i}$ is swept. This introduces a large number of local maxima into the cost landscape, making it impractical for a black-box optimization algorithm to converge to the global maximum.
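The search box for the interaction times can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the nearest neighbor of a spin can be identified with its most strongly coupled partner, and that the same bound is applied to both interaction times of each layer. The function names and the example coupling matrix are hypothetical.

```python
def mean_nearest_neighbor_strength(couplings):
    """Average over spins of the largest |V_ij| to any other spin.

    `couplings` is a symmetric N x N nested list; diagonal entries are ignored.
    Identifying the nearest neighbor with the strongest coupling is an
    assumption of this sketch.
    """
    n = len(couplings)
    strongest = [max(abs(couplings[i][j]) for j in range(n) if j != i)
                 for i in range(n)]
    return sum(strongest) / n

def tau_bounds(couplings, num_layers):
    """Box constraints [0, 1/f_dd] for the 2m interaction times of m layers."""
    upper = 1.0 / mean_nearest_neighbor_strength(couplings)
    return [(0.0, upper)] * (2 * num_layers)

# Hypothetical three-spin chain, couplings in kHz (so times are in ms).
V = [[0.0, 1.0, 0.125],
     [1.0, 0.0, 1.0],
     [0.125, 1.0, 0.0]]

print(mean_nearest_neighbor_strength(V))
print(tau_bounds(V, num_layers=2))
```

The bounds would then be passed to the black-box optimizer together with the range of the rotation angles $\vartheta_{i}$.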

Optimization results of different types of dipole-dipole interaction Hamiltonian

The magnetic dipole-dipole interaction Hamiltonian under secular approximation has the general form [Kucsko et al., 2018, Barskiy, ]:

\displaystyle H_{\text{dd}}=\sum_{i<j}V_{ij}(2S_{zi}S_{zj}-S_{xi}S_{xj}-S_{yi}S_{yj})

(S8)

with

\displaystyle V_{ij}=\frac{\mu_{0}}{4\pi}\frac{\gamma_{i}\gamma_{j}\hbar^{2}}{\left|\bm{r}_{i}-\bm{r}_{j}\right|^{3}}\frac{(1-3\cos^{2}\beta_{ij})}{2}

(S9)

where $\mu_{0}$ is the vacuum permeability, $\gamma$ is the gyromagnetic ratio of the spin, and $\beta_{ij}$ is the angle between the line segment connecting ( $\bm{r_{i}}$ , $\bm{r_{j}}$ ) and the direction of the external bias magnetic field (along the $z$ -direction in this case). Eq. (S8) describes the dipolar interaction for spin systems with arbitrary spin quantum number as long as the spin angular momentum operators $S_{\mu}$ obey the commutation relation $[S_{i},S_{j}]=i\epsilon_{ijk}S_{k}$ . It applies to the spin-1/2 systems discussed in the main text and to Nitrogen-Vacancy (NV) centers, which are spin-1 systems.

NV ensemble

Here we consider an NV ensemble in which only $\ket{m_{s}=1}$ and $\ket{m_{s}=0}$ are used as a two-level system. The spin-1 operators are

\displaystyle S_{x}^{(1)}=\frac{1}{\sqrt{2}}\begin{pmatrix}0&1&0\\ 1&0&1\\ 0&1&0\\ \end{pmatrix},S_{y}^{(1)}=\frac{1}{\sqrt{2}}\begin{pmatrix}0&-i&0\\ i&0&-i\\ 0&i&0\\ \end{pmatrix},S_{z}^{(1)}=\begin{pmatrix}1&0&0\\ 0&0&0\\ 0&0&-1\\ \end{pmatrix}.

(S10)

If we only take the $\ket{m_{s}=1}$ , $\ket{m_{s}=0}$ subspace into consideration, the relations between the ‘truncated’ spin-1 operators and the spin-1/2 operators are:

\displaystyle S_{x}^{(1)}=\sqrt{2}S_{x}^{(\frac{1}{2})},S_{y}^{(1)}=\sqrt{2}S_{y}^{(\frac{1}{2})},S_{z}^{(1)}=\frac{I}{2}+S_{z}^{(\frac{1}{2})}

(S11)
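The truncation relations can be verified directly from the matrices in Eq. (S10). The sketch below projects each spin-1 operator onto the subspace spanned by $\ket{m_{s}=1}$ and $\ket{m_{s}=0}$ (the upper-left $2\times 2$ block in the basis ordering assumed here) and compares it with the corresponding spin-1/2 expression; the helper names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)
# Spin-1 operators in the basis (|m_s=+1>, |m_s=0>, |m_s=-1>).
S1X = [[0, R, 0], [R, 0, R], [0, R, 0]]
S1Y = [[0, -1j * R, 0], [1j * R, 0, -1j * R], [0, 1j * R, 0]]
S1Z = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]

# Spin-1/2 operators acting on the two-level subspace (|+1>, |0>).
HALF_X = [[0, 0.5], [0.5, 0]]
HALF_Y = [[0, -0.5j], [0.5j, 0]]
HALF_Z = [[0.5, 0], [0, -0.5]]

def truncate(op):
    """Keep the upper-left 2x2 block, i.e. project onto span{|+1>, |0>}."""
    return [row[:2] for row in op[:2]]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

SQRT2 = math.sqrt(2)
# S_x^(1) -> sqrt(2) S_x^(1/2) and S_y^(1) -> sqrt(2) S_y^(1/2)
x_ok = close(truncate(S1X), [[SQRT2 * v for v in row] for row in HALF_X])
y_ok = close(truncate(S1Y), [[SQRT2 * v for v in row] for row in HALF_Y])
# S_z^(1) -> I/2 + S_z^(1/2)
z_ok = close(truncate(S1Z),
             [[0.5 + HALF_Z[0][0], 0], [0, 0.5 + HALF_Z[1][1]]])

print(x_ok, y_ok, z_ok)
```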

Plugging Eq. (S11) into Eq. (S8), we get the effective dipole-dipole interaction Hamiltonian for NV ensemble $\ket{m_{s}=1}$ , $\ket{m_{s}=0}$ subspace [Kucsko et al., 2018,Choi et al., 2017b]:

\displaystyle H_{\text{DD,NV}}=\sum_{i<j}V_{ij}(S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}-S^{(\frac{1}{2})}_{xi}S^{(\frac{1}{2})}_{xj}-S^{(\frac{1}{2})}_{yi}S^{(\frac{1}{2})}_{yj})

(S12)

Fig. S1(a) shows the Classical Fisher Information (CFI) optimization results for 2D square lattice spin configuration. They are similar to the results we get in Fig.(2) of the main text for spin-1/2 systems.

Ising type interaction (large local disorder)

When the system has large local disorder, the flip-flop terms in the dipolar interaction Hamiltonian Eq. (S8) are suppressed because of the large energy gap:

$\displaystyle H_{\text{DD,Ising}}$	$\displaystyle=\sum_{i}\delta_{i}S^{(\frac{1}{2})}_{zi}+\sum_{i<j}V_{ij}(2S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}-S^{(\frac{1}{2})}_{xi}S^{(\frac{1}{2})}_{xj}-S^{(\frac{1}{2})}_{yi}S^{(\frac{1}{2})}_{yj})$
	$\displaystyle=\sum_{i}\delta_{i}S^{(\frac{1}{2})}_{zi}+\sum_{i<j}V_{ij}\left[2S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}-\frac{1}{2}\left(S^{(\frac{1}{2})}_{+i}S^{(\frac{1}{2})}_{-j}+S^{(\frac{1}{2})}_{-i}S^{(\frac{1}{2})}_{+j}\right)\right]$
	$\displaystyle\approx\sum_{i}\delta_{i}S^{(\frac{1}{2})}_{zi}+\sum_{i<j}2V_{ij}S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}.$	(S13)

This location-dependent single-spin energy shift ( $\delta_{i}$ ) can be canceled by a spin-echo pulse sequence, which realizes the interaction gate $D(\tau)$ :

	$\displaystyle D(\tau)$	$\displaystyle=R_{x}(\pi)\exp[-i\tau H_{\text{DD,Ising}}]R_{x}(\pi)\exp[-i\tau H_{\text{DD,Ising}}]$
		$\displaystyle=\exp[-i\tau\sum_{i<j}2V_{ij}S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}].$		(S14)

Eq. (S14) is also valid when the local disorder $\delta_{i}$ is comparable to the interaction strength $V_{ij}$ . If there is local disorder in the dipolar-interacting spin ensemble, applying a spin echo generates the interaction gate $D(\tau)$ in which the local disorder terms are canceled.

The CFI optimization results obtained with the effective Ising type interaction Hamiltonian $H_{\text{DD,Ising}}=\sum_{i<j}2V_{ij}S^{(\frac{1}{2})}_{zi}S^{(\frac{1}{2})}_{zj}$ are shown in Fig. S1 (b).

Fig. S1 shows CFI results close to the Heisenberg Limit, indicating that the variational method can be applied to different kinds of spins in solid state systems and can generate highly entangled states for high-precision quantum metrology. We also observe that for shallow variational circuits, the CFI ‘oscillation’ between even and odd spin numbers only appears when there are flip-flop terms in the Hamiltonian. For Ising type interaction, the ‘oscillation’ disappears.

Optimization results by using $P_{z}^{\text{tot}}$ , $P_{z}^{\pi}$ as measurement bases

The optimization results shown in Fig.2 in the main text are obtained by using $P_{z}$ as the measurement basis for the CFI (cost function) calculation. Although measuring all the diagonal elements of the density matrix of the resulting states provides the maximum information one can get from single-qubit measurements and a large Hilbert space for the optimizer, it leads to an exponentially large ( $2^{N}$ ) number of experimental repetitions when the CFI needs to be estimated from experimental data. Thus, we test the variational method on two other measurement bases which require fewer repetitions for readout.

The measurement basis on total spin polarization along $z$ -direction is given by

\displaystyle P_{z}^{\text{tot}}\equiv\ket{J=N/2,J_{z}}\bra{J=N/2,J_{z}}

(S15)

where $J$ is the total spin angular momentum quantum number and $J_{z}$ is the total spin angular momentum projection quantum number that runs from $N/2$ to $-N/2$ . $P_{z}^{\text{tot}}$ has $N+1$ outcomes, so the number of outcomes scales linearly with the system size.

The optimization results obtained with the CFI on $P_{z}^{\text{tot}}$ as the cost function are shown in Fig. S2 (a)(b). Surprisingly, compared to the results obtained with $P_{z}$ , they are improved by about a factor of $1.5\sim 2$ for the 3D-random spin configuration. Since all the information one can extract from $P_{z}^{\text{tot}}$ is contained in $P_{z}$ , we attribute this improvement to the simpler parameter-space structure that $P_{z}^{\text{tot}}$ provides to the optimizer. Fewer local maxima in the parameter space help the optimizer converge to a high-CFI point, especially when the dimension of the parameter space ( $3m$ ) is large.

Parity of the spin ensemble,

\displaystyle P_{z}^{\pi}\equiv\Pi_{i}^{N}\sigma_{zi},

(S16)

provides a constant ( $2$ ) dimensional outcome space for experimental readout. Improvements are also observed in 2D square lattice and 3D-random spin configurations (Fig. S2(c)(d)).
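The reduction in the number of outcomes can be made concrete by coarse-graining a full bitstring distribution. The sketch below is illustrative only: it interprets the $P_{z}^{\text{tot}}$ readout as counting the number of flipped spins (an assumption of this example, which bins by total $J_{z}$ irrespective of $J$), and it takes the bit value 0 to denote spin up. The function name is hypothetical.

```python
from itertools import product

def coarse_grain(probs, num_spins):
    """Bin bitstring probabilities by total J_z and by z-parity.

    `probs` maps tuples of 0/1 (0 = spin up, an assumed convention) to
    probabilities of the full P_z readout with 2^N outcomes.
    """
    by_jz = {}                       # J_z = N/2 - (number of 1s): N + 1 values
    by_parity = {+1: 0.0, -1: 0.0}   # eigenvalue of prod_i sigma_z: 2 values
    for bits, p in probs.items():
        ones = sum(bits)
        jz = num_spins / 2 - ones
        by_jz[jz] = by_jz.get(jz, 0.0) + p
        by_parity[(-1) ** ones] += p
    return by_jz, by_parity

# Example: the uniform z-basis distribution of a coherent spin state along x.
N = 3
uniform = {bits: 1 / 2 ** N for bits in product((0, 1), repeat=N)}
jz_dist, parity_dist = coarse_grain(uniform, N)

print(len(uniform), len(jz_dist), len(parity_dist))
```

For $N=3$ the three readouts have 8, 4 and 2 outcomes, respectively.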

Supplementary Data

Complete CFI data for Fig.2 in main text

The complete CFI optimization data for dipolar-interacting spin systems are shown in this section. Fig. S3(a) shows the optimization results for 3D-random spin configurations, averaged over 50 cases, with the number of variational circuit layers $m$ chosen from 1 to 10. The optimized CFI approaches the Heisenberg Limit (HL) as more layers ( $m$ ) are used. However, when $m>7$ , the CFI stops increasing. This CFI ‘saturation’ effect might have two causes. First, when $m$ is larger, the number of local maxima in the high-dimensional parameter space increases, which could cause the optimizer to get stuck in a local maximum. Sometimes, as for the $N=7,m=10$ data in Fig. S3(a), adding more variational layers even leads to a lower optimized CFI. The ‘local maximum’ problem could be addressed by more advanced and powerful optimization algorithms, such as reinforcement learning [Sutton and Barto, 2018; Silver et al., 2017; Peng et al., 2021], and by more computational resources. Second, the ‘saturation’ effect may reflect the global maximum CFI one can reach, no matter what kind of optimization algorithm is applied. It is still an open question what the highest CFI is that the spin ensemble can reach for a given configuration.

Fig. S3(b)-(d) show the CFI optimization result for 1D chain, 2D square lattice and 2D symmetric cycle spin configurations. The results of regular patterns are better than those of 3D-random pattern.

Required layers to reach given CFI for 2D square lattice

As shown by the schematic on the left, the distances between spin No.4 and spins No.5, 7, and 9 are the same, so the interaction strengths of these pairs are the same. Similarly, the distances between spin No.4 and spins No.2, 3, 6, and 8 are the same (and smaller). Therefore, the plateau features in Fig. S4 are likely due to this symmetry: adding one more spin to the lattice does not require an extra layer to reach a given percentage of the CFI.

Orders of interaction

Due to the $\frac{1}{r^{3}}$ decay of the dipolar interaction strength, the resulting states might be generated mainly by nearest-neighbor interactions. To study how much of the interaction is essential for generating the resulting entangled states, we calculate the overlap (state fidelity [Nielsen and Chuang, 2010]) between the original state and a new state generated with the same optimized parameters but a cutoff Hamiltonian. A cutoff interaction strength $f_{\text{cutoff}}$ is chosen, and all pairwise potentials $V_{ij}$ smaller than $f_{\text{cutoff}}$ are set to zero in the cutoff Hamiltonian. Fig. S5 shows the state fidelity $F$ as a function of $f_{\text{cutoff}}$ . A state fidelity smaller than 1 is observed when $f_{\text{cutoff}}$ is set equal to the average nearest-neighbor interaction strength $f_{\text{dd}}$ . This result indicates that interactions beyond nearest neighbors in the spin ensemble are utilized for the entangled state generation.
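The two ingredients of this test, the cutoff of the coupling matrix and the pure-state fidelity, can be sketched as below. Comparing the magnitude $|V_{ij}|$ with the cutoff is an assumption of this example (the couplings can have either sign), and the function names and numbers are hypothetical.

```python
import math

def cutoff_couplings(couplings, f_cutoff):
    """Zero every pairwise coupling whose magnitude is below f_cutoff."""
    return [[v if abs(v) >= f_cutoff else 0.0 for v in row]
            for row in couplings]

def state_fidelity(psi, phi):
    """Pure-state fidelity |<psi|phi>|^2 for normalized amplitude lists."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

# Hypothetical three-spin coupling matrix (arbitrary units).
V = [[0.0, 1.0, -0.1],
     [1.0, 0.0, 0.3],
     [-0.1, 0.3, 0.0]]
V_cut = cutoff_couplings(V, f_cutoff=0.2)

# Fidelity between |0> and (|0> + |1>)/sqrt(2) as a simple sanity check.
f = state_fidelity([1 / math.sqrt(2), 1 / math.sqrt(2)], [1.0, 0.0])

print(V_cut)
print(f)
```

In the analysis described above, the two states entering the fidelity would be the outputs of the variational circuit evaluated with the full and with the cutoff coupling matrix at identical parameters.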

Non-Markovian noise sensing performance

Optimized states with different readout fidelity

We run the optimization with imperfect readout for $N=4$ and $N=10$ 2D square lattice spin configurations. The optimized states resemble GHZ states at high readout fidelity (RF), SSS at low RF, and coherent spin states (CSS) when the RF is close to 50%. For $N=4$ , Gaussian states appear for RF lower than 92%, whereas for $N=10$ they already appear when the RF is about 96%. We therefore expect that for large spin systems with finite RF, Gaussian states (e.g. SSS) are advantageous for quantum-enhanced metrology.

Relevant experimental parameter table (full)

Table S1: Experimental platforms’ data

System	$T_{2}^{\text{(best)}}$	$T_{2}^{\text{(DD)}}$	$\bar{f}_{\text{dd}}$	$P_{\text{ini}}$	$F_{\text{readout}}$	$\nu$
NV ensemble	$1.58(7)$ s^a	$7.9(2)$ $\mu$s^b	$35$ kHz^b	$97.5\%$^c	$97.5\%$^c	$2-4$^b
P1 centers	$0.8$ ms^e (DEER)	$4.4$ $\mu$s^f	$0.7$ kHz^e, $0.92$ MHz^f	$95\%$^e	$95\%$^e	?
Rare-Earth crystals	$23.2\pm 0.5$ ms^g	$2.5$ $\mu$s^h	$1.96$ MHz^h	$97\%$^i	$94.6\%$^j	$2.4\pm 0.1$^g
Cold Molecules	$\sim 1$ s^k	$80$ ms^l	$52$ Hz^l, $1.5$ kHz^m	$97\%$^m	$97\%$^m	?

${}^{\text{a}}$ T.H.Taminiau, NComm 2018, ${}^{\text{b}}$ H.Zhou, PRX 2020, ${}^{\text{c}}$ B.J.Shields, M.D.Lukin, PRL 2015, ${}^{\text{d}}$ L.Childress, Science 2006
${}^{\text{e}}$ T.H.Taminiau, NComm 2021, ${}^{\text{f}}$ N.Yao, Nature 2021
${}^{\text{g}}$ P.Bertet, Science Advances 2021, ${}^{\text{h}}$ A.Reiserer, PRL 2021, ${}^{\text{i}}$ J.Thompson, Science 2020, ${}^{\text{j}}$ J.Thompson, NComm 2020
${}^{\text{k}}$ M.R.Tarbutt, PRL 2020, ${}^{\text{l}}$ B.Yan, J.Ye, Nature 2013, ${}^{\text{m}}$ J.Doyle, PRL 2020

Based on the simulation results shown in Fig.4(c) in main text, we need $\bar{f}_{\text{dd}}T_{2}\geq 5$ to generate metrological states that beat the SQL. It is worth mentioning that $T_{2}$ here stands for the coherence time without the influence of the dipole-dipole interaction. During the state preparation step, the dipolar interactions between the spins are included in the system Hamiltonian for the entanglement generation ( $D$ gate in Fig.1(c) in main text). Thus, $T_{2}^{\text{(DD)}}$ in Table S1 is a lower bound, and $T_{2}^{\text{(best)}}$ is a more precise estimate of the spin coherence time.

Supplementary Derivations

CFI with respect to angle and frequency

In general, the Classical Fisher Information (CFI) measures the sensitivity of a statistical model to small changes of a parameter $\theta$ [Braunstein and Caves, 1994, Strobel et al., 2014]. Let $Z$ be a random variable and $P_{z}(\theta)\equiv P(z|\theta)$ be its probability distribution which depends on $\theta$ . Let $\Theta$ be an unbiased estimator of $\theta$ , i.e.

\displaystyle\theta=\langle\Theta\rangle=\sum_{z}\Theta\cdot P_{z}(\theta).

(S17)

From Eq. (S17) and the fact that the sum of probabilities of all outcomes is $1$ ,

	$\displaystyle 1$	$\displaystyle=\frac{\partial\langle\Theta\rangle}{\partial\theta}=\frac{\partial}{\partial\theta}\sum_{z}\Theta P_{z}(\theta),$		(S18)
	$\displaystyle 0$	$\displaystyle=\frac{\partial}{\partial\theta}\sum_{z}P_{z}(\theta).$		(S19)

Subtracting Eq. (S19) multiplied by $\theta$ from Eq. (S18), we get

$\displaystyle 1$	$\displaystyle=\sum_{z}(\Theta-\theta)\frac{\partial}{\partial\theta}P_{z}(\theta)$
	$\displaystyle=\sum_{z}P_{z}(\theta)(\Theta-\theta)\frac{1}{P_{z}(\theta)}\frac{\partial}{\partial\theta}P_{z}(\theta)$
	$\displaystyle=\langle(\Theta-\theta)\frac{1}{P_{z}(\theta)}\frac{\partial}{\partial\theta}P_{z}(\theta)\rangle$
	$\displaystyle=\langle(\Theta-\theta)\frac{\partial}{\partial\theta}\log P_{z}(\theta)\rangle.$	(S20)

Letting $X=\Theta-\theta$ and $Y=\frac{\partial}{\partial\theta}\log P_{z}(\theta)$ , by the Cauchy-Schwarz inequality for random variables, $\langle XY\rangle^{2}\leq\langle X^{2}\rangle\langle Y^{2}\rangle$ , we have

\displaystyle\langle(\Theta-\theta)^{2}\rangle\bigg{\langle}\left(\frac{\partial}{\partial\theta}\log P_{z}(\theta)\right)^{2}\bigg{\rangle}\geq 1,

(S21)

where

$\displaystyle\langle(\Theta-\theta)^{2}\rangle$	$\displaystyle=\langle\Theta^{2}\rangle-(2\theta\langle\Theta\rangle-\langle\theta^{2}\rangle)$
	$\displaystyle=\langle\Theta^{2}\rangle-(2\theta^{2}-\theta^{2})$
	$\displaystyle=\langle\Theta^{2}\rangle-\langle\Theta\rangle^{2}$
	$\displaystyle=\Delta\Theta^{2}$	(S22)

is the variance of $\Theta$ . Defining

\displaystyle\mathrm{CFI}=\sum_{z}P_{z}(\theta)\left(\frac{\partial}{\partial\theta}\log P_{z}(\theta)\right)^{2},

(S23)

we have

\displaystyle\Delta\Theta^{2}\geq\frac{1}{\mathrm{CFI}}.

(S24)

If the measurement is repeated $M$ times, then by the additive property of CFI, we obtain the Cramér-Rao bound:

\Delta\Theta^{2}\geq\frac{1}{M\cdot\mathrm{CFI}}.

(S25)
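Eqs. (S23) and (S25) translate directly into a short numerical routine. The sketch below is a minimal illustration, assuming the outcome probabilities are available as a function of the parameter: it evaluates the CFI in the equivalent form $\sum_{z}(\partial_{\theta}P_{z})^{2}/P_{z}$ with a central finite difference. The function names and the toy model are hypothetical.

```python
import math

def classical_fisher_information(prob_fn, theta, step=1e-5):
    """CFI = sum_z (dP_z/dtheta)^2 / P_z, using a central finite difference.

    `prob_fn(theta)` returns the list of outcome probabilities P_z(theta).
    """
    p = prob_fn(theta)
    p_plus = prob_fn(theta + step)
    p_minus = prob_fn(theta - step)
    cfi = 0.0
    for pz, up, down in zip(p, p_plus, p_minus):
        if pz > 0:                       # zero-probability outcomes are skipped
            cfi += ((up - down) / (2 * step)) ** 2 / pz
    return cfi

def cramer_rao_bound(cfi, repetitions):
    """Lower bound on the estimator standard deviation after M repetitions."""
    return 1.0 / math.sqrt(repetitions * cfi)

def single_qubit(theta):
    """Toy model: P_0 = (1 + sin(theta)) / 2, P_1 = 1 - P_0."""
    p0 = 0.5 * (1 + math.sin(theta))
    return [p0, 1 - p0]

cfi = classical_fisher_information(single_qubit, theta=0.3)
print(cfi, cramer_rao_bound(cfi, repetitions=100))
```

For this toy model the CFI equals 1 for every $\theta$ with $\cos\theta\neq 0$, so the bound on the standard deviation after $M=100$ repetitions is approximately $0.1$.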

In our variational circuit, we use CFI with respect to an infinitesimal angle $\phi$ as the cost function to generate entangled states. In our program, we use a method similar to parameter shift to calculate the $\mathrm{CFI}_{\phi}$ of our optimized states [Cerezo et al., 2021,Meyer et al., 2021,Schuld et al., 2019]. In the following notation,

1. $z$ represents a multi-qubit state in the z-basis;
2. $\mathcal{U}(\phi)=e^{-i\phi J_{y}}$ is the rotation operator where $\phi$ is a small angle;
3. $\psi$ is the state we create from the variational circuit;
4. $P_{z}(\phi)$ is the probability of measuring the outcome $z$ on the state after the rotation.

Then

$\displaystyle\frac{\partial}{\partial\phi}P_{z}(\phi)\Bigr{\rvert}_{\phi\rightarrow 0}=$	$\displaystyle\frac{\partial}{\partial\phi}|\langle z|\mathcal{U}(\phi)|\psi\rangle|^{2}\Bigr{\rvert}_{\phi\rightarrow 0}$
$\displaystyle=$	$\displaystyle\frac{\partial}{\partial\phi}\langle\psi|\mathcal{U}^{\dagger}(\phi)|z\rangle\langle z|\mathcal{U}(\phi)|\psi\rangle\Bigr{\rvert}_{\phi\rightarrow 0}$
$\displaystyle=$	$\displaystyle\langle\psi|\mathcal{U}^{\dagger}(\phi)iJ_{y}|z\rangle\langle z|\mathcal{U}(\phi)|\psi\rangle\Bigr{\rvert}_{\phi\rightarrow 0}+\langle\psi|\mathcal{U}^{\dagger}(\phi)|z\rangle\langle z|\mathcal{U}(\phi)(-i)J_{y}|\psi\rangle\Bigr{\rvert}_{\phi\rightarrow 0}$
$\displaystyle=$	$\displaystyle i\bra{\psi}\bigg{(}J_{y}\ket{z}\bra{z}-\ket{z}\bra{z}J_{y}\bigg{)}\ket{\psi}.$	(S26)
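The derivative in Eq. (S26) can be evaluated without finite differences, since $i\bra{\psi}(J_{y}\ket{z}\bra{z}-\ket{z}\bra{z}J_{y})\ket{\psi}=-2\,\mathrm{Im}\left(\bra{\psi}J_{y}\ket{z}\braket{z|\psi}\right)$. The sketch below applies this to a two-spin coherent spin state along $+x$; it is an illustration with hypothetical helper names, not the program used for the optimization.

```python
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kron_vec(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

ID = [[1, 0], [0, 1]]
SY = [[0, -0.5j], [0.5j, 0]]

def collective_jy(num_spins):
    """J_y = sum_i S_{y,i} as a 2^N x 2^N nested list."""
    dim = 2 ** num_spins
    total = [[0j] * dim for _ in range(dim)]
    for site in range(num_spins):
        term = [[1]]
        for k in range(num_spins):
            term = kron(term, SY if k == site else ID)
        for i in range(dim):
            for j in range(dim):
                total[i][j] += term[i][j]
    return total

def cfi_phi(psi, num_spins):
    """CFI at phi -> 0 for z-basis readout, with
    dP_z/dphi = -2 Im(<psi|J_y|z><z|psi>)."""
    jy = collective_jy(num_spins)
    dim = len(psi)
    cfi = 0.0
    for z in range(dim):
        prob = abs(psi[z]) ** 2
        # <psi|J_y|z> = sum_k conj(psi_k) (J_y)_{kz}
        bra_jy_z = sum(psi[k].conjugate() * jy[k][z] for k in range(dim))
        dprob = -2 * (bra_jy_z * psi[z]).imag
        if prob > 0:
            cfi += dprob ** 2 / prob
    return cfi

# Coherent spin state |+x>|+x>: the CFI equals N = 2, the standard quantum limit.
plus_x = [1 / math.sqrt(2), 1 / math.sqrt(2)]
css = kron_vec(plus_x, plus_x)
print(cfi_phi(css, 2))
```

The value $N$ obtained for the unentangled state is the reference against which the optimized states are compared.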

Note that the rotation operator $\mathcal{U}(\phi)=e^{-i\phi J_{y}}\equiv\mathcal{U}_{y}(\phi)$ about the $y$ -axis is assumed for computational simplicity. In experiments, the signal (e.g. an external magnetic field) usually induces a rotation about the $z$ -axis, $\mathcal{U}_{z}(\phi)=e^{-i\phi J_{z}}$ . The two pictures are related by a global $\pi/2$ pulse: accumulating the signal about the $z$ -axis and then applying an $R_{x}(-\pi/2)$ pulse is equivalent to first applying $R_{x}(-\pi/2)$ to the prepared state and then accumulating the signal $\phi$ about the $y$ -axis [Strobel et al., 2014]. In other words, $R_{x}(-\pi/2)\mathcal{U}_{z}(\phi)=\mathcal{U}_{y}(\phi)R_{x}(-\pi/2)$ , which follows from $R_{x}(-\pi/2)J_{z}R_{x}(\pi/2)=J_{y}$ for $R_{x}(\vartheta)=e^{-i\vartheta J_{x}}$ . Since the prepared states are eigenstates of $P_{x}\propto R_{x}(\pi)$ , using $R_{x}(\pi/2)$ instead of $R_{x}(-\pi/2)$ for the initial pulse changes the state only by a global phase, so the signal accumulation assumed in the calculation faithfully simulates the experiment.

After creating the entangled states, we want to know how useful they are in Ramsey spectroscopy, where the signal to be detected is a frequency $\omega$ . Repeating the calculation above, but taking the derivative with respect to $\omega=\frac{\phi}{t_{\text{R}}}$ , where $t_{\text{R}}$ is the Ramsey sensing time, we have

\displaystyle\mathrm{CFI}_{\omega}=\mathrm{CFI}_{\phi}\cdot t_{\text{R}}^{2}.

(S27)

Relation between $\text{CFI}_{\omega}$ and SNR in single qubit Ramsey experiment

We illustrate the Ramsey protocol for a single qubit.

1. The qubit is initialized into the ground state $\ket{0}$ .
2. A $\frac{\pi}{2}$ pulse along the y-direction is applied to transform it into a superposition state $\frac{1}{\sqrt{2}}(\ket{0}+\ket{1})$ . Its matrix form is

$\displaystyle\rho(0)=\frac{1}{2}\begin{pmatrix}1&1\\ 1&1\end{pmatrix}.$ (S28)

3. After evolving under noise and a signal with frequency $\omega$ for time $t$ , its state becomes

\displaystyle\rho(t)=\frac{1}{2}\begin{pmatrix}1&e^{-i\omega t}e^{-2\gamma t}\\ e^{i\omega t}e^{-2\gamma t}&1\end{pmatrix}

(S29)

where $\gamma$ is the decoherence rate.
4. A second $\frac{\pi}{2}$ pulse along the x-direction is applied for readout. The qubit is then in the state

$\displaystyle R_{x}\left(\frac{\pi}{2}\right)\rho(t)R_{x}^{\dagger}\left(\frac{\pi}{2}\right).$ (S30)

5. After the rotation, the probability of the qubit being in the ground state is

$\displaystyle P_{0}=\frac{1}{2}+\frac{1}{2}e^{-2\gamma t}\sin{\omega t}.$ (S31)

The CFI with respect to $\omega$ is

\displaystyle\mathrm{CFI}_{\omega}=\frac{1}{P_{0}}\left(\frac{\partial P_{0}}{\partial\omega}\right)^{2}+\frac{1}{P_{1}}\left(\frac{\partial P_{1}}{\partial\omega}\right)^{2}=\frac{t^{2}\cos^{2}{\omega t}}{e^{4\gamma t}-\sin^{2}{\omega t}}.

(S32)
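The single-qubit protocol can be reproduced numerically as a consistency check of Eqs. (S31) and (S32). The sketch below assumes $R_{x}(\pi/2)=\exp(-i\frac{\pi}{4}\sigma_{x})$ and that the first basis state is the ground state $\ket{0}$; the function names and parameter values are hypothetical.

```python
import cmath
import math

def ramsey_p0(omega, t, gamma):
    """Ground-state population after the final pulse, from the density matrix."""
    coherence = 0.5 * cmath.exp(-1j * omega * t) * math.exp(-2 * gamma * t)
    rho = [[0.5, coherence], [coherence.conjugate(), 0.5]]
    # R_x(pi/2) = exp(-i (pi/4) sigma_x) = (I - i sigma_x) / sqrt(2)
    r = [[1 / math.sqrt(2), -1j / math.sqrt(2)],
         [-1j / math.sqrt(2), 1 / math.sqrt(2)]]
    # (R rho R^dagger)_{00} = sum_{a,b} R_{0a} rho_{ab} conj(R_{0b})
    p0 = sum(r[0][a] * rho[a][b] * r[0][b].conjugate()
             for a in range(2) for b in range(2))
    return p0.real

def ramsey_cfi(omega, t, gamma, step=1e-6):
    """CFI_omega = (dP0/domega)^2 / (P0 (1 - P0)) by central finite difference."""
    p0 = ramsey_p0(omega, t, gamma)
    dp0 = (ramsey_p0(omega + step, t, gamma)
           - ramsey_p0(omega - step, t, gamma)) / (2 * step)
    return dp0 ** 2 / (p0 * (1 - p0))

def ramsey_cfi_closed_form(omega, t, gamma):
    """Closed form t^2 cos^2(omega t) / (exp(4 gamma t) - sin^2(omega t))."""
    return (t ** 2 * math.cos(omega * t) ** 2
            / (math.exp(4 * gamma * t) - math.sin(omega * t) ** 2))

OMEGA, T, GAMMA = 0.7, 1.3, 0.2   # arbitrary illustrative values
print(ramsey_p0(OMEGA, T, GAMMA))
print(ramsey_cfi(OMEGA, T, GAMMA), ramsey_cfi_closed_form(OMEGA, T, GAMMA))
```

The two-outcome CFI reduces to $(\partial_{\omega}P_{0})^{2}/(P_{0}P_{1})$ because $P_{1}=1-P_{0}$, which is the form used in `ramsey_cfi`.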

Assuming only quantum projection noise, the Signal-to-Noise Ratio (SNR) is $\frac{\delta P_{0}}{\sqrt{\frac{1}{M}P_{0}(1-P_{0})}}$ where $M$ is the total number of measurements. Then

\displaystyle\mathrm{SNR}^{2}=\frac{Mt^{2}\cos^{2}{\omega t}\delta\omega^{2}}{e^{4\gamma t}-\sin^{2}{\omega t}}.

(S33)

Assuming no time overhead, i.e., $M=\frac{T_{\text{tot}}}{t_{\text{R}}}$ where $T_{\text{tot}}$ is the total measurement time and $t_{\text{R}}$ is the time between Ramsey pulses, we obtain the relationship

\mathrm{CFI}_{\omega}\cdot\frac{T_{\text{tot}}}{t_{\text{R}}}\cdot\delta\omega^{2}=\mathrm{SNR}^{2}.

(S34)

In unit time ( $T_{\text{tot}}=1$ ), when SNR $=1$ , the smallest signal we can measure is

\displaystyle\delta\omega=\frac{1}{\sqrt{M\cdot\mathrm{CFI}_{\omega}}},

(S35)

leading to the saturated Cramér-Rao bound.

Maximum Likelihood Estimator

Since a measurement collapses a quantum state onto an eigenstate of the observable, it is impossible to measure $P(\theta)$ directly. In experiments, we repeat the sequence to collect outcomes from which $P(\theta)$ , and in turn $\theta$ , is estimated. To understand the relation between the variance of the estimate and the CFI, we introduce the Maximum Likelihood Estimator (MLE), which asymptotically saturates the Cramér-Rao bound. Below we summarize the proof given in [Panchenko, ].

Let $\bm{X}=\{X_{1},X_{2},...,X_{M}\}$ be a collection of independent and identically distributed (i.i.d.) random variables with a parametric family of probability distributions $\{P(X|\theta)|\theta\in\Theta\}$ , where $\theta$ is an unknown parameter and $\Theta$ is the parameter space. Let $\bm{x}=\{x_{1},x_{2},...,x_{M}|x_{i}\in X_{i}\}$ be the experimental data set from $M$ repetitions. The goal is to estimate $\theta$ (the signal we want to measure) from $\bm{x}$ , i.e., find $\theta$ that is most likely to produce the outcome $\bm{x}$ . Thus, the normalized log-likelihood function is defined as

\displaystyle L_{M}(\theta)=\frac{1}{M}\log P(\bm{X}|\theta)=\frac{1}{M}\log\prod_{i=1}^{M}P(X_{i}|\theta)=\frac{1}{M}\sum_{i=1}^{M}\log P(X_{i}|\theta).

(S36)

An MLE maximizes the log-likelihood function:

\Theta_{\mathrm{MLE}}=\operatorname*{argmax}_{\theta\in\Theta}L_{M}(\theta).

(S37)

In the following, we first show that

1. $\Theta_{\mathrm{MLE}}$ converges to the true parameter $\theta_{0}$ ;
2. the distribution of $\sqrt{M}(\Theta_{\mathrm{MLE}}-\theta_{0})$ tends to a normal distribution $\mathcal{N}\left(0,\frac{1}{\mathrm{CFI}_{\theta_{0}}}\right)$ as $M$ increases.

In other words, not only does the MLE converge to the true parameter, it converges at a rate $\frac{1}{\sqrt{M}}$ .

Define

\displaystyle L(\theta)=\langle\log P(\bm{X}|\theta)\rangle_{\theta_{0}}

(S38)

which denotes the expected log-likelihood function with respect to $\theta_{0}$ , then by the Weak Law of Large Numbers (WLLN), the average outcomes from a large number of trials should approach the expected value:

\forall\theta,L_{M}(\theta)\xrightarrow{\mathit{M\rightarrow\infty}}L(\theta).

(S39)

In fact, $\theta_{0}$ maximizes $L(\theta)$ :

$\displaystyle\forall\theta,L(\theta)-L(\theta_{0})$	$\displaystyle=\langle\log P(\bm{X}|\theta)\rangle_{\theta_{0}}-\langle\log P(\bm{X}|\theta_{0})\rangle_{\theta_{0}}$
	$\displaystyle=\bigg{\langle}\log\frac{P(\bm{X}|\theta)}{P(\bm{X}|\theta_{0})}\bigg{\rangle}_{\theta_{0}}$
	$\displaystyle\leq\bigg{\langle}\frac{P(\bm{X}|\theta)}{P(\bm{X}|\theta_{0})}-1\bigg{\rangle}_{\theta_{0}}$
	$\displaystyle=\sum_{\bm{x}\in\bm{X}}\left(\frac{P(\bm{x}|\theta)}{P(\bm{x}|\theta_{0})}-1\right)P(\bm{x}|\theta_{0})$
	$\displaystyle=1-1=0.$	(S40)

Moreover, we show that $\theta_{0}$ is the unique maximizer. Jensen’s inequality states that for a strictly convex function $f$ and a random variable $Y$ ,

\displaystyle\langle f(Y)\rangle>f(\langle Y\rangle).

(S41)

Taking $f(y)=-\log y$ and $P(\bm{X}|\theta)\neq P(\bm{X}|\theta_{0})$ , we have

\bigg{\langle}-\log\frac{P(\bm{X}|\theta)}{P(\bm{X}|\theta_{0})}\bigg{\rangle}_{\theta_{0}}>-\log\bigg{\langle}\frac{P(\bm{X}|\theta)}{P(\bm{X}|\theta_{0})}\bigg{\rangle}_{\theta_{0}}=0,

(S42)

L(\theta_{0})>L(\theta).

(S43)

Therefore, since

1. $\Theta_{\mathrm{MLE}}$ maximizes $L_{M}(\theta)$ ,
2. $\theta_{0}$ maximizes $L(\theta)$ , and
3. $L_{M}(\theta)\xrightarrow{\mathit{M\rightarrow\infty}}L(\theta)$ ,

$\Theta_{\mathrm{MLE}}$ converges to $\theta_{0}$ .

Now we use this property to prove that the distribution of $\Theta_{\mathrm{MLE}}$ tends to the desired normal distribution, where we will apply the Central Limit Theorem (CLT): Suppose $\bm{X}=\{X_{1},...,X_{M}\}$ is a sequence of i.i.d. random variables with $\langle X_{i}\rangle=\mu$ and $\mathrm{Var}(X_{i})=\sigma^{2}<\infty$ . Then as $M\rightarrow\infty$ , the random variable $\sqrt{M}(\bar{\bm{X}}-\mu)$ converges in distribution to a normal $\mathcal{N}(0,\sigma^{2})$ .

We start with the Mean Value Theorem for the function $L_{M}^{\prime}$ , the derivative of $L_{M}$ (continuous by assumption), on the interval $[\Theta_{\mathrm{MLE}},\theta_{0}]$ :

		$\displaystyle 0=L_{M}^{\prime}(\Theta_{\mathrm{MLE}})=L_{M}^{\prime}(\theta_{0})+L_{M}^{\prime\prime}(\theta_{1})(\Theta_{\mathrm{MLE}}-\theta_{0})$
		$\displaystyle\implies\theta_{0}-\Theta_{\mathrm{MLE}}=\frac{L_{M}^{\prime}(\theta_{0})}{L_{M}^{\prime\prime}(\theta_{1})}$
		$\displaystyle\implies\sqrt{M}(\theta_{0}-\Theta_{\mathrm{MLE}})=\sqrt{M}\frac{L_{M}^{\prime}(\theta_{0})}{L_{M}^{\prime\prime}(\theta_{1})}$		(S44)

for some $\theta_{1}\in[\Theta_{\mathrm{MLE}},\theta_{0}]$ . We analyze the numerator and denominator respectively. The numerator

$\displaystyle L_{M}^{\prime}(\theta_{0})=$	$\displaystyle\frac{1}{M}\sum_{i=1}^{M}\left(\log P(X_{i}|\theta_{0})\right)^{\prime}$
$\displaystyle=$	$\displaystyle\frac{1}{M}\sum_{i=1}^{M}\left(\log P(X_{i}|\theta_{0})\right)^{\prime}-L^{\prime}(\theta_{0})$
$\displaystyle=$	$\displaystyle\frac{1}{M}\sum_{i=1}^{M}\left(\log P(X_{i}|\theta_{0})\right)^{\prime}-\langle\left(\log P(\bm{X}|\theta_{0})\right)^{\prime}\rangle_{\theta_{0}}$
$\displaystyle=$	$\displaystyle\frac{1}{M}\left(\sum_{i=1}^{M}\log P(X_{i}|\theta_{0})\right)^{\prime}-\langle\left(\log P(X_{i}|\theta_{0})\right)^{\prime}\rangle_{\theta_{0}}$	(S45)

where the last equality is obtained from the linearity of expected value and derivative. By the CLT, the distribution of $\sqrt{M}L_{M}^{\prime}(\theta_{0})$ converges to

\mathcal{N}\bigg{(}0,\mathrm{Var}_{\theta_{0}}(\log P(X_{i}|\theta_{0}))^{\prime}\bigg{)}

(S46)

where the variance

$\displaystyle\mathrm{Var}_{\theta_{0}}$	$\displaystyle(\log P(X_{i}|\theta_{0}))^{\prime}=\langle[(\log P(X_{i}|\theta_{0}))^{\prime}]^{2}\rangle_{\theta_{0}}-\langle(\log P(X_{i}|\theta_{0}))^{\prime}\rangle^{2}_{\theta_{0}}$
	$\displaystyle=\sum_{x\in X_{1}}P(x|\theta_{0})\left(\frac{P^{\prime}(x|\theta_{0})}{P(x|\theta_{0})}\right)^{2}-(L^{\prime}(\theta_{0}))^{2}$
	$\displaystyle=\mathrm{CFI}_{\theta_{0}}$	(S47)

by the definition of CFI and that $\theta_{0}$ maximizes $L(\theta)$ . By the consistency property, $\Theta_{\mathrm{MLE}}$ converges to $\theta_{0}$ , and thus $\theta_{1}$ converges to $\theta_{0}$ . The denominator

\displaystyle L_{M}^{\prime\prime}(\theta_{1})\rightarrow L_{M}^{\prime\prime}(\theta_{0})=\frac{1}{M}\sum_{i=1}^{M}[\log P(X_{i}|\theta_{0})]^{\prime\prime}\rightarrow\langle[\log P(X_{1}|\theta_{0})]^{\prime\prime}\rangle_{\theta_{0}}

(S48)

by the WLLN. We further show that Eq. (S48) is in fact the additive inverse of CFI:

$\displaystyle\langle[\log P(X_{1}$	$\displaystyle\|\theta_{0})]^{\prime\prime}\rangle_{\theta_{0}}=\bigg{\langle}\frac{\partial^{2}}{\partial\theta^{2}}\log P(X_{1}\|\theta_{0})\bigg{\rangle}_{\theta_{0}}$
	$\displaystyle=\sum_{x\in X_{1}}[\log P(x\|\theta_{0})]^{\prime\prime}P(x\|\theta_{0})$
	$\displaystyle=\sum_{x\in X_{1}}\left(\frac{P^{\prime\prime}(x\|\theta_{0})}{P(x\|\theta_{0})}-\left(\frac{P^{\prime}(x\|\theta_{0})}{P(x\|\theta_{0})}\right)^{2}\right)P(x\|\theta_{0})$
	$\displaystyle=\sum_{x\in X_{1}}P^{\prime\prime}(x\|\theta_{0})-\sum_{x\in X_{1}}\frac{(P^{\prime}(x\|\theta_{0}))^{2}}{P(x\|\theta_{0})}$
	$\displaystyle=0-\mathrm{CFI}_{\theta_{0}}=-\mathrm{CFI}_{\theta_{0}}.$	(S49)

Finally, Eq. (S44) becomes

	$\displaystyle\sqrt{M}(\theta_{0}-\Theta_{\mathrm{MLE}})$	$\displaystyle\overset{d}{\to}\mathcal{N}\left(0,\frac{\mathrm{CFI}_{\theta_{0}}}{\mathrm{CFI}_{\theta_{0}}^{2}}\right)=\mathcal{N}\left(0,\frac{1}{\mathrm{CFI}_{\theta_{0}}}\right)$
	$\displaystyle\implies\Theta_{\mathrm{MLE}}$	$\displaystyle\overset{d}{\to}\mathcal{N}\left(\theta_{0},\frac{1}{M\cdot\mathrm{CFI}_{\theta_{0}}}\right)$		(S50)

Thus, the MLE is asymptotically unbiased and saturates the Cramér-Rao bound.
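The asymptotic statement above can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: the binary-outcome model $P(1|\theta)=(1+\sin\theta)/2$ , the function name `simulate_mle`, and all numerical values are hypothetical choices, not taken from this work. For this model the CFI per shot equals 1, so $M\cdot\mathrm{Var}(\Theta_{\mathrm{MLE}})$ should approach 1 for large $M$ .

```python
import numpy as np

def simulate_mle(theta0, shots, trials, seed=0):
    """Draw `trials` independent data sets of `shots` binary outcomes and
    return the maximum likelihood estimate of theta for each data set."""
    rng = np.random.default_rng(seed)
    p = 0.5 * (1.0 + np.sin(theta0))  # assumed toy model P(1|theta)
    counts = rng.binomial(shots, p, size=trials)
    # For this model the likelihood is maximized where p(theta) = counts/shots.
    return np.arcsin(2.0 * counts / shots - 1.0)

theta0, shots = 0.3, 1000
estimates = simulate_mle(theta0, shots, trials=20000)

# CFI per shot: (p')^2 / (p (1 - p)) = 1 for p = (1 + sin(theta)) / 2.
cfi = 1.0
scaled_variance = shots * estimates.var()
print(f"mean estimate = {estimates.mean():.4f} (true value {theta0})")
print(f"M * Var = {scaled_variance:.3f}, 1/CFI = {1.0 / cfi:.3f}")
```

The sample mean and the scaled variance agree with $\theta_{0}$ and $1/\mathrm{CFI}_{\theta_{0}}$ only up to statistical fluctuations and finite- $M$ corrections.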

Master equation for a non-Markovian environment

To simulate the performance of our optimized states during the Ramsey measurement with non-Markovian noise, we use a time-local master equation given by [Smirne et al., 2016]. A brief summary of the derivation is given below.

1.

Let $\mathcal{L}(\mathbb{C}^{d})$ be the Hilbert space of linear operators acting on $\mathbb{C}^{d}$ , where the inner product is defined as $\langle\sigma,\tau\rangle=\text{Tr}(\sigma^{\dagger}\tau)$ (the Hilbert-Schmidt inner product).
2.

Let $\mathcal{LL}(\mathbb{C}^{d})$ be the Hilbert space of linear operators acting on $\mathcal{L}(\mathbb{C}^{d})$ , whose elements are represented by $d^{2}\times d^{2}$ matrices. Let $\{l_{i}\}_{i=1,...,d^{2}}$ be an orthonormal basis of $\mathcal{L}(\mathbb{C}^{d})$ . Then the action of $\Lambda\in\mathcal{LL}(\mathbb{C}^{d})$ on $\tau\in\mathcal{L}(\mathbb{C}^{d})$ can be expressed as

$\displaystyle\Lambda[\tau]=\sum_{i,j=1}^{d^{2}}\langle l_{i},\Lambda[l_{j}]\rangle\langle l_{j},\tau\rangle l_{i}.$ (S51)

Thus, $\Lambda$ has a unique correspondence with the matrix $\mathsf{\Lambda}$ with entries

$\displaystyle\mathsf{\Lambda}_{ij}\equiv\langle l_{i},\Lambda[l_{j}]\rangle.$ (S52)
3.

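$\Lambda\in\mathcal{LL}(\mathbb{C}^{d})$ is trace- and Hermiticity-preserving if and only if its matrix representation $\mathsf{\Lambda}$ , taken with respect to an orthonormal basis of Hermitian operators with $l_{1}=\bm{I}/\sqrt{d}$ , can be written as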

$\displaystyle\begin{pmatrix}1&\bm{0}\\ \mathsf{\bm{m}}&\mathsf{\bm{M}}\end{pmatrix},$ (S53)

where $\bm{0}$ is the zero row vector of length $d^{2}-1$ , $\mathsf{\bm{m}}$ is a real column vector of length $d^{2}-1$ , and $\mathsf{\bm{M}}$ is a $(d^{2}-1)\times(d^{2}-1)$ real matrix.
4.

For a single qubit, any density operator $\rho$ on $\mathbb{C}^{2}$ can be written as

$\displaystyle\rho=\frac{1}{2}(\bm{I}+\bm{v}\cdot\bm{\sigma})$ (S54)

where $\bm{v}$ is a three-dimensional real vector and $\bm{\sigma}$ is the vector of Pauli matrices. Then a map $\Lambda$ whose matrix representation is given by Eq. (S53) acting on $\rho$ gives

$\displaystyle\Lambda[\rho]=\frac{1}{2}(\bm{I}+(\mathsf{\bm{m}}+\mathsf{\bm{M}}\bm{v})\cdot\bm{\sigma}).$ (S55)
5.

The noisy evolution of a state $\rho$ is described by

$\displaystyle\rho(t)=\Lambda(t)[\rho(0)].$ (S56)

The time-local master equation reads

$\displaystyle\frac{d}{dt}\rho(t)=\Xi(t)[\rho(t)].$ (S57)

Provided that $\Lambda(t)$ is invertible, the generator is

$\displaystyle\Xi(t)=\frac{d\Lambda(t)}{dt}\circ\Lambda(t)^{-1}$ (S58)

with the corresponding matrix representation

$\displaystyle\mathsf{\Xi}(t)=\frac{d\mathsf{\Lambda}(t)}{dt}\mathsf{\Lambda}(t)^{-1}.$ (S59)

Consider the evolution of one qubit described by $\Lambda(t)=\mathcal{U}(t)\circ\Gamma(t)$ . $\mathcal{U}(t)$ is defined as

\displaystyle\mathcal{U}(t)[\rho(0)]\equiv U(t)\rho(0)U^{\dagger}(t)

(S60)

where $U(t)=e^{-i\frac{\omega t}{2}\sigma_{z}}$ represents the signal accumulation. By Eq. (S51) and Eq. (S60), the matrix representation of $\mathcal{U}(t)$ is

\displaystyle\mathsf{U}(t)=\begin{pmatrix}1&0&0&0\\ 0&\cos{\omega t}&-\sin{\omega t}&0\\ 0&\sin{\omega t}&\cos{\omega t}&0\\ 0&0&0&1\end{pmatrix}.

(S61)

$\Gamma(t)$ represents the noise, which is trace- and Hermiticity-preserving, i.e., has the form in Eq. (S53).

Solving the commutation relation that defines a phase-covariant qubit map [Smirne et al., 2016,Holevo, 1993]

\displaystyle[\mathcal{U}(t),\Gamma(t)]=0\iff[\mathsf{U}(t),\mathsf{\Gamma}(t)]=0,

(S62)

we obtain the matrix representation of $\Lambda(t)$ :

\displaystyle\mathsf{\Lambda}(t)=\begin{pmatrix}1&0&0&0\\ 0&\eta_{\perp}(t)\cos{\omega t}&-\eta_{\perp}(t)\sin{\omega t}&0\\ 0&\eta_{\perp}(t)\sin{\omega t}&\eta_{\perp}(t)\cos{\omega t}&0\\ \kappa(t)&0&0&\eta_{\parallel}(t)\end{pmatrix},

(S63)

where $\mathsf{\bm{m}}=(0,0,\kappa(t))^{T}$ describes a translation along the $z$ -axis, and $\mathsf{\bm{M}}=\begin{pmatrix}\eta_{\perp}(t)\cos{\omega t}&-\eta_{\perp}(t)\sin{\omega t}&0\\ \eta_{\perp}(t)\sin{\omega t}&\eta_{\perp}(t)\cos{\omega t}&0\\ 0&0&\eta_{\parallel}(t)\end{pmatrix}$ describes a rotation about the $z$ -axis combined with a contraction characterized by $\eta_{\perp}$ and $\eta_{\parallel}$ .
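As a quick consistency check of Eqs. (S55), (S61) and (S63), the following sketch builds the $4\times 4$ matrices of the signal and noise maps and applies their product to a Bloch vector. The function names and all numerical values are illustrative assumptions, and the noise matrix is taken to be the contraction-plus-translation form obtained by factoring $\mathsf{U}(t)$ out of Eq. (S63).

```python
import numpy as np

def signal_matrix(omega, t):
    """Matrix of the unitary signal map: a rotation about z, as in Eq. (S61)."""
    c, s = np.cos(omega * t), np.sin(omega * t)
    return np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1]], dtype=float)

def noise_matrix(eta_perp, eta_par, kappa):
    """Assumed noise matrix: contraction plus a translation along z."""
    return np.array([[1, 0, 0, 0],
                     [0, eta_perp, 0, 0],
                     [0, 0, eta_perp, 0],
                     [kappa, 0, 0, eta_par]], dtype=float)

def apply_to_bloch(lam, v):
    """Affine action v -> m + M v on the Bloch vector, as in Eq. (S55)."""
    m, big_m = lam[1:, 0], lam[1:, 1:]
    return m + big_m @ v

omega, t = 2.0, 0.7                       # illustrative values
U = signal_matrix(omega, t)
G = noise_matrix(eta_perp=0.8, eta_par=0.9, kappa=0.05)
Lam = U @ G                               # matrix of Lambda(t), cf. Eq. (S63)

commutator_norm = np.linalg.norm(U @ G - G @ U)
v_out = apply_to_bloch(Lam, np.array([1.0, 0.0, 0.0]))
print("||[U, Gamma]|| =", commutator_norm)
print("image of the +x Bloch vector:", v_out)
```

A state initially along $+x$ is rotated by $\omega t$ in the equatorial plane, shrunk by $\eta_{\perp}$ , and shifted by $\kappa$ along $z$ .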

By Eq. (S59), we obtain the time-local master equation for a single qubit:

$\displaystyle\Xi(t)[\rho(t)]=$	$\displaystyle-\frac{i}{2}\omega[\sigma_{z},\rho(t)]$
	$\displaystyle+\gamma_{+}(t)(\sigma_{+}\rho(t)\sigma_{-}-\frac{1}{2}\{\sigma_{-}\sigma_{+},\rho(t)\})$
	$\displaystyle+\gamma_{-}(t)(\sigma_{-}\rho(t)\sigma_{+}-\frac{1}{2}\{\sigma_{+}\sigma_{-},\rho(t)\})$
	$\displaystyle+\gamma_{z}(t)(\sigma_{z}\rho(t)\sigma_{z}-\rho(t)),$	(S64)

where

$\displaystyle\gamma_{+}(t)$	$\displaystyle=\frac{1}{2}\left(\kappa^{\prime}(t)-\frac{\eta_{\parallel}^{\prime}(t)}{\eta_{\parallel}(t)}(\kappa(t)+1)\right),$	(S65)
$\displaystyle\gamma_{-}(t)$	$\displaystyle=-\frac{1}{2}\left(\kappa^{\prime}(t)+\frac{\eta_{\parallel}^{\prime}(t)}{\eta_{\parallel}(t)}(1-\kappa(t))\right),$	(S66)
$\displaystyle\gamma_{z}(t)$	$\displaystyle=\frac{1}{4}\left(\frac{\eta_{\parallel}^{\prime}(t)}{\eta_{\parallel}(t)}-2\frac{\eta_{\perp}^{\prime}(t)}{\eta_{\perp}(t)}\right).$	(S67)

Considering only $T_{2}$ noise, $\gamma_{-}(t)=\gamma_{+}(t)=0$ , $\eta_{\parallel}$ is constant, and

\displaystyle\eta_{\perp}(t)=e^{-(\frac{t}{T_{2}})^{\nu}},

(S68)

where $\nu$ is the stretch character which equals $1$ for Markovian noise. Then

\displaystyle\gamma_{z}(t)=\frac{\nu}{2}\frac{t^{\nu-1}}{T_{2}^{\nu}}.

(S69)

We further need to express $\Xi(t)$ as a superoperator acting on the vectorization of $\rho(t)$ . We define the vectorization of a matrix as the map

\displaystyle\rho=\sum_{i,j}\rho_{ij}\ket{i}\bra{j}\mapsto\ket{\rho}=\sum_{i,j}\rho_{ij}\ket{j}\otimes\ket{i}.

(S70)

Define the left and right multiplication superoperators by $\mathcal{L}(A)[\rho]=A\rho$ and $\mathcal{R}(A)[\rho]=\rho A$ so that $[A,\rho]=\mathcal{L}(A)[\rho]-\mathcal{R}(A)[\rho]$ . By this definition, we can calculate the matrix representation $\mathcal{L}(A)=I\otimes A$ and $\mathcal{R}(A)=A^{T}\otimes I$ . Using the superoperator notation, we can express $\Xi(t)$ as

\displaystyle\Xi(t)=-\frac{i}{2}\omega(I\otimes\sigma_{z}-\sigma_{z}\otimes I)+\gamma_{z}(t)(\sigma_{z}\otimes\sigma_{z}-I\otimes I).

(S71)

With this expression, we numerically simulate the evolution of our entangled states under non-Markovian noise using the time-dependent master equation solver in QuTiP [Johansson et al., 2012].
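The vectorized generator can be cross-checked against the operator form of the pure-dephasing master equation. The sketch below uses plain NumPy rather than QuTiP, and the helper names (`vec`, `gamma_z`, `xi_matrix`, `xi_direct`) as well as the parameter values are illustrative assumptions. It builds the matrix of Eq. (S71) with the column-stacking convention of Eq. (S70) and compares it with the dephasing part of Eq. (S64) applied directly to a random density matrix.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def vec(rho):
    """Column stacking |i><j| -> |j> (x) |i>, as in Eq. (S70)."""
    return rho.reshape(-1, order="F")

def gamma_z(t, T2, nu):
    """Dephasing rate of Eq. (S69) for a stretched-exponential decay."""
    return 0.5 * nu * t ** (nu - 1) / T2 ** nu

def xi_matrix(t, omega, T2, nu):
    """Matrix of the generator in Eq. (S71)."""
    g = gamma_z(t, T2, nu)
    return (-0.5j * omega * (np.kron(I2, sz) - np.kron(sz, I2))
            + g * (np.kron(sz, sz) - np.kron(I2, I2)))

def xi_direct(rho, t, omega, T2, nu):
    """Operator form of Eq. (S64) with gamma_+ = gamma_- = 0."""
    g = gamma_z(t, T2, nu)
    return -0.5j * omega * (sz @ rho - rho @ sz) + g * (sz @ rho @ sz - rho)

# Random single-qubit density matrix (illustrative input).
rng = np.random.default_rng(1)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho)

t, omega, T2, nu = 0.4, 1.5, 1.0, 2.0     # illustrative parameters
xi = xi_matrix(t, omega, T2, nu)
lhs = xi @ vec(rho)
rhs = vec(xi_direct(rho, t, omega, T2, nu))
print("max deviation:", np.max(np.abs(lhs - rhs)))
```

In this representation the coherence $\rho_{01}$ evolves with rate $-i\omega-2\gamma_{z}(t)$ , which integrates to the decay $e^{-(t/T_{2})^{\nu}}$ of Eq. (S68).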

Performance of metrological states in a non-Markovian environment

To calculate the derivative of the probability with respect to $\omega$ in the calculation of $\mathrm{CFI}_{\omega}$ , we use a method similar to the parameter-shift rule, which exploits the fact that the signal accumulation operator ( $\mathcal{U}(\omega)=e^{-i\omega tJ_{z}}$ ) and the noise operator commute. In the following notation,

1.

$z$ represents a multi-qubit state in the $z$ basis;
2.

$\mathcal{U}(\omega)$ is the effective signal accumulation operator: $\mathcal{U}(\omega)=e^{-i\omega tJ_{y}}$ ;
3.

$\rho$ is the state density matrix of our optimized state after the noisy evolution without signal for some Ramsey time and a $\frac{\pi}{2}$ pulse along the $x$ direction (here we switch the order of the signal accumulation and the second pulse of the Ramsey protocol [Strobel et al., 2014]);
4.

$P(z|\omega)$ is the probability of measuring the state $z$ with our rotated optimized state after the noisy evolution and signal accumulation.

Then

$\displaystyle\frac{\partial}{\partial\omega}P(z\|\omega)\Bigr{\rvert}_{\omega\rightarrow 0}$	$\displaystyle=\frac{\partial}{\partial\omega}\mathrm{Tr}[\ket{z}\bra{z}\mathcal{U}(\omega)\rho\mathcal{U}^{\dagger}(\omega)]\Bigr{\rvert}_{\omega\rightarrow 0}$
	$\displaystyle=\frac{\partial}{\partial\omega}\mathrm{Tr}[\mathcal{U}^{\dagger}(\omega)\ket{z}\bra{z}\mathcal{U}(\omega)\rho]\Bigr{\rvert}_{\omega\rightarrow 0}$
	$\displaystyle=\mathrm{Tr}[\frac{\partial}{\partial\omega}\mathcal{U}^{\dagger}(\omega)\ket{z}\bra{z}\mathcal{U}(\omega)\rho]\Bigr{\rvert}_{\omega\rightarrow 0}+\mathrm{Tr}[\mathcal{U}^{\dagger}(\omega)\ket{z}\bra{z}\frac{\partial}{\partial\omega}\mathcal{U}(\omega)\rho]\Bigr{\rvert}_{\omega\rightarrow 0}$
	$\displaystyle={it\mathrm{Tr}[(J_{y}\ket{z}\bra{z}-\ket{z}\bra{z}J_{y})\rho]}.$	(S72)
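The closed-form derivative in Eq. (S72) can be compared with a finite-difference estimate. In the sketch below, the random density matrix, the spin number, the Ramsey time and the helper names are illustrative assumptions; only the formula being tested comes from the derivation above.

```python
import numpy as np

def collective_jy(n):
    """Collective spin operator J_y for n spin-1/2 particles."""
    sy = 0.5 * np.array([[0, -1j], [1j, 0]])
    total = np.zeros((2**n, 2**n), dtype=complex)
    for k in range(n):
        total += np.kron(np.kron(np.eye(2**k), sy), np.eye(2**(n - k - 1)))
    return total

def probabilities(rho, jy, omega, t):
    """P(z|omega) for all z-basis states, with U = exp(-i omega t J_y)."""
    w, v = np.linalg.eigh(jy)
    u = (v * np.exp(-1j * omega * t * w)) @ v.conj().T
    return np.real(np.diag(u @ rho @ u.conj().T))

def derivative_at_zero(rho, jy, t):
    """Eq. (S72): i t Tr[(J_y |z><z| - |z><z| J_y) rho] for every z."""
    return np.real(np.diag(1j * t * (rho @ jy - jy @ rho)))

n, t = 3, 0.8                               # illustrative values
jy = collective_jy(n)
rng = np.random.default_rng(2)
a = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
rho = a @ a.conj().T
rho /= np.trace(rho)                        # random full-rank test state

analytic = derivative_at_zero(rho, jy, t)
h = 1e-5
numeric = (probabilities(rho, jy, h, t) - probabilities(rho, jy, -h, t)) / (2 * h)
p0 = probabilities(rho, jy, 0.0, t)
cfi = np.sum(analytic**2 / p0)              # classical Fisher information at omega -> 0
print("max deviation:", np.max(np.abs(analytic - numeric)))
print("CFI_omega =", cfi)
```

The analytic expression avoids the step-size trade-off of finite differences, which is convenient when the CFI is evaluated many times.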

From Eq. (S34), since $T_{\text{tot}}$ and $\delta\omega^{2}$ are constants, the SNR is proportional to $\sqrt{\frac{\mathrm{CFI}_{\omega}}{t_{\text{R}}}}$ . We therefore plot $\frac{\mathrm{CFI}_{\omega}}{t_{\text{R}}}$ in Fig. 4(d) of the main text.

Time Overhead

In experiments, the time overhead, including the state preparation and readout time, reduces the repetition number of the sensing sequence and thus decreases the sensitivity. If we consider a nonzero time overhead, i.e., $M=\frac{T_{\text{tot}}}{t_{\text{R}}+t_{\text{oh}}}$ , the expression for $\mathrm{SNR}^{2}$ for an uncorrelated spin state becomes

\displaystyle\mathrm{SNR}^{2}=\frac{T_{\text{tot}}t_{\text{R}}^{2}\cos^{2}(\omega t_{\text{R}})\delta\omega^{2}}{(t_{\text{R}}+t_{\text{oh}})\left(e^{2\left(\frac{t_{\text{R}}}{T_{2}}\right)^{\nu}}-\sin^{2}(\omega t_{\text{R}})\right)}.

(S73)

If $t_{\text{oh}}\gg t_{\text{R}}$ , we ignore the term $t_{\text{R}}$ in the denominator and

\displaystyle\mathrm{SNR}^{2}\propto\frac{t_{\text{R}}^{2}}{e^{2\left(\frac{t_{\text{R}}}{T_{2}}\right)^{\nu}}}.

(S74)

Setting the derivative of Eq. (S74) with respect to $t_{\text{R}}$ to zero gives the optimal $t_{\text{R}}$ in the limit of a large time overhead:

\displaystyle t_{\text{R}}=\frac{T_{2}}{\nu^{\frac{1}{\nu}}}.

(S75)

Similarly, repeating the calculation for a GHZ state, where the decay term in Eq. (S73) becomes $e^{2n\left(\frac{t_{\text{R}}}{T_{2}}\right)^{\nu}}$ , the stationarity condition reads $n\nu\left(\frac{t_{\text{R}}}{T_{2}}\right)^{\nu}=1$ , so the best Ramsey sensing time is

\displaystyle t_{\text{R}}=\frac{T_{2}}{(n\nu)^{\frac{1}{\nu}}}.

(S76)

Plugging Eq. (S75) and Eq. (S76) into Eq. (S74), we find that the ratio of the $\mathrm{SNR}^{2}$ of a GHZ state to that of an uncorrelated spin state is $n^{1-\frac{2}{\nu}}$ . Thus, only when

\displaystyle\nu>2

(S77)

do GHZ states provide an advantage in SNR over uncorrelated spin states when $t_{\text{oh}}\gg t_{\text{R}}$ . We compare the SNR of the states generated by the optimizer with that of the CSS and GHZ states when $\nu=2,3,4$ . Fig. S8 shows that when we assume a long time overhead, the generated entangled states are less sensitive than CSS when $\nu=2$ and $\nu=3$ for large spin numbers.
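The long-overhead optimum and the resulting SNR ratio can be reproduced with a simple grid search. The sketch below rests on stated assumptions: the prefactors $n$ (uncorrelated spins) and $n^{2}$ (GHZ state) are hypothetical normalizations chosen so that both expressions describe the same $n$ -spin ensemble, and the parameter values are arbitrary.

```python
import numpy as np

def snr2_css(t, n, T2, nu):
    """SNR^2 (up to constants) of n uncorrelated spins for t_oh >> t_R."""
    return n * t**2 * np.exp(-2.0 * (t / T2) ** nu)

def snr2_ghz(t, n, T2, nu):
    """SNR^2 (up to constants) of an n-spin GHZ state for t_oh >> t_R."""
    return n**2 * t**2 * np.exp(-2.0 * n * (t / T2) ** nu)

n, T2, nu = 8, 1.0, 3.0                      # illustrative values
grid = np.linspace(1e-4, 2.0, 400001) * T2   # candidate Ramsey times

t_css = grid[np.argmax(snr2_css(grid, n, T2, nu))]
t_ghz = grid[np.argmax(snr2_ghz(grid, n, T2, nu))]
ratio = snr2_ghz(t_ghz, n, T2, nu) / snr2_css(t_css, n, T2, nu)

print(f"numerical optimum (uncorrelated): {t_css:.4f}, analytic: {T2 / nu**(1 / nu):.4f}")
print(f"numerical optimum (GHZ):          {t_ghz:.4f}, analytic: {T2 / (n * nu)**(1 / nu):.4f}")
print(f"SNR^2 ratio: {ratio:.4f}, n^(1 - 2/nu): {n**(1 - 2 / nu):.4f}")
```

For $\nu=3$ and $n=8$ the ratio is $8^{1/3}=2$ , while for $\nu=2$ it is exactly 1, which is the threshold expressed by Eq. (S77).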

State preparation time compared to the adiabatic method

State preparation time is one of the major components of the time overhead in the generalized Ramsey sensing sequence and therefore influences the sensitivity. The state preparation time of the variational method depends on the number of circuit layers $m$ and the system size $N$ , and is proportional to the inverse of the average interaction strength, $1/\bar{f}_{\text{dd}}$ . The adiabatic method [Cappellaro and Lukin, 2009] is an alternative approach that generates entangled states for quantum metrology in dipolar-interacting spin systems using only single-qubit rotations (global pulses).

To compare the performance of our variational method with the adiabatic method, we derive the relation between the squeezing parameter (Wineland parameter [Wineland et al., 1994]) and the CFI. Without loss of generality, we consider an SSS whose collective spin points along $+x$ and which is squeezed along the $y$ -axis (such as the 3rd Wigner distribution shown in Fig. S7(b)). In this case, the squeezing parameter is

\displaystyle\xi^{2}=N\frac{(\Delta J_{y})^{2}}{|\langle J_{x}\rangle|^{2}},

(S78)

where $(\Delta O)^{2}\equiv\langle O^{2}\rangle-\langle O\rangle^{2}$ and $N$ is the number of spins. According to the uncertainty principle,

\displaystyle(\Delta J_{y})^{2}(\Delta J_{z})^{2}\geq\frac{1}{4}|\langle J_{x}\rangle|^{2}.

(S79)

The squeezing parameter is therefore related to the variance of the collective spin projection along the $z$ -direction by

\displaystyle 4(\Delta J_{z})^{2}\geq\frac{|\langle J_{x}\rangle|^{2}}{(\Delta J_{y})^{2}}=N/\xi^{2}.

(S80)

It has been shown that for a pure Gaussian state, the quantum Fisher information (QFI) is directly related to the variance of the projected spin angular momentum [Braunstein and Caves, 1994,Pezzé and Smerzi, 2009,Hyllus et al., 2010]:

\displaystyle\text{QFI}=4(\Delta J_{z})^{2}.

(S81)

Combining Eq. (S80) and Eq. (S81), we obtain the relation between CFI and squeezing parameter of a SSS:

\displaystyle\text{CFI}\leq\text{QFI},\qquad\text{QFI}\geq N/\xi^{2}.

(S82)

The first inequality in Eq. (S82) is saturated by measuring the SSS along the direction in which it is squeezed (the $y$ -axis, or equivalently by measuring it in the $z$ -basis after applying an $R_{x}(\frac{\pi}{2})$ pulse [Pedrozo-Peñafiel et al., 2020]). The second inequality originates from the uncertainty principle (Eq. (S79)). Since the optimal SSS saturate the Heisenberg uncertainty relation [Pezze et al., 2018] and the SSS generated by the adiabatic method [Cappellaro and Lukin, 2009] belong to this class, we obtain the relation between the squeezing parameter and the CFI

\displaystyle\text{CFI}=N/\xi^{2}.

(S83)

Based on the data shown in Fig. 3 of Ref. [Cappellaro and Lukin, 2009], the adiabatic method takes about $200\,\mu\text{s}$ to prepare an 8-spin SSS with $\xi^{2}=0.4$ , which corresponds to $\text{CFI}=20$ by Eq. (S83). The 2D spin density $8/(30\,\text{nm}\times 30\,\text{nm})$ corresponds to $\bar{f}_{\text{dd}}=43.5\,\text{kHz}$ . According to Fig. 3(d) in the main text, the variational method is able to prepare an 8-spin entangled state with $\text{CFI}\approx 20$ using a 4-layer circuit with $\bar{f}_{\text{dd}}T=0.8$ . Using the same average nearest-neighbor dipolar interaction strength $\bar{f}_{\text{dd}}$ , the state preparation time of the variational method is $T=0.8/\bar{f}_{\text{dd}}=18.4\,\mu\text{s}$ , which is about 11 times faster than the adiabatic method under the same conditions.
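The arithmetic behind this comparison is short enough to spell out. The snippet below only re-derives the quoted numbers from the values stated in the text; the variable names are illustrative.

```python
# Values quoted in the text above.
n_spins = 8
xi_squared = 0.4              # squeezing parameter of the adiabatic SSS
t_adiabatic = 200e-6          # preparation time of the adiabatic method, in s
f_dd = 43.5e3                 # average dipolar interaction strength, in Hz
f_dd_times_T = 0.8            # dimensionless preparation time of the 4-layer circuit

cfi = n_spins / xi_squared                # Eq. (S83): CFI = N / xi^2
t_variational = f_dd_times_T / f_dd       # preparation time in s
speedup = t_adiabatic / t_variational

print(f"CFI = {cfi:.1f}")
print(f"T = {t_variational * 1e6:.1f} us")
print(f"speed-up = {speedup:.1f}")
```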

Controllability

Since black-box optimization algorithms cannot guarantee that the optimized result is the global extremum in the parameter space, it is still an open question whether the variational method is able to find the ‘best’ metrological state for a given spin configuration. In this section, we are interested in the theoretically achievable controllability of dipolar-interacting spin systems. The question is: given an arbitrary (possibly infinite) sequence of evolutions under the Hamiltonians governing the dynamics of our system, can we implement any unitary operator? Quantum control systems of the general form

H(t)=H_{0}+\sum_{k=1}^{K}u_{k}(t)H_{k},

(S84)

governed by the Schrödinger equation, $i\frac{d}{dt}\ket{\psi(t)}=H(t)\ket{\psi(t)},$ have been studied extensively [d’Alessandro, 2021,Schirmer et al., 2001]. $H_{0}$ is the unperturbed or free evolution Hamiltonian, $H_{k}$ are the control interactions, and $u_{k}(t)$ are the piecewise continuous control fields. There are several distinct but related notions of controllability that have different conditions for ‘full’ controllability. The notion of ‘operator’ or ‘complete’ controllability is the strictest condition and is defined as above. For generic interacting spin systems, all of these notions are equivalent. Complete controllability is equivalent to universal quantum computation (UQC) in quantum information processing (QIP) [Wang et al., 2016,Ramakrishna and Rabitz, 1996].

Controllability Test

The way we investigate the controllability of a generic system (S84) is by examining the so-called ‘dynamical Lie algebra’ $\mathcal{L}_{0}\subseteq u(\mathcal{N})$ or $su(\mathcal{N})$ generated by the operators $\{-iH_{0},-iH_{1},\ldots,-iH_{K}\},$ which are represented by $\mathcal{N}\times\mathcal{N}$ matrices in a basis we choose [d’Alessandro, 2021,Schirmer et al., 2001].

A quantum system of the form (S84) is completely controllable if either $\mathcal{L}_{0}\cong{}u(\mathcal{N})$ or $\mathcal{L}_{0}\cong{}su(\mathcal{N})$ [d’Alessandro, 2021], where $u(\mathcal{N})$ is the unitary Lie algebra represented by the set of skew-Hermitian $\mathcal{N}\times\mathcal{N}$ matrices and $su(\mathcal{N})$ is the special unitary Lie algebra represented by the same set of matrices with the extra condition that they are traceless. Note that $\dim{u(\mathcal{N})}=\mathcal{N}^{2}$ and $\dim{su(\mathcal{N})}=\mathcal{N}^{2}-1$ ; the difference of $1$ comes from whether or not the identity operator ( $I$ ) is counted as a generator. We find a basis for $\mathcal{L}_{0}$ by iteratively taking the Lie bracket $[\cdot,\cdot]$ of $H_{0},H_{1},\ldots,H_{K}$ until we have a set of $\dim{\mathcal{L}_{0}}$ linearly independent matrices, where the Lie bracket is the commutator $[A,B]=AB-BA$ for matrices $A$ and $B$ . Ref. [d’Alessandro, 2021] and Ref. [Schirmer et al., 2001] present an algorithm for generating this basis. Thus, if $\dim{\mathcal{L}_{0}}=\mathcal{N}^{2}$ or $\mathcal{N}^{2}-1$ , the system is completely controllable. Note that for generic spin systems $\mathcal{N}=2^{N}$ for $N$ spins.

Algorithm. Generating $\mathcal{L}_{0}$ and finding $\dim{\mathcal{L}_{0}}$

Input: Hamiltonians $I\equiv\{H_{0},H_{1},\ldots,H_{K}\}$
1. $B\equiv$ maximal linearly independent subset of $I$
2. $r\equiv\absolutevalue{B}$
3. If $r=\mathcal{N}^{2}$ then $O\equiv B$ else $O\equiv\{\}$
4. If $r=\mathcal{N}^{2}$ or $\absolutevalue{B}=0$ then terminate
5. $C\equiv[O,B]\cup[B,B]$ , where $[S_{1},S_{2}]\equiv\{[s_{1},s_{2}]\,|\,s_{1}\in S_{1},s_{2}\in S_{2}\}$
6. $O=O\cup B$
7. $B=$ maximal linearly independent extension of $O$ with elements from $C$
8. $r=r+\absolutevalue{B}$ ; Go to 4
Output: basis $O$ of $\mathcal{L}_{0}$ and $\dim{\mathcal{L}_{0}}=r$

Table S2: Implementation of the algorithm of [Schirmer et al., 2001] with a few physically motivated modifications. Note that $\absolutevalue{S}$ indicates the cardinality of the set $S$ .
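The iteration in Table S2 can be sketched in a few lines of NumPy. This is not the implementation used for the results reported here: the Gram-Schmidt test for linear independence, the tolerance, and the function names are assumptions made for illustration, and the example uses a two-spin Ising drift $\sigma_{z}\otimes\sigma_{z}$ rather than $H_{\text{dd}}$ , since that case can be verified by hand.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def site_op(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    return np.kron(np.kron(np.eye(2**site), op), np.eye(2**(n - site - 1)))

def collective(op, n):
    """Collective spin operator J = (1/2) * sum_k op_k."""
    return 0.5 * sum(site_op(op, k, n) for k in range(n))

def lie_algebra_dim(hamiltonians, tol=1e-9):
    """Dimension of the real Lie algebra generated by {-i H_k}."""
    dim = hamiltonians[0].shape[0]
    basis = []  # orthonormal real vectors representing the basis found so far

    def extend(a):
        """Add `a` if it is linearly independent of the current basis and
        return its orthogonalized matrix; return None otherwise."""
        v = np.concatenate([a.real.ravel(), a.imag.ravel()])
        norm = np.linalg.norm(v)
        if norm < tol:
            return None
        v = v / norm
        for b in basis:                      # Gram-Schmidt projection
            v = v - np.dot(b, v) * b
        norm = np.linalg.norm(v)
        if norm < tol:
            return None
        v = v / norm
        basis.append(v)
        half = v.size // 2
        return (v[:half] + 1j * v[half:]).reshape(dim, dim)

    old = []
    new = [m for m in (extend(-1j * h) for h in hamiltonians) if m is not None]
    while new and len(basis) < dim**2:
        # C = [O, B] u [B, B]; then O = O u B and B = independent part of C.
        candidates = [a @ b - b @ a for a in old + new for b in new]
        old, new = old + new, []
        for c in candidates:
            m = extend(c)
            if m is not None:
                new.append(m)
    return len(basis)

n = 2
h_ising = site_op(Z, 0, n) @ site_op(Z, 1, n)
dim_ising = lie_algebra_dim([h_ising, collective(X, n), collective(Y, n)])
print("dim L_0 for the two-spin Ising example:", dim_ising)
```

For this two-spin example the routine returns 9, matching $\binom{N+3}{N}-1$ at $N=2$ . The rank test becomes numerically delicate as the matrices grow, so the tolerance `tol` should be treated as a tunable assumption.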

Controllability of Dipolar Interacting Spin Systems

We write our system in the form (S84) by defining the free evolution Hamiltonian to be the dipolar interaction $H_{\text{dd}}$ and two control interactions $J_{x}$ and $J_{y}$ , as these operators are generators of rotation, with respective independent control fields $\theta_{x}(t)$ and $\theta_{y}(t)$ :

H(t)=H_{\text{dd}}+\theta_{x}(t)J_{x}+\theta_{y}(t)J_{y}.

(S85)

Refs. [Schirmer et al., 2001,Polack et al., 2009] demonstrate that we cannot achieve complete controllability with global controls due to inherent symmetries, so we know that $\dim{\mathcal{L}_{0}}<4^{N}-1$ .

However, complete controllability is a rather strict condition. Not being able to drive any arbitrary unitary does not mean we cannot drive unitaries that produce metrological states.

In fact, Ref. [Chen et al., 2017] demonstrates for a long-range Ising spin model (all-to-all interactions) with global controls that metrological states, such as the GHZ and W states, are reachable. Ref. [Albertini and D’Alessandro, 2018] extends this result to symmetric Ising spin networks with global controls and demonstrates that one can reach any state that preserves spin permutation invariance. This is known as subspace controllability. The dimension of their dynamical Lie algebra, $\mathcal{L}^{\text{Ising}}\equiv\mathcal{L}^{\text{PI}}\cap su(2^{N})$ , is shown to be $\binom{N+3}{N}-1$ . This is relevant to our system because [Albertini and D’Alessandro, 2021b] show that if we replace the Ising interaction with a more general two-body interaction (which includes $H_{\text{dd}}$ ), the dimension of the dynamical Lie algebra is necessarily greater than or equal to that of the symmetric Ising case, and the system is therefore subspace controllable. This means that we can write $\binom{N+3}{N}-1\leq\dim{\mathcal{L}_{0}}<4^{N}-1$ and say that our system is subspace controllable but not completely controllable. Therefore, we can reach arbitrary permutation-invariant states, including metrological states such as a GHZ state.

Lie algebra dimension	$N=2$	$N=3$	$N=4$	$N=5$
Completely controllable: $4^{N}$ (or $4^{N}-1$ )	16	64	256	1024
$H_{\text{dd}}$	9	39	225	not computed
Symmetric Ising: $\binom{N+3}{N}-1$	9	19	34	55

Table S3: Lie algebra dimensions for the completely controllable system, the dipolar-interacting system, and the symmetric Ising system (lower bound for subspace controllability). For dipolar-interacting spin systems, $\dim{\mathcal{L}_{0}}$ is calculated using an implementation of the algorithm of [Schirmer et al., 2001], and is necessarily bounded by the complete and subspace controllability dimensions. Lie algebra dimensions for dipolar-interacting systems are only calculated up to $N=4$ due to stability issues stemming from numerical errors in how matrix rank is calculated.
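The two analytic rows of Table S3 follow directly from the closed-form expressions. The short sketch below (function names are illustrative) reproduces them with the standard library.

```python
from math import comb

def complete_dim(n_spins):
    """dim u(2^N) = 4^N; subtract 1 for su(2^N)."""
    return 4**n_spins

def symmetric_ising_dim(n_spins):
    """Dimension of the permutation-invariant algebra, binom(N+3, N) - 1."""
    return comb(n_spins + 3, n_spins) - 1

spins = [2, 3, 4, 5]
complete = [complete_dim(n) for n in spins]
ising = [symmetric_ising_dim(n) for n in spins]
for n, c, s in zip(spins, complete, ising):
    print(f"N={n}: complete {c} (or {c - 1}), symmetric Ising {s}")
```

The gap between the two rows grows exponentially with $N$ , which is why the permutation-invariant bound is far from complete controllability already for a handful of spins.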

Finding Reachable States

$\mathcal{L}_{0}$ is associated with a Lie group $e^{\mathcal{L}_{0}}$ by the Lie group–Lie algebra correspondence [d’Alessandro, 2021]. The Lie algebra $u(\mathcal{N})$ corresponds to the Lie group $U(\mathcal{N})$ , and $su(\mathcal{N})$ corresponds to $SU(\mathcal{N})$ . We can define $\mathcal{R}\equiv e^{\mathcal{L}_{0}}$ as the reachable set of unitaries we can drive under $\{H_{k}\}_{k=0,\ldots,K}$ , and so starting from an initial state $\ket{\psi_{0}},\mathcal{R}_{\ket{\psi_{0}}}$ is the set of states we can reach.

As demonstrated in the previous section, our dynamical Lie algebra is a superset of $\mathcal{L}^{\text{Ising}}$ and a strict subset of $su(2^{N})$ , so we can write $e^{\mathcal{L}^{\text{Ising}}}\subseteq e^{\mathcal{L}_{0}}\subset SU(2^{N})$ . Because $\ket{\text{GHZ}}\in\mathcal{R}_{\ket{0}^{\otimes N}}^{\text{Ising}}$ we can write $\ket{\text{GHZ}}\in\mathcal{R}_{\ket{0}^{\otimes N}}^{\text{dipolar}}$ . In fact, this is true for any permutation-invariant state, which includes all metrological states we are interested in.

While we know that metrological states are in the reachable set, determining the parameters that drive the unitaries to produce those states is a highly non-convex optimization problem equivalent to our variational circuit, using the state fidelity between the ideal state and the current state instead of the CFI as the cost function. That is, we optimize the output unitary of the variational circuit,

\mathcal{S}(\bm{\theta})=e^{-i\frac{\pi}{2}J_{y}}\prod_{i=1}^{m}e^{-i\tau_{i}H_{\text{dd}}}e^{-i\vartheta_{i}J_{x}}e^{i\frac{\pi}{2}J_{y}}e^{-i\tau_{i}^{\prime}H_{\text{dd}}}e^{-i\frac{\pi}{2}J_{y}},

(S86)

where $m$ is the (possibly infinite) number of layers, for state fidelity,

\mathcal{F}(\ket{\text{GHZ}},\mathcal{S}(\bm{\theta})\ket{0}^{\otimes N})=\absolutevalue{\bra{\text{GHZ}}{\mathcal{S}(\bm{\theta})\ket{0}^{\otimes N}}}^{2},

(S87)

for pure states. If there exists some $\bm{\theta}$ such that $\mathcal{F}(\ket{\text{GHZ}},\mathcal{S}(\bm{\theta})\ket{0}^{\otimes N})=1$ , then we can say that $\ket{\text{GHZ}}\in\mathcal{R}_{\ket{0}^{\otimes N}}^{\text{dipolar}}$ . From the previous section, we know such a $\bm{\theta}$ must exist, but it may be the case that $m\rightarrow\infty$ , in which case it is not possible to find this exactly. This is the method employed in ref.[Chen et al., 2017,Gao et al., 2013] to demonstrate the reachability of GHZ and W states for Ising spin models. Our variational circuit method represents an improvement in the efficiency of searching for such metrological states.
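To make the fidelity cost function concrete, the sketch below evaluates Eqs. (S86) and (S87) for a small system. Several ingredients are assumptions made only for illustration: the secular dipolar form $\sum_{i<j}f_{ij}(2S^{z}_{i}S^{z}_{j}-S^{x}_{i}S^{x}_{j}-S^{y}_{i}S^{y}_{j})$ used as a stand-in for $H_{\text{dd}}$ , the made-up couplings, the ordering of the layers in the product, and the random parameters. An optimizer would then maximize `ghz_fidelity` over the parameters.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def site_op(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    return np.kron(np.kron(np.eye(2**site), op), np.eye(2**(n - site - 1)))

def collective(op, n):
    return 0.5 * sum(site_op(op, k, n) for k in range(n))

def dipolar(couplings, n):
    """Hypothetical stand-in for H_dd with pairwise couplings f_ij."""
    h = np.zeros((2**n, 2**n), dtype=complex)
    for (i, j), f in couplings.items():
        zz = site_op(Z, i, n) @ site_op(Z, j, n)
        xx = site_op(X, i, n) @ site_op(X, j, n)
        yy = site_op(Y, i, n) @ site_op(Y, j, n)
        h += 0.25 * f * (2 * zz - xx - yy)
    return h

def evolve(h, s):
    """Return exp(-i s h) for Hermitian h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * s * w)) @ v.conj().T

def circuit_unitary(h_dd, jx, jy, taus, thetas, taus_prime):
    """Eq. (S86), with layers multiplied left to right (assumed ordering)."""
    ry_minus = evolve(jy, np.pi / 2)     # exp(-i pi/2 J_y)
    ry_plus = evolve(jy, -np.pi / 2)     # exp(+i pi/2 J_y)
    s = np.eye(h_dd.shape[0], dtype=complex)
    for tau, theta, tau_p in zip(taus, thetas, taus_prime):
        s = s @ (evolve(h_dd, tau) @ evolve(jx, theta) @ ry_plus
                 @ evolve(h_dd, tau_p) @ ry_minus)
    return ry_minus @ s

def ghz_fidelity(unitary, n):
    """Eq. (S87): overlap of S|0...0> with the GHZ state."""
    psi0 = np.zeros(2**n, dtype=complex)
    psi0[0] = 1.0
    ghz = np.zeros(2**n, dtype=complex)
    ghz[0] = ghz[-1] = 1.0 / np.sqrt(2.0)
    return abs(np.vdot(ghz, unitary @ psi0)) ** 2

n, layers = 3, 2
h_dd = dipolar({(0, 1): 1.0, (0, 2): 0.6, (1, 2): 0.8}, n)   # made-up couplings
rng = np.random.default_rng(3)
taus, thetas, taus_prime = rng.uniform(0, 2 * np.pi, size=(3, layers))
S = circuit_unitary(h_dd, collective(X, n), collective(Y, n), taus, thetas, taus_prime)
fidelity = ghz_fidelity(S, n)
print("GHZ fidelity for random parameters:", fidelity)
```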

