This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Machine-learning-accelerated Bose-Einstein condensation

Zachary Vendeiro    Joshua Ramette    Alyssa Rudelis    Michelle Chong    Josiah Sinclair    Luke Stewart    Alban Urvoy Current affiliation: Laboratoire Kastler Brossel, Sorbonne Université, CNRS, ENS-Université PSL, Collège de France, 4 place Jussieu, 75005 Paris, France    Vladan Vuletić [email protected] Department of Physics, MIT-Harvard Center for Ultracold Atoms and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Abstract

Machine learning is emerging as a technology that can enhance physics experiment execution and data analysis. Here, we apply machine learning to accelerate the production of a Bose-Einstein condensate (BEC) of Rb87{}^{87}\mathrm{Rb} atoms by Bayesian optimization of up to 55 control parameters. This approach enables us to prepare BECs of 2.8×1032.8\times 10^{3} optically trapped Rb87{}^{87}\mathrm{Rb} atoms from a room-temperature gas in 575ms575\ \mathrm{ms}. The algorithm achieves the fast BEC preparation by applying highly efficient Raman cooling to near quantum degeneracy, followed by a brief final evaporation. We anticipate that many other physics experiments with complex nonlinear system dynamics can be significantly enhanced by a similar machine-learning approach.

Recently, researchers have begun applying machine learning techniques to atomic physics experiments, e.g., to enhance data processing for imaging [1, 2, 3, 4, 5], determine the ground state and dynamics of many-body systems [6, 7], or to identify phases and phase transitions [8, 9, 10]. One promising practical application of machine learning to atomic physics is in the optimization of control sequences with many parameters and nonlinear dynamics [11, 12, 13, 14, 15, 16], and in particular to one of the workhorses of atomic physics, Bose-Einstein condensates (BECs) [11, 13, 14, 15, 16].

With few exceptions [17], experiments on BECs end with a destructive measurement, which requires repeated BEC preparation. Approaches to increase the BEC production rate, and associated signal-to-noise ratio of the experiments, have generally relied heavily on hardware improvements [18, 19, 20, 21, 22], or used atomic species with narrower optical transitions [18, 21, 22] than offered by the most widely utilized alkali atoms. For alkali atoms, the tight confinement of atom-chip magnetic traps has enabled fast evaporation sequences, with a complex multi-layer atom-chip achieving BEC preparation times of 850ms850\ \mathrm{ms} for 4×1044\times 10^{4} atoms [19]. Non-alkali atoms featuring narrow optical transitions can be used to reach lower temperatures in narrow-line MOTs [18, 21, 22]. That approach, combined with a dynamically tunable optical dipole trap, has recently been used to prepare BECs of 2×1042\times 10^{4} erbium atoms in under 700ms700\ \mathrm{ms} [22].

In this Letter, we demonstrate a complementary approach where, in a simple experimental setup with a broad-line MOT for a standard alkali atom, machine learning is leveraged to optimize a complex nonlinear laser and evaporative cooling process to quantum degeneracy. Controlling a sequence with up to 55 interdependent experimental parameters, Bayesian optimization [23, 11, 12] finds parameter values which cool a gas from room temperature into the quantum degenerate regime in 575ms575\ \mathrm{ms}, creating a BEC containing NBEC=2.8×103N_{\mathrm{BEC}}=2.8\times 10^{3} atoms. To our knowledge, this is the fastest BEC creation to date. We identify some of the physical strategies discovered by the algorithm, and also investigate how the choice of cost function impacts the trade-off between final atom number and the purity of the created BEC.

Our apparatus employs only a single MOT directly loaded from a Rb87{}^{87}\mathrm{Rb} background vapor, a crossed optical dipole trap, and two Raman cooling beams as depicted in Fig. 1(a). No Zeeman slower, two-dimensional MOT, atom chip [19], dynamic trap shaping [21], or strobing [24, 22] are necessary. Using Raman cooling in a crossed optical dipole trap (cODT), a method that can reach very high phase-space density and even condensation [25], the algorithm achieves a cooling slope of 16 orders of magnitude improvement in phase space density (PSD) per order of magnitude in atom loss (γ=16\gamma=16) up to the threshold to quantum degeneracy. This is significantly better than the γ=7\gamma=7 value we could obtain with extensive manual optimization under similar conditions [25].

Refer to caption
Figure 1: (a) Setup showing 10641064-nm horizontal (waist whw_{h}=18μ\mum, beam slightly tilted downward) and vertical (wvw_{v}=14μ\mum) optical-trapping, 795795-nm Raman coupling (wRw_{R}=500μ\mum) and optical pumping (wxw_{x}=30μ\mum, wy1mmw_{y}\approx 1\ \mathrm{mm}), and 780780-nm absorption-imaging beams. (b) Absorption image used to extract the cost function for a set of parameter values 𝐗\mathbf{X}. (c) Bayesian optimization with a neural network. The model Cp(𝐗)C_{\mathrm{p}}(\mathbf{X}) (orange solid line) attempts to predict the actual system performance C(𝐗)C(\mathbf{X}) (blue dashed line). The algorithm uses the model to predict optimal parameter values 𝐗i+1\mathbf{X}_{i+1} (open diamond), tests those values, and performs a new iteration with an updated model.

Atomic physics methods.—The Raman cooling implementation used in this work is similar to that of Ref. [25]. Cooling proceeds in a cODT formed by intersecting two non-interfering 1064nm1064\ \mathrm{nm} beams, one horizontal and one vertical, see Fig. 1(a). Two 795795-nm beams drive the Raman cooling: the optical pumping beam and the Raman coupling beam. Raman cooling [26] provides sub-Doppler cooling by driving velocity-selective Raman transitions between hyperfine states, here the |F=2,mF=2|F=2,m_{F}=-2\rangle and |2,1|2,-1\rangle states of Rb87{}^{87}\mathrm{Rb} [25]. The Raman transitions are non-dissipative so entropy is removed from the atomic gas in the form of spontaneously scattered photons as atoms are optically pumped back to the dark state |2,2|2,-2\rangle. Light-assisted collisions, which typically prohibit laser cooling at high atomic densities, are suppressed by detuning the optical pumping light 4.33GHz4.33\ \mathrm{GHz} to the red of the D1D_{1} F=2F=2F=2\rightarrow F^{\prime}=2^{\prime} transition, where a local minimum of light-induced loss was observed [25].

The cooling dynamics are controlled via five actuators: (i) the horizontal PyP_{y} and (ii) vertical PzP_{z} trap beam powers which set the trap depth and frequencies, (iii) the Raman coupling beam power PRP_{\mathrm{R}} which tunes the Raman rate, (iv) the power PpP_{\mathrm{p}} of the optical pumping beam which sets the optical-pumping rate (and also Raman rate), and (v) the magnetic field BzB_{z} which adjusts the resonant velocity class for the Raman transition. The cooling procedure is divided into stages during which the controls are linearly ramped, with the endpoints of each ramp constituting the optimization parameters.

Optimization scheme.—The optimization problem can be formulated as the minimization of a cost function CC, which maps a set of parameter values 𝐗M\mathbf{X}\in\mathbb{R}^{M} to a corresponding cost value C(𝐗)C(\mathbf{X})\in\mathbb{R}, where MM is the number of optimization parameters. The cost CC quantifies the results, and is generally a priori unknown, but can be extracted from measurements. Bayesian optimization is well-suited for this type of problem as it can tolerate noise in the measured cost and typically requires testing fewer values of 𝐗\mathbf{X} than other optimization methods [11, 12, 13, 14, 15, 16].

Bayesian optimization begins with collecting a training dataset by experimentally measuring the cost Cm(𝐗i)C_{\mathrm{m}}(\mathbf{X}_{i}) for various values of sets of parameter values 𝐗i\mathbf{X}_{i}. The 𝐗i\mathbf{X}_{i} used to construct the training dataset are chosen by a training algorithm, which can implement another optimization algorithm or can select 𝐗i\mathbf{X}_{i} randomly. A model of the cost function is then fit to the training dataset which approximates the unknown true cost function C(𝐗)C(\mathbf{X}). Although Bayesian optimization typically uses a Gaussian process for its model [23], the present work uses neural networks [12, 27], which were chosen for their significantly faster fitting time for our typical number of optimization parameters. Once the model is fit, a standard numerical optimization algorithm is applied to the modeled cost function Cp(𝐗)C_{\mathrm{p}}(\mathbf{X}) to determine which value 𝐗i+1\mathbf{X}_{i+1} for the next iteration is predicted to yield the minimal cost, as depicted in Fig. 1(c). Optionally this numerical optimization can be constrained to a trust region (a smaller volume of parameter space centered around the 𝐗i\mathbf{X}_{i} which yielded the best cost measured thus far). The predicted optimal value 𝐗i+1\mathbf{X}_{i+1} is then tested by experimentally measuring the corresponding cost Cm(𝐗i+1)C_{\mathrm{m}}(\mathbf{X}_{i+1}). The next iteration begins by retraining the model with the new result, and making a new prediction for the optimal value of 𝐗\mathbf{X} with the updated model. The algorithm iterates until it reaches a termination criterion, such as a set maximum number of iterations, or a set number of consecutive iterations that fail to return better results. All optimization in this work was performed with the open-source packages M-LOOP [11, 12] to implement the Bayesian optimization and Labscript [28] for experimental control. Additional implementation details are included in Appendix A.

Cost function.—Since the optimization transitions the gas from the classical into the quantum degenerate regime, the final state of the gas depends strongly on how the cost function is chosen as a combination of the two experimentally accessible parameters: atom number NN and temperature TT. The classical phase space density PSDc\mathrm{PSD}_{\mathrm{c}} is defined as PSDcncpλdB3\mathrm{PSD}_{\mathrm{c}}\equiv n_{\mathrm{cp}}\lambda_{\mathrm{dB}}^{3}, where λdB\lambda_{\mathrm{dB}} is the thermal de Broglie wavelength and ncpn_{\mathrm{cp}} is the calculated peak number density neglecting bosonic statistics (See Appendix B for calculation details). The value of PSDc\mathrm{PSD}_{\mathrm{c}} is nearly equal to the true PSD\mathrm{PSD} when PSD1\mathrm{PSD}\ll 1, while at the threshold to condensation, PSDc1\mathrm{PSD}_{\mathrm{c}}\sim 1. Since the temperature TT is more difficult to determine in the quantum degenerate regime, and also requires a fit to the data with potential convergence problems, we instead measure NN and the peak optical depth OD\mathrm{OD} in an absorption image. Generally ensembles with larger PSDc\mathrm{PSD}_{\mathrm{c}} have a larger atom number NN and less expansion energy, which leads to a larger peak optical depth OD\mathrm{OD} for a given NN. Guided by this, we explored cost functions of the form

C(𝐗)f(N/N1)OD3Nα9/5,C(\mathbf{X})\propto-f(N/N_{1})\mathrm{OD}^{3}N^{\alpha-9/5}, (1)

where f(N/N1)f(N/N_{1}) is a smoothed Heaviside step function with N1N_{1} chosen near the detection noise floor (see Appendix A). The parameter α\alpha in the cost function tunes the trade-off between optimizing for larger atom number or lower temperature. For a pure BEC after sufficient time-of-flight (TOF) expansion, |C/f|\lvert C/f\rvert scales as (NBEC)α(N_{\mathrm{BEC}})^{\alpha} (see Appendix C). For a thermal cloud, |C/f|\lvert C/f\rvert is proportional to PSDc\mathrm{PSD}_{\mathrm{c}} when α=1/5\alpha=-1/5, although that value of α\alpha is unsuitable for condensation as increasing the atom number in the BEC requires α>0\alpha>0.

Optimization procedure.—The sequence begins with a separately optimized 9999-ms long MOT loading and compression period. The trap beam powers are ramped to their initial Raman cooling values during the last 10ms10\ \mathrm{ms} of the MOT compression and then the magnetic field is adjusted to its initial Raman cooling value in 1ms1\ \mathrm{ms}, at which point the horizontal dipole trap holds typically N=2.7×105N=2.7\times 10^{5} atoms. We then added 100100-ms stages of Raman cooling one by one and optimized them individually. After five stages, the algorithm tended to turn off the Raman cooling by turning down PpP_{\mathrm{p}} or PRP_{\mathrm{R}} or by tuning the magnetic field BzB_{z} such that the Raman transition became off-resonant. We then added up to six shorter 3030-ms long stages in which the optical pumping and Raman coupling beams were turned off, and the algorithm performed evaporative cooling. Due to the reduced number of parameters, we were able to optimize the evaporation stages simultaneously, which produced a BEC. Subsequently we shortened the Raman cooling and evaporation stages with parameter values fixed until only a small and impure BEC was produced, and then ran a global reoptimization. In this global optimization stage, all 42 of the Raman cooling and evaporation parameters were reoptimized simultaneously using the previous values as the initial guess for 𝐗\mathbf{X}. Often a trust region set to one tenth of the allowed range for each parameter was used. This kept the optimizer focused in regions of parameter space which produced a measurable signal, as adjusting even a single parameter too far would often result in the loss of all atoms. We repeated this sequence shortening and reoptimization procedure until the algorithm failed to find parameters that could produce sufficiently pure BECs.

Refer to caption
Figure 2: Control waveforms (a-b) and measured trap and atomic-gas properties (c-e) of the optimized sequence. Gray, blue and oranges shadings mark the MOT loading, Raman cooling, and evaporation periods, respectively. The Raman beam power has been multiplied by 10310^{3} for better visibility. νx,νy,νz,νh\nu_{x},\nu_{y},\nu_{z},\nu_{h} are the trap vibrations frequencies in the x,y,zx,y,z directions and in the horizontal trap, respectively; νc\nu_{\mathrm{c}} is the atomic collision rate. PSDc\mathrm{PSD}_{\mathrm{c}} does not account for bosonic statistics and changes slowly while the BEC forms quickly above threshold. Calculations assume thermal equilibrium.

The required beam powers generally varied over several orders of magnitude, so the logarithm of their powers were used as entries in 𝐗\mathbf{X}, while the magnetic-field control parameter BzB_{z} was kept a linear parameter. A feedforward adjustment was included in the BzB_{z} control values to account for the light shift of the |2,1|2,-1\rangle state by the optical pumping beam. We averaged over five repetitions of the experiment for each set of parameter values tested. The number of iterations per optimization varied but was typically 1000{\sim}1000 (including the initial training), and required several hours, both for the single-stage optimizations and the full-sequence optimizations. A simpler optimization procedure was also attempted which did not involve optimizations of individual stages. Instead the sequence was divided into ten 100ms100\ \mathrm{ms} stages and all 55 parameters were optimized from scratch simultaneously. That approach combined with the shortening and reoptimizing procedure successfully produced a similar BEC, albeit in slightly longer time (650ms650\ \mathrm{ms} vs 575ms575\ \mathrm{ms}), possibly due to the optimization becoming trapped in a local optimum (see Appendix A for further discussion).

Refer to caption
Figure 3: Results of the 575ms575\ \mathrm{ms} optimized sequence. (a) PSDc\mathrm{PSD}_{\mathrm{c}} vs atom number NN. Initial cooling until PSDc101\mathrm{PSD}_{\mathrm{c}}\sim 10^{-1} is very efficient with γ16\gamma\approx 16 (gray line). The performance of the much slower (3 s-long) manually optimized sequence of Ref. [25] is shown for comparison (γ7\gamma\approx 7). (b) Cross section of 24-ms TOF image (inset) shows a BEC (orange fit) with small thermal wings.

Results and physical interpretation.—The best discovered 575575-ms long control sequence and corresponding results are depicted in Fig. 2 and Fig. 3. Notably, the algorithm discovered gray molasses [29, 30] in the MOT phase, which it applies at the end of the compression sequence. This outperforms the bright molasses [31, 32] that was previously used in the manually optimized compression sequence, with the gray molasses loading a similar number of atoms ten times faster. After the MOT loading stage and transfer into the cODT, five 63{\sim}63-ms long stages of Raman cooling follow, and then the optical pumping and Raman beams are ramped off, followed by six 27{\sim}27-ms long evaporation stages. As observed in previous work [12, 13, 14], the ramps produced by Bayesian optimization are non-monotonic and appear non-intuitive, but they outperform the routines we found by manual optimization. A reason for the non-monotonic waveforms may be that the cost function includes many local minima. The optimization can settle into any one of these local optima randomly, and produce complex but specific waveforms, as observed in Ref. [12]. Despite the non-monotonic ramps, PSDc\mathrm{PSD}_{\mathrm{c}} increases smoothly exponentially during this part of the sequence (Fig. 2(e)), due in part to the finite thermalization rate.

By shortening the sequence we are asking the algorithm to maximize the cooling speed, which is limited by the lower of the collisional rate νc\nu_{\mathrm{c}} and the trap vibration frequencies νx,y,z\nu_{x,y,z} [33]. When the gas is still hot, we have νcνx,y,z\nu_{\mathrm{c}}\ll\nu_{x,y,z}, and the algorithm employs Raman cooling to increase the density and collision rate (Fig. 2(d)). However, when νc\nu_{\mathrm{c}} approaches the lowest trap vibration frequency νy\nu_{y} near the time t=225t=225 ms, the algorithm starts to reduce the Raman rate, and a little later the optical pumping rate, in order to reduce light induced collisions that scale with νc\nu_{\mathrm{c}}, rather than the trap vibration frequency. Subsequently, for times t>225t>225 ms, the cooling proceeds near optimally, with the collision rate close to, but a little below, the trap vibration frequencies. Furthermore, as the system approaches condensation near t=410t=410 ms, the collision rate is somewhat lowered to reduce light-induced atom loss (Fig. 2(c)).

Another effect limiting the cooling speed is the loading of the atoms from the single horizontal trap, in which the sample is initially prepared, into the crossed dipole trap (see the movie in SM [34]). Initially, the vertical-beam power PzP_{z} is held low to avoid creating a high-density dimple region which would lead to excess loss during Raman cooling. Later, PzP_{z} is ramped up to gather atoms from the horizontal trap beam into the overlap region of the cODT in order to increase the collision rate and speed up evaporative cooling. The relatively sudden ramping of the trap power up and then back down visible in Fig. 2a likely involves an optimal-control-like process since the trap compression and relaxation are faster than the axial period of the horizontal trap of 200\sim 200 ms.

The optimization tended to turn off the Raman cooling after five stages because the cloud temperature TT was below the effective recoil temperature [25] where Raman cooling, even with optimal parameters, becomes too slow, while leading to trap loss and heating due to light-assisted collisions [35]. The Bayesian optimization recognized this and shut down the Raman cooling at this point, with the atomic gas close to condensation. Subsequently, at higher compression which is primarily achieved by increasing the vertical beam power, the horizontal trap power is reduced and atoms are efficiently evaporated along the direction of gravity in the tilted potential [20] (see the movie in the SM [34]). Note also that once the atoms have been loaded into the crossed-trap region (after t=350t=350 ms), the algorithm makes all trap vibration frequencies similar, which provides the fastest overall thermalization, and hence largest cooling speed.

The BEC is fully prepared at the end of the evaporation stages, 575ms575\ \mathrm{ms} after the start of the MOT loading. The final cloud contains 3.7×1033.7\times 10^{3} total atoms and is shown in Fig. 3(b). A bimodal fit of the cloud indicates that 2.8×1032.8\times 10^{3} atoms (76 %) are in the BEC. Although the sequence was optimized for speed rather than efficiency, the initial cooling occurs with a logarithmic slope γ=d(logPSDc)/d(logN)16\gamma=\mathrm{d}(\log\mathrm{PSD}_{\mathrm{c}})/\mathrm{d}(\log N)\approx 16.

Refer to caption
Figure 4: Cross sections of 24-ms TOF images (200 averages) optimized for different cost function parameter α\alpha (see main text) with 1-s long sequences, demonstrating the trade off between optimizing for atom number or temperature. Also plotted are the results of optimizing for atom number NN only. Inset: Condensate fraction NBEC/NN_{\mathrm{BEC}}/N vs NN for different α\alpha.

Cost function impact.—The atomic gases produced by sequences optimized for different values of α\alpha are presented in Fig. 4, as well as the results when optimizing for total atom number (NN). Larger values of α\alpha result in more atoms, but at higher temperature and lower condensate fraction, while smaller values of α\alpha produce purer BECs, but with fewer atoms overall. Setting α\alpha to 0.50.5 was found to make a reasonable compromise (orange curve in Fig. 4); so that value was used for the final full-sequence optimization which yielded the data presented in Fig. 2.

Outlook.—In conclusion, we have demonstrated that Raman cooling with far detuned optical-pumping light combined with a final evaporation can rapidly produce BECs with a comparatively simple apparatus, even with a standard alkali atom which lacks narrow optical transitions. Bayesian optimization greatly eased the search for a short sequence to BEC, quickly discovering initially unintuitive yet high-performing sequences. Inspection of the parameters chosen by the algorithm reveals several physical strategies, such as adjusting a collision rate close to, but below the trap vibration frequencies to maximize the thermalization and cooling speed while minimizing density-dependent atom loss, non-adiabatic loading into the crossed-trap dimple, and creating a nearly isotropic trap for efficient evaporation. In future applications, faster condensation can likely be achieved by including dynamical tuning of trap size [21], while user intervention may be further reduced by factoring the sequence length into the assigned cost [14]. We anticipate that many other experimental procedures in atomic physics and beyond can be improved by machine learning.

Acknowledgements.
The authors would like to thank Martin Zwierlein for inspiring physics discussions, and Michael Hush, Harry Slatyer, Philip Starkey, Christopher Billington, and Russell Anderson for stimulating discussions and software assistance. This work was in part supported by the NSF, NSF CUA, NASA, DoE, and MURI through AFOSR.

Appendix A Bayesian Optimization Implementation

In M-LOOP’s implementation of Bayesian optimization, the training algorithm used to pick parameters and generate a training dataset is also run periodically even after the training dataset is complete [11, 12]. In particular, once sufficient training data is acquired, three independent neural networks are trained. Each neural net is fully connected and consists of an input layer with one node for each optimization parameter, followed by five hidden layers with 64 nodes each, and then an output layer with a single node. Once the training has completed, each neural network is used to generate a set of parameter values 𝐗\mathbf{X} which it predicts to be optimal, and each of those three 𝐗\mathbf{X} are experimentally tested. Then another iteration of the training algorithm is performed and the 𝐗\mathbf{X} it suggests are also tested. The results from all four of these measurements are included in the next training of the neural nets for the subsequent Bayesian optimization iteration. The additional iterations of the training algorithm are intended to encourage parameter space exploration and provide unbiased data [11, 12].

In this work, the absorption images used to measure the cost function were generally taken after 1.5 to 8 ms of time-of-flight (TOF) expansion. We averaged over five repetitions of the experiment for each set of parameter values tested, which took 10s{\sim}10\ \mathrm{s} accounting for experimental and analysis overhead. Simply taking the largest optical depth measured in any single pixel of an absorption image as OD\mathrm{OD} makes it prone to noise, so OD\mathrm{OD} was set to the average OD of several pixels with the largest OD to reduce noise. To compare different sequences on an equal footing during optimizations, the trap beams were always ramped to a fixed power setting before releasing the atoms for TOF imaging. This final fixed ramp is only necessary during optimizations and is omitted from the sequence once the optimizations are complete. The smoothed Heaviside step function f(N/N1)f(N/N_{1}) included in the cost function ensures that the cost does not diverge at low NN while having little effect when NN is above the measurement noise floor. The form of f(N/N1)f(N/N_{1}) is inspired by the expression for the excited state population of a two-level system in thermal equilibrium and it is defined as

f(N/N1)={(2eN1/N+1)N>00N0f(N/N_{1})=\begin{cases}\left(\frac{2}{e^{N_{1}/N}+1}\right)&N>0\\ 0&N\leq 0\end{cases} (2)

For many of the optimizations in this work, particularly those with tens of parameters, the cost function landscape is “sparse” in the sense that most sets of parameter values yield poor results with a signal below the measurement noise floor. Thus the actual performance for such 𝐗\mathbf{X} cannot be measured, and testing them provides little information to the model. This leads to large regions of parameter space where there is no measurable signal and the direction towards better values cannot be inferred. There are two notable consequences of this. Firstly, for such optimizations it is generally necessary to provide initial values to the optimization which give a nonzero signal. Without a good starting point, the training dataset will often only include measurements dominated by noise, making it exceedingly unlikely for the Bayesian optimization to succeed. Secondly, for such optimizations it is generally helpful to specify a trust region. This limits the extent of excursions as the optimizer explores parameter space, making it more likely to test parameter values which yield a measurable signal. However, this does come at the cost that it makes it less likely for the optimizer to jump from one local minimum to another better minimum. We often performed the same optimization with and without a trust region in parallel. This could be done without significantly extending the duration of optimizations because the analysis for each iteration typically took longer than the time required to perform the experiment. Thus one optimization could run experiments while the other analyzed its most recent results. For optimizations with many parameters, the results with a trust region were typically as good as or better than those without. This is likely a consequence of that fact that, given the sparsity of the cost function landscape, it is unlikely for the optimizer to discover another local optimum. Thus it is better for the optimizer to focus on modeling the region of parameter space around the local optimum rather than fruitlessly searching for another local optimum.

The sparsity of the cost landscape and necessity for providing initial parameters which produce measurable results posed a difficulty when we optimized an entire sequence from scratch at once (rather than initially adding one cooling stage at a time). We resolved this by reducing the time of flight to 1.5ms1.5\ \mathrm{ms} for the first optimization. With such a short time of flight, even poor parameter values could produce clouds with a peak optical depth above the measurement noise floor. Due to the finite dynamic range of the absorption imaging, the results of this first optimization produced a cloud which saturated the measurement and thus made it impossible to accurately quantify performance for the best-performing values. The next optimizations were performed with the same sequence duration, but the time of flight increased to 5ms5\ \mathrm{ms} then 8ms8\ \mathrm{ms}. This made it possible to better discern differences between high-performing sets of parameter values at the cost of increasing the performance required to produce a signal above the noise floor and thus increasing the sparsity of the cost function landscape. The procedure of shortening then re-optimizing the sequence was then applied, resulting in the sequence presented in Fig. 5(b), which produced a BEC in 650ms650\ \mathrm{ms}. The parameter α\alpha was set to 0.50.5 throughout this procedure.

Refer to caption
Figure 5: (a) The control waveforms for the 1 second sequences corresponding to the results presented in Fig. 4, as well as the initial waveform used as the starting point for each of those optimizations. (b) The control waveforms for the 650ms650\ \mathrm{ms} 10-stage sequence optimized from scratch rather than stage-by-stage initially, which was optimized with α=0.5\alpha=0.5. Note the differing limits for the xx-axes between (a) and (b). The sequences in (a) include a 50ms50\ \mathrm{ms} magnetic coil ramp duration which was later reduced to 1ms1\ \mathrm{ms}. The MOT sections of the sequences have been omitted for simplicity. Note that the waveforms in (a) are mostly qualitatively similar despite being optimized for different cost functions. The waveforms in (b) are more qualitatively distinct from those in (a), even for the orange curves which were also optimized with α=0.5\alpha=0.5. This suggests that the independent optimizations likely become trapped around disparate local minima. On the other hand, tuning the cost function while providing the same initial parameter values each time typically causes only smaller deviations around the initial values.

Although it is not strictly fair to do so due to the differing parameterizations, it is still informative to compare the control waveforms of the independently-optimized 650ms650\ \mathrm{ms} sequence to those from Fig. 4. These waveforms are presented in Fig. 5. The sequences of Fig. 4 optimized for different α\alpha all had fairly similar waveforms. On the other hand, the 650ms650\ \mathrm{ms} sequence had a qualitatively different waveform. For example, it lacks the sudden rise and drop in vertical trap power towards the end of the sequence present in the other waveforms. This suggests that it has converged to a qualitatively-different local optimum. On the other hand, the sequences of Fig. 4 were all optimized with a trust region and the same initial 𝐗\mathbf{X}. Thus those optimizations primarily performed a local search, only slightly tuning 𝐗\mathbf{X} to tailor the sequence for their particular value of α\alpha. Although there are small differences in parameterization, the fact that two different optimizations with the same value of α\alpha produce sequences that differ more than optimizations with the same initial 𝐗\mathbf{X} but different α\alpha supports the notion that the cost function landscape includes multiple local minima, as suggested in the main text.

Appendix B Calculations of Atomic Gas Properties

The classical phase space density is defined as PSDc=ncpλdB3\mathrm{PSD}_{\mathrm{c}}=n_{\mathrm{cp}}\lambda_{\mathrm{dB}}^{3} where ncpn_{\mathrm{cp}} is the peak number density calculated for a classical gas (i.e. neglecting Bosonic statistics) and λdB=h/2πmkBT\lambda_{\mathrm{dB}}=h/\sqrt{2\pi mk_{\mathrm{B}}T} is the de Broglie wavelength. Here hh is the Planck constant, mm is the mass of an atom, and kBk_{\mathrm{B}} is the Boltzmann constant. To calculate PSDc\mathrm{PSD}_{\mathrm{c}} for a cloud, its atom number NN and temperature TT are measured and it is assumed to be in thermal equilibrium. The value of λdB\lambda_{\mathrm{dB}} is easily calculated from the measured temperature. The partition function Z=fB(𝐱)dVZ=\int f_{\mathrm{B}}(\mathbf{x})\mathrm{d}V is then calculated by numerically integrating the Boltzmann factor fB(𝐱)=exp[U(𝐱)/(kBT)]f_{\mathrm{B}}(\mathbf{x})=\exp\left[-U(\mathbf{x})/(k_{\mathrm{B}}T)\right] over the trap volume, where U(𝐱)U(\mathbf{x}) is the trap potential at position 𝐱\mathbf{x}. The U(𝐱)U(\mathbf{x}) is taken to be the sum of two Gaussian beams, one for each cODT beam, and gravity is neglected for simplicity. Each Gaussian beam with peak depth Ui,0U_{i,0} and waist wi,0w_{i,0} contributes a potential of the form

Ui(𝐱)=Ui,0(wi,0wi(z))2exp(2(r)2wi(z)2)U_{i}(\mathbf{x})=U_{i,0}\left(\frac{w_{i,0}}{w_{i}(z^{\prime})}\right)^{2}\exp\left(\frac{-2(r^{\prime})^{2}}{w_{i}(z^{\prime})^{2}}\right) (3)

where wi(z)=wi,01+(z/zR)2w_{i}(z^{\prime})=w_{i,0}\sqrt{1+(z^{\prime}/z_{\mathrm{R}})^{2}} is the spatially-varying beam width and zR=πwi,02/λz_{\mathrm{R}}=\pi w_{i,0}^{2}/\lambda is the Rayleigh range. The primed coordinates zz^{\prime} and rr^{\prime} are taken to be along and perpendicular to the beam’s propagation direction respectively. The value of ncpn_{\mathrm{cp}} can be calculated as NfB(𝐱0)/ZNf_{\mathrm{B}}(\mathbf{x}_{0})/Z where 𝐱0\mathbf{x}_{0} is the position of the bottom of the trap. Finally PSDc\mathrm{PSD}_{\mathrm{c}} is evaluated from its definition in terms of ncpn_{\mathrm{cp}} and λdB\lambda_{\mathrm{dB}}. Notably, for much of the sequence the atomic cloud extends out of the cODT region and into the wings of the horizontal ODT, in which case the trap potential seen by the cloud is not harmonic. Thus the well-known result PSDc=N(ω¯)3/(kBT)3\mathrm{PSD}_{\mathrm{c}}=N(\hbar\bar{\omega})^{3}/(k_{\mathrm{B}}T)^{3} for a harmonic trap with geometric mean trap frequency ω¯\bar{\omega} cannot be used for most of the sequence.

Calculation of the mean collision rate νc\nu_{\mathrm{c}} requires averaging the collision rate ncσvrmsn_{\mathrm{c}}\sigma v_{\mathrm{rms}} over the cloud where σ\sigma is the atomic collision cross section and vrmsv_{\mathrm{rms}} is the root-mean-square relative velocity of atoms in the cloud. The value of ncn_{\mathrm{c}} varies over the trap and obeys nc(x)=NfB(x)/Zn_{\mathrm{c}}(x)=Nf_{\mathrm{B}}(x)/Z, again neglecting Bosonic statistics. From equipartition for a 3D gas, (1/2)μvrms2=(3/2)kBT(1/2)\mu v_{\mathrm{rms}}^{2}=(3/2)k_{\mathrm{B}}T where μ=m/2\mu=m/2 is the reduced mass for two atoms. Thus the value of vrmsv_{\mathrm{rms}} is given by 6kBT/m\sqrt{6k_{\mathrm{B}}T/m}. The local collision rate is averaged by integrating ncσvrmsn_{\mathrm{c}}\sigma v_{\mathrm{rms}} over the cloud, weighted by the 1-atom number density nc/Nn_{\mathrm{c}}/N, yielding

νc=Nσ6kBTm(fB(x)Z)2dV\nu_{\mathrm{c}}=N\sigma\sqrt{\frac{6k_{\mathrm{B}}T}{m}}\int\left(\frac{f_{\mathrm{B}}(x)}{Z}\right)^{2}\mathrm{d}V (4)

The above calculations assume that the cloud is in thermal equilibrium, which is often a good approximation. However, after about 440ms440\ \mathrm{ms} of the final optimized 575ms575\ \mathrm{ms} sequence, the power in the vertical trapping beam PzP_{z} is rapidly increased, as can be seen in Fig. 2(a). This change is likely non-adiabatic for atoms in the wings of the horizontal ODT and the cloud may no longer be in thermal equilibrium. This is likely why the calculated PSDc\mathrm{PSD}_{\mathrm{c}} appears to increase beyond 1{\sim}1 before the appearance of a BEC. Notably this non-adiabatic portion of the sequence occurs only after PSDc\mathrm{PSD}_{\mathrm{c}} has reached 0.40.4, and thus it does not affect the cooling efficiency estimate of γ16\gamma\approx 16 for the cooling up to PSDc=0.1\mathrm{PSD}_{\mathrm{c}}=0.1.

The peak trap depth Ui,0U_{i,0} for each beam was determined from the beam waist wi,0w_{i,0} and radial trap frequency ωi,r\omega_{i,\mathrm{r}} measured for each beam. The beam waists, defined as the radius at which the intensity falls to 1/e21/e^{2} of its peak value, were measured by profiling the trap beams on a separate test setup which focused the light outside of the vacuum chamber. The trap frequencies were directly measured by carefully perturbing the position of a cloud in the cODT and observing its oscillations. Before perturbing the cloud, it was first cooled sufficiently to make it well-confined to the central region of the cODT so that the potential was approximately harmonic. The peak trap depth for each beam could then be calculated as Ui,0=mωi,r2wi,02/4U_{i,0}=m\omega_{i,\mathrm{r}}^{2}w_{i,0}^{2}/4. This expression can be derived by equating the spring constant for the trap in the radial direction k=d2Ui(𝐱)/(dr)2|𝐱=𝐱0k=\mathrm{d}^{2}U_{i}(\mathbf{x})/(\mathrm{d}r^{\prime})^{2}|_{\mathbf{x}=\mathbf{x}_{0}} to its value for a harmonic oscillator k=mωi,r2k=m\omega_{i,\mathrm{r}}^{2}.

Appendix C Cost Scaling

The peak optical depth OD\mathrm{OD} of a pure BEC after sufficient time of flight expansion scales as ODNBEC/A\mathrm{OD}\propto N_{\mathrm{BEC}}/A where AA is the area of the cloud in the image. The area scales in proportion to v¯2\bar{v}^{2} where v¯\bar{v} is the expansion velocity, which is related to the BEC chemical potential via (1/2)mv¯2=(2/7)μ(1/2)m\bar{v}^{2}=(2/7)\mu in a harmonic trap [36]. Thus AμA\propto\mu. Furthermore, the chemical potential for a harmonically-trapped BEC scales as μNBEC2/5\mu\propto N_{\mathrm{BEC}}^{2/5} [36], so ANBEC2/5A\propto N_{\mathrm{BEC}}^{2/5} and ODNBEC3/5\mathrm{OD}\propto N_{\mathrm{BEC}}^{3/5}. The expression OD3Nα9/5\mathrm{OD}^{3}N^{\alpha-9/5} then scales as (NBEC)α(N_{\mathrm{BEC}})^{\alpha}. Notably this scaling also applies to a harmonically-trapped BEC when imaged in situ. There, the BEC radius RR scales as RNBEC1/5R\propto N_{\mathrm{BEC}}^{1/5} [36]. In that case, AR2NBEC2/5A\propto R^{2}\propto N_{\mathrm{BEC}}^{2/5} as before. The same arguments then apply again, indicating that OD3Nα9/5\mathrm{OD}^{3}N^{\alpha-9/5} scales as (NBEC)α(N_{\mathrm{BEC}})^{\alpha} for a harmonically-trapped BEC in situ just as it does for a BEC after long time of flight expansion.

The scaling of OD3Nα9/5\mathrm{OD}^{3}N^{\alpha-9/5} for a purely thermal cloud is also of note. For a harmonically-trapped thermal cloud, the RMS size in a given direction for any time of flight is proportional to T1/2T^{1/2}, so ATA\propto T. Thus ODN/T\mathrm{OD}\propto N/T and OD3Nα9/5\mathrm{OD}^{3}N^{\alpha-9/5} scales in proportion to Nα+(6/5)/T3N^{\alpha+(6/5)}/T^{3}. Clouds with smaller temperatures are favored by the cost function, and clouds with larger atom numbers are favored as long as α>6/5\alpha>-6/5. For the case α=1/5\alpha=-1/5, the value of OD3Nα9/5\mathrm{OD}^{3}N^{\alpha-9/5} scales in proportion to N/T3N/T^{3}, which is proportional to PSDc\mathrm{PSD}_{\mathrm{c}}. That choice of α\alpha was often used when optimizing individual stages before reaching the threshold to BEC. However note that this choice of α\alpha leads to the scaling OD3Nα9/5NBEC1/5\mathrm{OD}^{3}N^{\alpha-9/5}\propto N_{\mathrm{BEC}}^{-1/5} for a pure BEC and is thus not a good choice when the cloud reaches condensation.

Appendix D Raman Cooling Laser

Standard Doppler cooling requires a laser with linewidth narrow compared to the optical transition linewidth in order to achieve optimal temperatures. This places stringent technical requirements for Doppler cooling on narrow optical transitions. By contrast, Raman cooling can achieve similar velocity resolution and associated temperatures with a comparatively broad laser. In this work, the light for the Raman coupling and optical pumping beams, which drive the up-leg and down-leg of the Raman transition respectively, was derived from the same laser. This ensures that any laser frequency noise is common mode between the two legs of the Raman transition and makes it possible to resolve Doppler shifts much smaller than the laser linewidth. A DBR laser diode (Photodigm PH795DBR180TS) without an external cavity was sufficient to generate the Raman cooling light. The forgiving laser linewidth requirements further simplify implementation of our BEC production approach compared to schemes which require Doppler cooling on narrow optical transitions. Thus our approach may be useful even for species which include narrow optical transitions.

References

  • Picard et al. [2019] L. R. Picard, M. J. Mark, F. Ferlaino, and R. van Bijnen, Deep learning-assisted classification of site-resolved quantum gas microscope images, Measurement Science and Technology 31, 025201 (2019).
  • Ding et al. [2019] Z.-H. Ding, J.-M. Cui, Y.-F. Huang, C.-F. Li, T. Tu, and G.-C. Guo, Fast high-fidelity readout of a single trapped-ion qubit via machine-learning methods, Physical Review Applied 12, 014038 (2019).
  • Seif et al. [2018] A. Seif, K. A. Landsman, N. M. Linke, C. Figgatt, C. Monroe, and M. Hafezi, Machine learning assisted readout of trapped-ion qubits, Journal of Physics B: Atomic, Molecular and Optical Physics 51, 174006 (2018).
  • Ness et al. [2020] G. Ness, A. Vainbaum, C. Shkedrov, Y. Florshaim, and Y. Sagi, Single-exposure absorption imaging of ultracold atoms using deep learning, Physical Review Applied 14, 014011 (2020).
  • Guo et al. [2021] S. Guo, A. R. Fritsch, C. Greenberg, I. Spielman, and J. P. Zwolak, Machine-learning enhanced dark soliton detection in bose–einstein condensates, Machine Learning: Science and Technology 2, 035020 (2021).
  • Carleo and Troyer [2017] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
  • Saito [2017] H. Saito, Solving the Bose–Hubbard model with machine learning, Journal of the Physical Society of Japan 86, 093001 (2017).
  • Wang [2016] L. Wang, Discovering phase transitions with unsupervised learning, Physical Review B 94, 195105 (2016).
  • Carrasquilla and Melko [2017] J. Carrasquilla and R. G. Melko, Machine learning phases of matter, Nature Physics 13, 431 (2017).
  • Torlai et al. [2019] G. Torlai, B. Timar, E. P. L. van Nieuwenburg, H. Levine, A. Omran, A. Keesling, H. Bernien, M. Greiner, V. Vuletić, M. D. Lukin, R. G. Melko, and M. Endres, Integrating neural networks with a quantum simulator for state reconstruction, Phys. Rev. Lett. 123, 230504 (2019).
  • Wigley et al. [2016] P. B. Wigley, P. J. Everitt, A. van den Hengel, J. W. Bastian, M. A. Sooriyabandara, G. D. McDonald, K. S. Hardman, C. D. Quinlivan, P. Manju, C. C. Kuhn, et al., Fast machine-learning online optimization of ultra-cold-atom experiments, Scientific reports 6, 1 (2016).
  • Tranter et al. [2018] A. D. Tranter, H. J. Slatyer, M. R. Hush, A. C. Leung, J. L. Everett, K. V. Paul, P. Vernaz-Gris, P. K. Lam, B. C. Buchler, and G. T. Campbell, Multiparameter optimisation of a magneto-optical trap using deep learning, Nature communications 9, 1 (2018).
  • Nakamura et al. [2019] I. Nakamura, A. Kanemura, T. Nakaso, R. Yamamoto, and T. Fukuhara, Non-standard trajectories found by machine learning for evaporative cooling of 87 Rb atoms, Optics express 27, 20435 (2019).
  • Barker et al. [2020] A. J. Barker, H. Style, K. Luksch, S. Sunami, D. Garrick, F. Hill, C. J. Foot, and E. Bentine, Applying machine learning optimization methods to the production of a quantum gas, Machine Learning: Science and Technology 1, 015007 (2020).
  • Davletov et al. [2020] E. T. Davletov, V. V. Tsyganok, V. A. Khlebnikov, D. A. Pershin, D. V. Shaykin, and A. V. Akimov, Machine learning for achieving Bose-Einstein condensation of thulium atoms, Phys. Rev. A 102, 011302(R) (2020).
  • Wu et al. [2020] Y. Wu, Z. Meng, K. Wen, C. Mi, J. Zhang, and H. Zhai, Active learning approach to optimization of experimental control, Chinese Physics Letters 37, 103201 (2020).
  • Chen et al. [2022] C.-C. Chen, R. González Escudero, J. Minář, B. Pasquiou, S. Bennetts, and F. Schreck, Continuous Bose–Einstein condensation, Nature , 1 (2022).
  • Stellmer et al. [2013] S. Stellmer, R. Grimm, and F. Schreck, Production of quantum-degenerate strontium gases, Phys. Rev. A 87, 013611 (2013).
  • Rudolph et al. [2015] J. Rudolph, W. Herr, C. Grzeschik, T. Sternke, A. Grote, M. Popp, D. Becker, H. Müntinga, H. Ahlers, A. Peters, et al., A high-flux bec source for mobile atom interferometers, New Journal of Physics 17, 065001 (2015).
  • Hung et al. [2008] C.-L. Hung, X. Zhang, N. Gemelke, and C. Chin, Accelerating evaporative cooling of atoms into Bose-Einstein condensation in optical traps, Phys. Rev. A 78, 011604(R) (2008).
  • Roy et al. [2016] R. Roy, A. Green, R. Bowler, and S. Gupta, Rapid cooling to quantum degeneracy in dynamically shaped atom traps, Physical Review A 93, 043403 (2016).
  • Phelps et al. [2020] G. A. Phelps, A. Hébert, A. Krahn, S. Dickerson, F. Öztürk, S. Ebadi, L. Su, and M. Greiner, Sub-second production of a quantum degenerate gas, arXiv preprint arXiv:2007.10807  (2020).
  • Shahriari et al. [2015] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proceedings of the IEEE 104, 148 (2015).
  • Hutzler et al. [2017] N. R. Hutzler, L. R. Liu, Y. Yu, and K.-K. Ni, Eliminating light shifts for single atom trapping, New Journal of Physics 19, 023007 (2017).
  • Urvoy et al. [2019] A. Urvoy, Z. Vendeiro, J. Ramette, A. Adiyatullin, and V. Vuletić, Direct laser cooling to Bose-Einstein condensation in a dipole trap, Phys. Rev. Lett. 122, 203202 (2019).
  • Kasevich and Chu [1992] M. Kasevich and S. Chu, Laser cooling below a photon recoil with three-level atoms, Physical Review Letters 69, 1741 (1992).
  • Snoek et al. [2015] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. M. A. Patwary, Prabhat, and R. P. Adams, Scalable bayesian optimization using deep neural networks (2015), arXiv:1502.05700 [stat.ML] .
  • Starkey et al. [2013] P. T. Starkey, C. J. Billington, S. P. Johnstone, M. Jasperse, K. Helmerson, L. D. Turner, and R. P. Anderson, A scripted control system for autonomous hardware-timed experiments, Review of Scientific Instruments 84, 085111 (2013).
  • Weidemüller et al. [1994] M. Weidemüller, T. Esslinger, M. A. Ol’shanii, A. Hemmerich, and T. W. Hänsch, A novel scheme for efficient cooling below the photon recoil limit, EPL (Europhysics Letters) 27, 109 (1994).
  • Boiron et al. [1995] D. Boiron, C. Triché, D. R. Meacher, P. Verkerk, and G. Grynberg, Three-dimensional cooling of cesium atoms in four-beam gray optical molasses, Phys. Rev. A 52, R3425 (1995).
  • Lett et al. [1988] P. D. Lett, R. N. Watts, C. I. Westbrook, W. D. Phillips, P. L. Gould, and H. J. Metcalf, Observation of atoms laser cooled below the Doppler limit, Physical review letters 61, 169 (1988).
  • Dalibard and Cohen-Tannoudji [1989] J. Dalibard and C. Cohen-Tannoudji, Laser cooling below the Doppler limit by polarization gradients: simple theoretical models, JOSA B 6, 2023 (1989).
  • Vuletić et al. [1999] V. Vuletić, A. J. Kerman, C. Chin, and S. Chu, Observation of low-field feshbach resonances in collisions of cesium atoms, Physical review letters 82, 1406 (1999).
  • [34] See Supplemental Material at https://doi.org/10.48550/arXiv.2205.08057 for a movie of the atomic cloud during the cooling sequence.
  • Burnett et al. [1996] K. Burnett, P. S. Julienne, and K.-A. Suominen, Laser-Driven Collisions between Atoms in a Bose-Einstein Condensed Gas, Physical Review Letters 77, 1416 (1996).
  • Dalfovo et al. [1999] F. Dalfovo, S. Giorgini, L. P. Pitaevskii, and S. Stringari, Theory of Bose-Einstein condensation in trapped gases, Reviews of Modern Physics 71, 463 (1999).