
Samudra: An AI Global Ocean Emulator for Climate

Abstract

AI emulators for forecasting have emerged as powerful tools that can outperform conventional numerical predictions. The next frontier is to build emulators for long climate simulations with skill across a range of spatiotemporal scales, a particularly important goal for the ocean. Our work builds a skillful global emulator of the ocean component of a state-of-the-art climate model. We emulate key ocean variables (sea surface height, horizontal velocities, temperature, and salinity) across their full depth, using a modified ConvNeXt UNet architecture trained on multi-depth ocean data. We show that the ocean emulator, Samudra, which exhibits no drift relative to the truth, can reproduce the depth structure of ocean variables and their interannual variability. Samudra is stable for centuries and 150 times faster than the original ocean model. However, Samudra struggles to capture the correct magnitude of the forcing trends while simultaneously remaining stable, and further work is needed to address this limitation.


Geophysical Research Letters

Courant Institute of Mathematical Sciences, New York University; Program in Atmospheric and Oceanic Sciences, Princeton University; Center for Data Science, New York University; Lamont-Doherty Earth Observatory, Columbia University

Corresponding author: Surya, [email protected]

Key Points

We develop a global, 3D, ocean autoregressive machine learning emulator for climate studies.

The emulator, based on a UNet architecture, is stable for centuries, producing accurate climatologies and variability of ocean variables.

The emulator training is robust to changes in seeds and initial conditions in the data.

Plain Language Summary

AI tools are extremely effective in making fast and accurate predictions on weather to seasonal timescales. Capturing decadal to centennial changes, which arise from ocean dynamics, remains an outstanding challenge. We built an advanced AI model called “Samudra” to simulate global ocean behavior. Samudra is trained on simulated data from a state-of-the-art ocean climate model and predicts key ocean features such as sea surface height, currents, temperature, and salinity throughout the ocean’s depth. Samudra can accurately recreate patterns in ocean variables, including year-to-year changes. It is stable over centuries and is 150 times faster than traditional ocean models. However, Samudra still faces challenges in balancing stability with accurately predicting the effects of external factors (like climate trends), and further improvements are needed to address this limitation.

1 Introduction

The recent success of emulators for components of the climate system, primarily the atmosphere, continues to produce remarkable outcomes, achieving state-of-the-art performance for weather prediction tasks [Kochkov et al., 2024; Bi et al., 2023; Price et al., 2023] and promising results reproducing climate models over decadal [Cachay et al., 2024] to multi-decadal timescales [Watt-Meyer et al., 2023].

Existing work on ocean emulation has mainly been limited to the surface and upper ocean, or to steady forcing. Several works focusing on surface ocean variables show results for timescales of years to a decade [Subel & Zanna, 2024; Dheeshjith et al., 2024; Gray et al., 2024]. Emulators that include subsurface information have focused on weekly to decadal timescales and at most the upper 1000 m [Xiong et al., 2023; Guo et al., 2024; Holmberg et al., 2024; Patel et al., 2024; Arcomano et al., 2023]. Bire et al. (2023) explored longer timescales within a simplified ocean model with idealized steady forcing. Finally, a seasonal coupled atmosphere-ocean emulator has shown promising results, considering the upper 300 m of the ocean [Wang et al., 2024]. These ocean and atmosphere emulators have been used for seasonal forecasts based on reanalysis data, and to build surrogates of numerical models.

Emulators of traditional numerical climate models leverage the computational efficiency of machine learning approaches to reduce the often prohibitive computational cost of running a large number of simulations on the original (usually CPU-based) climate model. One of the main benefits of emulators is the ability to run large ensembles. Such ensembles can be used to probe the likelihood of extreme events, explore the climate response to a range of forcing scenarios (e.g., greenhouse gases), and facilitate the development of numerical models by reducing the number of perturbed parameter experiments typically used for calibration [Maher et al., 2021; Mahesh et al., 2024]. Emulators can also accelerate the spin-up integration of numerical models or replace full model components in a coupled setting [Khatiwala, 2024]. Finally, emulators can help with data assimilation, replacing an expensive numerical model with a fast surrogate to generate affordable ensembles or an approximate adjoint, maintaining accuracy at reduced cost [Manshausen et al., 2024].

Our goal here is to reproduce the full-depth ocean state for four 3D prognostic variables and one 2D prognostic variable, using time-dependent realistic atmospheric forcing as input, extending the work of Subel and Zanna (2024) and Dheeshjith et al. (2024). At rollout lengths of nearly a decade, our emulator shows considerable skill across several key diagnostics (mean and variance) when compared to the parent numerical model output, which is our ground truth. In particular, both the temperature structure as a function of depth and the El Niño-Southern Oscillation (ENSO) variability are well reproduced by the emulator.

Simultaneously capturing variables with vastly different timescales, such as velocity (which can contain fast fluctuations) and salinity (which typically fluctuates more slowly), is an outstanding issue for long integrations (already encountered by Subel and Zanna, 2024). To alleviate this problem, we introduce an additional emulator that predicts only the thermodynamic variables (potential temperature, salinity, and sea surface height). This additional emulator captures the slowly varying changes in potential temperature and salinity on timescales of decades to centuries.

We show that our emulator can retain skill and remain stable for centuries for experiments equivalent to both control and climate-change simulations. However, we also note that this stability is accompanied by a weak response to climate-change forcing. This work demonstrates (to our knowledge) the first ocean emulator capable of reproducing the full-depth (from the surface down to the ocean floor) ocean temperature structure and its variability, while running for multiple centuries in a realistic configuration with time-dependent forcing.

The paper is organized as follows. We discuss the data and all emulator details in Section 2. We explore the properties of the trained emulator on a test dataset and report several multi-decadal experiments with a range of climate forcing in Section 3. We present our conclusions in Section 4.

2 Methods

We built an autoregressive ocean emulator from data generated by a state-of-the-art numerical ocean simulation. Below, we describe the data, the emulator, the architecture, and the training and evaluation of the emulator.

2.1 Data

The data was generated by OM4 [Adcroft et al., 2019], an ocean general circulation model that is the ocean component of the state-of-the-art coupled climate model CM4 [Held et al., 2019]. The circulation model was initialized with hydrography from the World Ocean Atlas [Levitus et al., 2015] and forced with atmospheric reanalysis following the OMIP-2 protocol, using version 1.4 of the JRA reanalysis [Tsujino et al., 2020]. The model was run for 65 years (1958-2022).

The ocean prognostic variables are potential temperature ($\theta_O$), salinity ($S$), sea surface height ($\mathrm{SSH}$), and the zonal ($u$) and meridional ($v$) ocean velocity components. The circulation model has 75 degrees of freedom in the vertical for each 3D prognostic variable, which we conservatively remap onto 19 fixed-depth levels of variable thickness ([2.5, 10, 22.5, 40, 65, 105, 165, 250, 375, 550, 775, 1050, 1400, 1850, 2400, 3100, 4000, 5000, 6000] m) to reduce the data size. We also conservatively coarsen the data in time using a 5-day simple average in geopotential coordinates, averaging over the fastest waves resolved by the circulation model (which originally used a 20-minute time step).

The native horizontal grid for the data has a nominal resolution of $1/4^\circ$, but is curvilinear and has three poles (grid singularities) located over land. We further post-process by filtering with an $18\times 18$ cell Gaussian kernel using the gcm-filters package [Loose et al., 2022], and then conservatively interpolate onto a $1^\circ\times 1^\circ$ global geographic (latitude-longitude) grid using the xESMF package [Zhuang et al., 2023]. Values on land are treated as missing, and missing values are imputed with zeros. Before the conservative spatial interpolation, we interpolate the velocities to the center of each cell using the xGCM package [Abernathey et al., 2022] and rotate the velocity vectors so that $u$ and $v$ indicate purely zonal and meridional flow, respectively.
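For concreteness, the following is a minimal sketch of this coarse-graining step, assuming the depth-remapped fields live in an xarray Dataset `ds` on the native grid with horizontal dimensions ("yh", "xh"), a potential temperature variable "thetao", and the cell-corner coordinates required for conservative regridding; the variable and dimension names, wet-mask construction, and target-grid setup are illustrative assumptions, not the exact Samudra pipeline.

```python
# Sketch of the filtering + regridding step (names and dims are assumptions).
import gcm_filters
import xarray as xr
import xesmf as xe

# Wet mask: 1 over ocean, 0 over land (land points assumed to be NaN).
wet_mask = xr.where(ds["thetao"].isel(time=0, lev=0).notnull(), 1.0, 0.0)

# Gaussian filter with a fixed factor of 18 grid cells.
filt = gcm_filters.Filter(
    filter_scale=18,
    dx_min=1,
    filter_shape=gcm_filters.FilterShape.GAUSSIAN,
    grid_type=gcm_filters.GridType.REGULAR_WITH_LAND,
    grid_vars={"wet_mask": wet_mask},
)
theta_filtered = filt.apply(ds["thetao"], dims=["yh", "xh"])

# Conservative interpolation onto a 1-degree latitude-longitude grid.
target = xe.util.grid_global(1.0, 1.0)
regridder = xe.Regridder(ds, target, method="conservative")
theta_1deg = regridder(theta_filtered)
```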

2.2 Ocean Emulator

The variables in the ocean emulator are:

1. The ocean state $\boldsymbol{\Phi}=(\theta_O, S, \mathrm{SSH}, u, v)$, which includes all 19 depth levels. We denote the subset of thermodynamic variables as $\boldsymbol{\Phi}_{\text{thermo}}=(\theta_O, S, \mathrm{SSH})$, as opposed to the dynamic variables $\boldsymbol{\Phi}_{\text{dynamic}}=(u, v)$.

2. The atmosphere boundary conditions $\boldsymbol{\tau}=(\tau_u, \tau_v, Q, Q_{\text{anom}})$, which consist of the zonal, $\tau_u$, and meridional, $\tau_v$, surface ocean stress, the net downward heat flux across the ocean surface $Q$ (below the sea ice), and its anomalies $Q_{\text{anom}}$. The net heat flux is the sum of the short- and long-wave radiative fluxes, sensible and latent heating, the heat content of mass transfer, and the heat flux due to frazil formation (see K4 and K5 of Griffies et al. (2016) for a precise definition of the variable "hfds"). The heat flux anomalies are calculated by removing the climatological heat flux computed over the 65-year OM4 dataset.

Our emulator, $\mathcal{F}$, is built to autoregressively produce multiple future oceanic states given multiple previous oceanic states. Specifically, we use a 2-input, 2-output model configuration. Mathematically,

$$\tilde{\boldsymbol{\Phi}}_{t+(n+1)\Delta t},\ \tilde{\boldsymbol{\Phi}}_{t+(n+2)\Delta t} = \mathcal{F}\!\left(\tilde{\boldsymbol{\Phi}}_{t+(n-1)\Delta t},\ \tilde{\boldsymbol{\Phi}}_{t+n\Delta t},\ \boldsymbol{\tau}_{t+n\Delta t}\right) \qquad (1)$$

where $n$ is a positive integer and $\tilde{\boldsymbol{\Phi}}_t$ represents the ocean state predicted by the emulator at time $t$. A depth-varying land mask is used to set land cells in the model output to zero. We use OM4 ocean states, $\boldsymbol{\Phi}_t$ and $\boldsymbol{\Phi}_{t-\Delta t}$, along with the corresponding atmospheric forcing, $\boldsymbol{\tau}_t$, to produce the first predictions. Subsequent ocean states are recursively produced by using previously generated ocean states as input. We illustrate the rollout process of the emulator in Figure 1a. The use of multiple input states provides additional context to the emulator, similar to the use of model time tendencies in PDE-based numerical integrations. In all of our experiments, $\Delta t = 5$ days.
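As an illustration, the sketch below implements the 2-input, 2-output rollout of Eq. (1) in PyTorch; the stand-in convolutional model, the channel counts, and the grid size are placeholder assumptions, not the released Samudra code.

```python
# Minimal sketch of the autoregressive rollout in Eq. (1): each call consumes the
# two most recent ocean states plus the forcing and returns the next two states,
# which are fed back in as inputs. Shapes and the stand-in model are illustrative.
import torch

C = 4 * 19 + 1       # channels per ocean state (4 x 19-level variables + SSH) = 77
F, H, W = 4, 36, 72  # forcing channels and a small illustrative grid

def rollout(model, phi_prev, phi_curr, forcings, mask):
    states = []
    for tau in forcings:                               # one forcing tensor per call
        x = torch.cat([phi_prev, phi_curr, tau], dim=0).unsqueeze(0)
        out = model(x).squeeze(0)                      # (2*C, H, W)
        phi_next1, phi_next2 = out.split(C, dim=0)
        phi_next1, phi_next2 = phi_next1 * mask, phi_next2 * mask   # zero land cells
        states += [phi_next1, phi_next2]
        phi_prev, phi_curr = phi_next1, phi_next2      # recurse on the predictions
    return torch.stack(states)

# Stand-in model and inputs, just to exercise the loop.
model = torch.nn.Conv2d(2 * C + F, 2 * C, kernel_size=3, padding=1)
mask = torch.ones(C, H, W)
preds = rollout(model, torch.zeros(C, H, W), torch.zeros(C, H, W),
                [torch.zeros(F, H, W) for _ in range(3)], mask)   # (6, C, H, W)
```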

2.3 Architecture

The emulator is based on the ConvNeXt UNet architecture from [Dheeshjith et al., 2024], where the core blocks of a UNet [Ronneberger et al., 2015] are inspired by ConvNeXt blocks [Liu et al., 2022] adapted from [Karlbauer et al., 2023]. The UNet implements downsampling based on average pooling and upsampling based on bilinear interpolation, which enables it to learn features at multiple scales. Each ConvNeXt block includes GeLU activations, increased dilation rates, and inverted channel bottlenecks. We did not use inverted channel depths and replaced the large $7\times 7$ kernels with $3\times 3$ kernels. We use batch normalization instead of layer normalization, as it yielded better skill. The encoder and decoder each consist of four ConvNeXt blocks with channel widths [200, 250, 300, 400]. The dilation rates used for both the encoder and decoder are [1, 2, 4, 8]. Additionally, we include a single ConvNeXt block (with channel width 400 and dilation 8) in the deepest section of the UNet before upsampling. The total number of model parameters is 135M. We apply periodic (or circular) padding in the longitudinal direction and zero padding at the poles, as in [Dheeshjith et al., 2024].
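A minimal sketch of a ConvNeXt-style block with the modifications listed above (a $3\times 3$ dilated convolution, batch normalization, GeLU, and an inverted channel bottleneck with a residual connection) is given below; the exact layer ordering and expansion factor are assumptions, not the released Samudra code.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """ConvNeXt-style block: 3x3 dilated convolution, batch norm, inverted
    channel bottleneck with GeLU, and a residual connection (illustrative)."""
    def __init__(self, channels: int, dilation: int = 1, expansion: int = 4):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation)
        self.norm = nn.BatchNorm2d(channels)
        self.pw1 = nn.Conv2d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(expansion * channels, channels, kernel_size=1)

    def forward(self, x):
        h = self.norm(self.spatial(x))
        h = self.pw2(self.act(self.pw1(h)))
        return x + h

# Example: the deepest UNet block uses channel width 400 and dilation 8.
block = ConvNeXtBlock(channels=400, dilation=8)
y = block(torch.randn(1, 400, 23, 45))   # spatial size is illustrative
```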

The architecture is modified from Dheeshjith et al. (2024) to process multiple ocean depth levels (as opposed to the surface only). In the surface ocean emulator, which contains only a single depth level, each channel is associated with a variable. In the multi-depth ocean emulator, each channel is associated with a variable and a depth level. Our main emulator $\mathcal{F}_{\text{thermo+dynamic}}$ takes as input four 19-level oceanic variables ($\theta_O$, $S$, $u$, $v$), the surface variable $\mathrm{SSH}$, and four atmospheric boundary conditions ($\tau_u$, $\tau_v$, $Q$, $Q_{\text{anom}}$). It produces five output variables ($\theta_O$, $S$, $\mathrm{SSH}$, $u$, $v$). As discussed above, we use a 2-input, 2-output model configuration, and thus there are $(4\times 19+1)\times 2+4=158$ input and $(4\times 19+1)\times 2=154$ output channels. In addition, we build another emulator, $\mathcal{F}_{\text{thermo}}$, that only uses the thermodynamic variables, $\boldsymbol{\Phi}_{\text{thermo}}=(\theta_O, S, \mathrm{SSH})$.

2.4 Training Details

We illustrate the training of the model in Figure 1a. We train the emulators using 2900 data samples spanning 1975-01-03 to 2014-09-20, with the last 50 samples used for validation. Each sample is a 5-day mean of the full ocean state and atmospheric boundary conditions.

Figure 1: a) Schematic of the model training process, illustrating the mapping from input (ocean states and atmospheric forcing) to output (ocean states rolled out over several time steps). Initially, the ground-truth ocean states, $\boldsymbol{\Phi}_t$ and $\boldsymbol{\Phi}_{t-\Delta t}$, along with the atmospheric forcing, $\boldsymbol{\tau}_t$, are provided as inputs to predict $\tilde{\boldsymbol{\Phi}}_{t+\Delta t}$ and $\tilde{\boldsymbol{\Phi}}_{t+2\Delta t}$. Predictions, along with ground-truth atmospheric forcing, are then used as inputs for future steps in the unrolling process. b) Time-averaged potential temperature ($\theta_O$) depth-latitude profiles over the 8-year test set, comparing the ground truth OM4 (left) and predictions from $\mathcal{F}_{\text{thermo}}$ (middle) and $\mathcal{F}_{\text{thermo+dynamic}}$ (right). c) RMSE of 8-year test set predictions for different initial conditions of the emulators, $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$. Grey dots represent the RMSE of a single rollout, including runs from training on 5 unique model seeds per emulator and 2 additional rollouts initialized at states 6 months apart. Horizontal lines indicate the respective mean RMSE. RMSE is calculated over the common periods of each rollout.

We discard the data over 1958-1975 because of the excessive model cooling while the model adjusts from the warm initial conditions. This cooling does not reflect the forcing but rather an interior ocean model adjustment (see Sane et al. (2023) and Figure S3). Note that some regions are still cooling post-1975 in this simulation, which biased some of our testing (see Results).

The loss function used for optimization is

$$\mathcal{L}_t = \sum_{n=1}^{PN} \frac{1}{C\,Y\,X} \sum_{j=1}^{C}\sum_{k=1}^{Y}\sum_{l=1}^{X} \left(\tilde{\boldsymbol{\Phi}}_{t+n\Delta t}^{[j,k,l]} - \boldsymbol{\Phi}_{t+n\Delta t}^{[j,k,l]}\right)^{2}. \qquad (2)$$

$\mathcal{L}_t$ is the total mean square error (MSE) loss at time step $t$, where $P$ is the number of input/output states used by the model in a single step, $N$ is the total number of recurrent passes, and $C$, $Y$, and $X$ are the number of output channels, the height, and the width of a single output state, respectively. Here, we set $P=2$ to obtain the 2-input, 2-output model configuration and use $N=4$ steps.
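The sketch below spells out this unrolled loss for the 2-input, 2-output configuration ($P=2$, $N=4$); the stand-in model, mask handling, and tensor shapes are illustrative assumptions.

```python
# Sketch of the unrolled MSE loss in Eq. (2): at each of the N recurrent passes
# the model predicts P states, each compared to its ground-truth counterpart.
import torch

def unrolled_mse(model, state, forcings, targets, mask, N=4):
    """state: (P*C, H, W) initial pair of states; forcings: N tensors of shape (F, H, W);
    targets: (N*P, C, H, W) ground truth for every predicted state."""
    C = targets.shape[1]
    loss, step = 0.0, 0
    for n in range(N):
        x = torch.cat([state, forcings[n]], dim=0).unsqueeze(0)
        preds = model(x).squeeze(0).split(C, dim=0)        # P predicted states
        for p in preds:
            loss = loss + ((p * mask - targets[step]) ** 2).mean()
            step += 1
        state = torch.cat(preds, dim=0)                    # feed predictions back in
    return loss

# Stand-in model and data, just to exercise the loss.
C, F, H, W = 77, 4, 36, 72
model = torch.nn.Conv2d(2 * C + F, 2 * C, kernel_size=3, padding=1)
loss = unrolled_mse(model, torch.zeros(2 * C, H, W),
                    [torch.zeros(F, H, W) for _ in range(4)],
                    torch.zeros(8, C, H, W), torch.ones(C, H, W))
loss.backward()
```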

We use the Adam optimizer with a learning rate of $2\times 10^{-4}$, which decays to zero with a cosine scheduler. Our emulators are trained using four 80 GB A100 GPUs for 15 hours ($\mathcal{F}_{\text{thermo+dynamic}}$) and 12 hours ($\mathcal{F}_{\text{thermo}}$), with a total batch size of 16.
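A sketch of this optimization setup is shown below; the number of training steps and the stand-in model are placeholders (the paper does not state the step count), so treat it as an assumption-laden illustration rather than the released training script.

```python
# Adam with a 2e-4 learning rate decayed to zero by a cosine schedule.
import torch

model = torch.nn.Conv2d(158, 154, kernel_size=3, padding=1)  # stand-in for the UNet
total_steps = 1_000                                          # placeholder step count

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps,
                                                       eta_min=0.0)
for step in range(total_steps):
    optimizer.zero_grad()
    # loss = unrolled_mse(...)  # as in the sketch above
    # loss.backward()
    optimizer.step()
    scheduler.step()
```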

2.5 Evaluation

To evaluate the emulators, we take our initial conditions from 2014-09-30 and produce an 8-year rollout using the corresponding atmospheric forcing. We compare the output from this rollout to held-out OM4 data to evaluate the emulator skill. In addition, we produce longer runs, analogous to control simulations, to assess the emulator's behavior over arbitrarily long rollouts. For these runs, the emulator is forced with atmospheric boundary conditions taken from 1990-2000, repeated as a 10-year cycle. This period is chosen specifically because it has a near-zero globally integrated heat flux forcing, which ensures minimal ocean drift. We performed a 100-year and a 400-year control run (see Supporting Information).
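The control rollouts simply cycle through the fixed forcing period; a minimal sketch of that bookkeeping is given below, where `emulator_step`, the array shapes, and the step counts are placeholders rather than the actual Samudra driver.

```python
# Sketch of a long control rollout that reuses a fixed forcing record cyclically.
import numpy as np

n_forcing = 730                           # 10 years of 5-day means (73 per year)
forcing = np.zeros((n_forcing, 4, 36, 72), dtype=np.float32)   # placeholder forcing

def control_rollout(emulator_step, state_pair, n_years=100):
    n_calls = n_years * 73 // 2           # each call advances two 5-day steps
    for call in range(n_calls):
        tau = forcing[(2 * call) % n_forcing]   # wrap around the 10-year cycle
        state_pair = emulator_step(state_pair, tau)
    return state_pair

# Identity stand-in for the emulator, just to exercise the loop.
final = control_rollout(lambda s, tau: s, np.zeros((2, 77, 36, 72), dtype=np.float32))
```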

We produce predictions using both $\mathcal{F}_{\text{thermo+dynamic}}$ and $\mathcal{F}_{\text{thermo}}$. All evaluations use a single 40 GB A100 GPU. For each year of rollout, $\mathcal{F}_{\text{thermo+dynamic}}$ and $\mathcal{F}_{\text{thermo}}$ take about 90.52 s and 47.2 s, respectively; thus, for the faster emulator, a century-long rollout takes approximately 1.3 hours, roughly half the time $\mathcal{F}_{\text{thermo+dynamic}}$ needs to produce the same number of states.

3 Results

3.1 Full-depth Global Ocean Emulator

We begin by evaluating the emulators $\mathcal{F}_{\text{thermo+dynamic}}$ and $\mathcal{F}_{\text{thermo}}$ against the ground truth to establish a baseline skill. Capturing the full-depth climatological profiles of potential temperature and salinity is a key target of ocean numerical climate models in general and, therefore, a key target for our ocean climate emulators. The structure of the zonal-mean potential temperature (Figure 1b) is captured by the two emulators, demonstrating significant skill at reproducing the profile from OM4 (see Figure S6 for the salinity structure). The average mean absolute error (MAE) is $5.7\times 10^{-3}\,^{\circ}$C for $\mathcal{F}_{\text{thermo+dynamic}}$ and $4.5\times 10^{-3}\,^{\circ}$C for $\mathcal{F}_{\text{thermo}}$, with a pattern correlation of roughly 0.99 for both emulators. The outputs show a robust thermocline structure, subtropical gyres, and a region of North Atlantic deep water formation. However, in the Northern Hemisphere both emulators are too warm and too salty at high latitudes (around 55$^{\circ}$N), too cold and too fresh at mid-latitudes, and show Arctic signals down to 750 m depth (Figures S2 and S7). These biases are consistent with underestimating the northward heat transport by the ocean. The potential temperature and salinity biases in the Southern Ocean for the $\mathcal{F}_{\text{thermo+dynamic}}$ emulator are reminiscent of residual transport changes, with opposite-signed biases in the Southern Ocean and in the region north of it. The $\mathcal{F}_{\text{thermo}}$ emulator is warmer than $\mathcal{F}_{\text{thermo+dynamic}}$ at most depths (Figure S2).

We performed several experiments to test the sensitivity of the emulators to different training choices. The emulators' skill is unchanged when using different seeds and start dates, so the trained models are statistically reproducible. We measure robustness by calculating the root mean square error (RMSE) of rollouts with 5 different training seeds and of rollouts initialized with ocean states taken 6 months apart. The RMSEs show little variance across the different trained models (Figure 1c). The standard deviations of the RMSEs across training seeds are 0.0033 for $\mathcal{F}_{\text{thermo}}$ and 0.00225 for $\mathcal{F}_{\text{thermo+dynamic}}$.

The potential-temperature timeseries at 2.5 m and 775 m (Figure 2a) are further indicators that both emulators capture the climatological means and the upper-ocean response to variable atmospheric forcing. The standard deviations of the 2.5 m potential temperature for OM4, $\mathcal{F}_{\text{thermo}}$, and $\mathcal{F}_{\text{thermo+dynamic}}$ are $6.8\times 10^{-2}\,^{\circ}$C, $4.35\times 10^{-2}\,^{\circ}$C, and $5.26\times 10^{-2}\,^{\circ}$C, respectively, while the standard deviations of the 775 m potential temperature are $2.3\times 10^{-3}\,^{\circ}$C, $1.0\times 10^{-3}\,^{\circ}$C, and $2.1\times 10^{-3}\,^{\circ}$C, respectively. The emulators capture a large portion of the variability, but with some biases (Figure 2b). The standard deviations are calculated after removing both the trend and the climatology from the timeseries (see Figure S8 for additional timeseries of potential temperature, salinity, zonal velocity, and meridional velocity, and Figure S10 for bias maps).

The emulators can skillfully reproduce the ENSO response in both warm and cold phases (Figures 2b and S11). The smallest fluctuations in the Nino 3.4 timeseries are the hardest for the emulators to capture. The emulator responses are in phase with OM4 for all years shown, but the amplitude is altered. $\mathcal{F}_{\text{thermo+dynamic}}$ exhibits higher skill than $\mathcal{F}_{\text{thermo}}$ in capturing the magnitude of ENSO events. We hypothesize that providing the velocities, whose data contain shorter timescales and larger variability, helps the emulator produce larger ENSO events. $\mathcal{F}_{\text{thermo}}$ still manages to detect the correct phase and structure (Figure 2b,d) despite producing events with smaller magnitudes, both at the surface and in the upper ocean. The emulators capture the deepening and shoaling of the equatorial thermocline by equatorial Kelvin waves for the strongest events (Figure 2d,e). The magnitude of the subsurface anomalies is weaker for the emulators than for OM4. For the Nino 3.4 timeseries (Figure 2b), the MAE is 0.0077$\,^{\circ}$C for $\mathcal{F}_{\text{thermo+dynamic}}$ and 0.0124$\,^{\circ}$C for $\mathcal{F}_{\text{thermo}}$, with correlations of 0.905 and 0.7017, respectively. For the ENSO profiles (Figure 2c-e), the MAE is 0.01$\,^{\circ}$C and 0.07$\,^{\circ}$C for $\mathcal{F}_{\text{thermo+dynamic}}$ and $\mathcal{F}_{\text{thermo}}$, respectively, and their pattern correlations are 0.976 and 0.973, respectively.
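For reference, a sketch of a Nino 3.4 diagnostic consistent with the figure captions is given below, assuming 5-day-mean near-surface potential temperature in an xarray DataArray with dims ("time", "lat", "lon") and longitudes in [0, 360); the box bounds are the standard Nino 3.4 region (5°S-5°N, 170°W-120°W), and the exact implementation in the paper may differ.

```python
# Sketch of a Nino 3.4 index from 5-day means: box-average the near-surface
# temperature, remove the climatology, and smooth with a 150-day rolling mean.
import xarray as xr

def nino34(sst: xr.DataArray) -> xr.DataArray:
    box = sst.sel(lat=slice(-5, 5), lon=slice(190, 240))      # 5S-5N, 170W-120W
    box_mean = box.mean(dim=("lat", "lon"))
    clim = box_mean.groupby("time.dayofyear").mean("time")    # climatology
    anom = box_mean.groupby("time.dayofyear") - clim          # anomalies
    return anom.rolling(time=30, center=True).mean()          # 30 x 5-day = 150 days
```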

For the emulator $\mathcal{F}_{\text{thermo+dynamic}}$, which uses all variables, we noticed that the potential temperature and salinity fields exhibit atypically high spatial variability, with scales more characteristic of velocity, so we posit that this results from using velocity inputs (see Figures S16-S17 for maps of variability). This result is consistent with Subel and Zanna (2024). We hypothesize that it arises from the large separation in timescales and variability between velocity and potential temperature in the ocean.

Finally, despite capturing the mean and climatology of ocean variables, the emulators struggle to capture the magnitude of the small but systematic potential temperature trends (global mean of order $10^{-3}\,^{\circ}$C/yr; Figure S1) over the same 8-year period (Figures 2a, S1, and S3); for most depths, the trained models underestimate trends by 20% to 50% relative to OM4. Of the two emulators, $\mathcal{F}_{\text{thermo}}$ has higher skill in capturing the global heat changes (Figure S9). The salinity trends in OM4 are weak, due to the small forcing and to the use of salinity-restoring boundary conditions. For both emulators, the trends are 7-8 orders of magnitude smaller than the mean value, consistent with the numerical representation of variables within the learned models, suggesting that the models conserve properties of the OM4 data even though strict conservation is not imposed (Figures S4-S5).

Figure 2: a) Spatially averaged timeseries of potential temperature ($\theta_O$) at depths 2.5 m (left) and 775 m (right) over the test set, comparing the ground truth OM4 (black) and predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model. b) Nino 3.4 index timeseries over the test set for the ground truth (OM4, black) and predictions ($\mathcal{F}_{\text{thermo}}$, red; $\mathcal{F}_{\text{thermo+dynamic}}$, green). Anomalies are averaged over rolling 150-day windows. c-e) Meridionally averaged depth profile of potential temperature anomalies in the tropics during the peak Nino event (marked by a black dot in the timeseries) over the test set for OM4 (c), $\mathcal{F}_{\text{thermo}}$ (d), and $\mathcal{F}_{\text{thermo+dynamic}}$ (e). Anomalies in (c)-(e) are averaged over a 15-day window.

3.2 Long-term stability

We also evaluated the ability of the emulators to produce long control experiments without retraining. For these experiments, we use boundary conditions repeated over a 10-year cycle (described in Section 2.5), chosen to contribute a near-zero net heat flux, allowing the emulators to run for arbitrarily long periods of time while minimizing potential temperature drift.

Both emulators converge to an equilibrium, maintaining a global mean potential temperature close to OM4 throughout a century of integration (Figure 3a). The global mean temperatures are 3.225$\,^{\circ}$C for $\mathcal{F}_{\text{thermo}}$ and 3.215$\,^{\circ}$C for $\mathcal{F}_{\text{thermo+dynamic}}$, compared to 3.219$\,^{\circ}$C for OM4. In addition, $\mathcal{F}_{\text{thermo+dynamic}}$ over-predicts the variability in potential temperature, likely extrapolating some fast dynamics via the velocity variables. This issue is exacerbated in the deeper layers of the ocean, which have little variability in the original dataset. The temperature structure is again well preserved for the long rollouts (Figure 3b), with different structures in the potential temperature biases (Figure S12) than for the 8-year test data (Figure S2).

We examine the emulators' skill in reproducing variability over these long timescales. Since we are reusing the same 10-year cycle to drive the emulator, we expected some persistent features to appear when looking at a phenomenon such as the response to ENSO. Although both emulators produce appropriate Nino 3.4 anomalies for the entire century rollout (Figures 3c and S13), $\mathcal{F}_{\text{thermo+dynamic}}$ shows stronger peak-to-peak amplitude but little cycle-to-cycle variability, perhaps due to the strong coupling of velocity with the wind stress forcing, whereas $\mathcal{F}_{\text{thermo}}$ shows more aperiodic variability across years.

To further test stability, we generate a 400-year rollout with a forcing setup identical to that of the century-long run. Both emulators remain stable (Figure S15). $\mathcal{F}_{\text{thermo}}$ has the added benefit of exhibiting long-term aperiodic variability in potential temperature and salinity across the centuries, despite the repeat forcing. The long experiments were reproduced using a repeat forcing period from the test set (2014-2022), producing similar results (Figure S19).

Figure 3: a) Globally averaged potential temperature ($\theta_O$) timeseries over a 100-year control run, comparing the 10-year ground truth OM4 (black) and predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). b) Time-averaged potential temperature ($\theta_O$) depth profile over a 100-year control run, comparing the 10-year ground truth OM4 (left) and predictions from $\mathcal{F}_{\text{thermo}}$ (middle) and $\mathcal{F}_{\text{thermo+dynamic}}$ (right). c) Nino 3.4 index timeseries over a 100-year control run, comparing the 10-year repeat for the ground truth (OM4, black) and predictions ($\mathcal{F}_{\text{thermo}}$, red; $\mathcal{F}_{\text{thermo+dynamic}}$, green). d-e) Meridionally averaged depth profile of potential temperature anomalies in the tropics during the peak Nino event (marked by a black dot in the timeseries) for $\mathcal{F}_{\text{thermo}}$ (d) and $\mathcal{F}_{\text{thermo+dynamic}}$ (e). Anomalies are as in Fig. 2.

4 Discussion

We produce a computationally cheap machine-learning (ML) emulator of a state-of-the-art ocean model, namely OM4 [Adcroft et al., 2019]. The ML architecture consists of a modified ConvNeXt UNet [Dheeshjith et al., 2024]. The reduced-order model, Samudra, predicts key ocean variables, including sea surface height, temperature, and salinity, across the full depth of the world oceans while remaining stable for centuries. Integrating OM4 for 100 years takes approximately 8 days using 4,671 CPU cores, whereas our fastest (thermo) emulator completes the same task in about 1.3 hours on a single 40 GB A100 GPU. This represents approximately a 150x increase in SYPD (simulated years per day) for Samudra compared to OM4. Some of this speed-up can be attributed to Samudra i) using a 5-day time step (vs. 20 minutes in OM4) and ii) operating on a spatially coarser grid. However, we note that Samudra makes predictions with the implicit spatial skill of the finer-resolution OM4.
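As a sanity check on the quoted speed-up, the arithmetic below uses only the run times stated in this paragraph.

```python
# OM4: ~100 simulated years in ~8 wall-clock days; Samudra (thermo): ~100 years
# in ~1.3 hours. SYPD = simulated years per wall-clock day.
om4_sypd = 100 / 8.0                   # ~12.5 SYPD
samudra_sypd = 100 / (1.3 / 24.0)      # ~1846 SYPD
print(f"speed-up ~ {samudra_sypd / om4_sypd:.0f}x")   # ~148x, i.e. roughly 150x
```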

The emulator performs well on a range of metrics related to the model climatology and its variability on the test set and long control simulations. The emulator produces accurate climatologies over the last 8 years of the OM4 simulations and is robust to changes in seeds and initial conditions. Furthermore, it can capture variability (e.g., ENSO response to forcing). Therefore, these emulators could be used to study the contemporary ocean and climate at a significant reduction in cost compared to OM4.

The emulator, however, struggles to capture trends under a range of surface heat flux forcings (see Supporting Information), similarly to the surface emulators in Dheeshjith et al. (2024). We performed idealized forced experiments using the same repeated atmospheric forcing generated for the control experiment and a spatially uniform linear forcing of varying magnitude for the surface heat flux. Figure S16 showcases the ocean heat content trends predicted by $\mathcal{F}_{\text{thermo}}$ under linear surface heat flux increases of 1, 0.5, 0.25, and 0 W/m$^2$. The patterns of ocean heat uptake are reminiscent of ocean-only and coupled forced numerical experiments [Todd et al., 2020; Couldrey et al., 2020], with dipole patterns in the Southern Ocean and the North Atlantic sinking region (Figure S14). However, the magnitude of the change is too weak compared to the forcing (Figure S16). Similar weak generalization under climate change is also observed in the atmospheric climate emulator ACE [Watt-Meyer et al., 2023], but improves when a slab ocean model is added [Clark et al., 2024].

Here, we could not produce an emulator that simultaneously captures the trends in the test data and remains stable for centuries. Further work, which would require new numerical simulations, is needed to explore the reasons for these issues.

The lack of generalization reflected in the weak warming trends could be due to the training data. The effects of an initial drift can be alleviated by pruning years 1958 to 1975 from the training data, which removes the bulk of this adjustment period. Yet, different depths and regions adjust more slowly, and some of this continued adjustment may remain in the data, since the equilibration timescale of the model is hundreds of years. Another reason for the trend bias could be the forcing datasets. The atmospheric forcing imposed on the ocean implicitly results from the real ocean-atmosphere coupling. Therefore, the atmospheric forcing has felt a changing ocean circulation, particularly in the North Atlantic [Chemke et al., 2020]. As a result, the “forcing” applied to the ocean emulator is not entirely decoupled from the ocean response, potentially leading to some biases in the response, as in Todd et al. (2020), Couldrey et al. (2020), and Zanna et al. (2019). We alleviated these issues by adding an extra forcing input, namely the cumulative heat forcing, which led to a more skillful model capable of capturing the global warming trend. However, this model was unstable under climate-change forcing beyond 50 years. Alternatively, it is possible that learning to predict the model state directly is not optimal. We explored learning tendencies, which improved performance for the warming trends but, again, was unstable over long timescales. A challenge going forward is designing faithful emulators capable of capturing trends while remaining stable in long rollouts.

Despite the limited response to future climate forcing, Samudra is skillful at emulating the contemporary ocean and is therefore an affordable surrogate for expensive ocean circulation models. Without further modification, Samudra could be used in studies requiring large ensembles (e.g., uncertainty quantification, extreme events) or to enhance and accelerate operational applications (e.g., data assimilation). More opportunities emerge if we consider retraining Samudra, e.g., on revised versions of OM4 or on other models, which could greatly accelerate climate model development by allowing evaluations of long, yet affordable, rollouts. This includes coupling Samudra with ACE [Watt-Meyer et al., 2023] to emulate CM4.

Open Research Section

The code for training the models along with generating rollouts and plots is available on GitHub at https://github.com/m2lines/Samudra, while the model weights and data are hosted on Hugging Face at https://huggingface.co/M2LInES/Samudra and https://huggingface.co/datasets/M2LInES/Samudra-OM4, respectively. The code is also version-tagged and archived via Zenodo at https://doi.org/10.5281/zenodo.15037462.

Acknowledgements.
This research received support through Schmidt Sciences, LLC, under the M2LInES project. We thank all members of the M2LInES team for helpful discussions and their support throughout this project. We gratefully acknowledge Karthik Kashinath and the NVIDIA team for providing us access to NERSC resources, which were instrumental in supporting this work. This research was also supported in part through the NYU IT High Performance Computing resources, services, and staff expertise. We also thank the reviewers for their useful comments.

References

  • Abernathey, R. P., Busecke, J. J. M., Smith, T. A., Deauna, J. D., Banihirwe, A., Nicholas, T., … Thielen, J. (2022, November). xgcm. Zenodo. https://doi.org/10.5281/zenodo.7348619
  • Adcroft, A., Anderson, W., Balaji, V., Blanton, C., Bushuk, M., Dufour, C. O., … Zhang, R. (2019). The GFDL Global Ocean and Sea Ice Model OM4.0: Model description and simulation features. Journal of Advances in Modeling Earth Systems, 11(10), 3167–3211. https://doi.org/10.1029/2019MS001726
  • Arcomano, T., Szunyogh, I., Wikner, A., Hunt, B. R., & Ott, E. (2023). A hybrid atmospheric model incorporating machine learning can capture dynamical processes not captured by its physics-based component. Geophysical Research Letters, 50(8), e2022GL102649.
  • Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., & Tian, Q. (2023). Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970), 533–538.
  • Bire, S., Lütjens, B., Azizzadenesheli, K., Anandkumar, A., & Hill, C. N. (2023). Ocean emulation with Fourier neural operators: Double gyre. Authorea Preprints.
  • Cachay, S. R., Henn, B., Watt-Meyer, O., Bretherton, C. S., & Yu, R. (2024). Probabilistic emulation of a global climate model with spherical DYffusion. arXiv preprint arXiv:2406.14798.
  • Chemke, R., Zanna, L., & Polvani, L. M. (2020). Identifying a human signal in the North Atlantic warming hole. Nature Communications, 11(1), 1540.
  • Clark, S. K., Watt-Meyer, O., Kwa, A., McGibbon, J., Henn, B., Perkins, W. A., … Harris, L. M. (2024). ACE2-SOM: Coupling to a slab ocean and learning the sensitivity of climate to changes in CO2. arXiv preprint arXiv:2412.04418.
  • Couldrey, M. P., Gregory, J. M., Dias, F. B., Dobrohotoff, P., Domingues, C. M., Garuba, O., … others (2020). What causes the spread of model projections of ocean dynamic sea-level change in response to greenhouse gas forcing? Climate Dynamics, 1–33.
  • Dheeshjith, S., Subel, A., Gupta, S., Adcroft, A., Fernandez-Granda, C., Busecke, J., & Zanna, L. (2024). Transfer learning for emulating ocean climate variability across CO2 forcing. arXiv preprint arXiv:2405.18585.
  • Gray, M. A., Chattopadhyay, A., Wu, T., Lowe, A., & He, R. (2024). Long-term prediction of the Gulf Stream meander using OceanNet: A principled neural operator-based digital twin. EGUsphere, 2024, 1–23.
  • Griffies, S. M., Danabasoglu, G., Durack, P. J., Adcroft, A. J., Balaji, V., Böning, C. W., … Yeager, S. G. (2016). OMIP contribution to CMIP6: Experimental and diagnostic protocol for the physical component of the Ocean Model Intercomparison Project. Geoscientific Model Development, 9(9), 3231–3296. https://doi.org/10.5194/gmd-9-3231-2016
  • Guo, Z., Lyu, P., Ling, F., Luo, J.-J., Boers, N., Ouyang, W., & Bai, L. (2024). ORCA: A global ocean emulator for multi-year to decadal predictions. arXiv preprint arXiv:2405.15412.
  • Held, I. M., Guo, H., Adcroft, A., Dunne, J. P., Horowitz, L. W., Krasting, J., … Zadeh, N. (2019). Structure and performance of GFDL's CM4.0 climate model. Journal of Advances in Modeling Earth Systems, 11(11), 3691–3727. https://doi.org/10.1029/2019MS001829
  • Holmberg, D., Clementi, E., & Roos, T. (2024). Regional ocean forecasting with hierarchical graph neural networks. arXiv preprint arXiv:2410.11807.
  • Karlbauer, M., Cresswell-Clay, N., Durran, D. R., Moreno, R. A., Kurth, T., & Butz, M. V. (2023). Advancing parsimonious deep learning weather prediction using the HEALPix mesh. Authorea Preprints.
  • Khatiwala, S. (2024). Efficient spin-up of Earth system models using sequence acceleration. Science Advances, 10(18), eadn2839.
  • Kochkov, D., Yuval, J., Langmore, I., Norgaard, P., Smith, J., Mooers, G., … others (2024). Neural general circulation models for weather and climate. Nature, 1–7.
  • Levitus, S., Boyer, T., Garcia, H., Locarnini, R., Zweng, M., Mishonov, A., … Seidov, D. (2015). World Ocean Atlas 2013 (NCEI Accession 0114815). https://doi.org/10.7289/v5f769gt
  • Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11976–11986).
  • Loose, N., Abernathey, R., Grooms, I., Busecke, J., Guillaumin, A., Yankovsky, E., … Martin, P. (2022). GCM-Filters: A Python package for diffusion-based spatial filtering of gridded data. Journal of Open Source Software, 7(70), 3947. https://doi.org/10.21105/joss.03947
  • Maher, N., Milinski, S., & Ludwig, R. (2021). Large ensemble climate model simulations: Introduction, overview, and future prospects for utilising multiple types of large ensemble. Earth System Dynamics, 12(2), 401–418.
  • Mahesh, A., Collins, W., Bonev, B., Brenowitz, N., Cohen, Y., Elms, J., … others (2024). Huge ensembles part I: Design of ensemble weather forecasts using spherical Fourier neural operators. arXiv preprint arXiv:2408.03100.
  • Manshausen, P., Cohen, Y., Pathak, J., Pritchard, M., Garg, P., Mardani, M., … Brenowitz, N. (2024). Generative data assimilation of sparse weather station observations at kilometer scales. arXiv preprint arXiv:2406.16947.
  • Patel, D., Arcomano, T., Hunt, B., Szunyogh, I., & Ott, E. (2024). Exploring the potential of hybrid machine-learning/physics-based modeling for atmospheric/oceanic prediction beyond the medium range. arXiv preprint arXiv:2405.19518.
  • Price, I., Sanchez-Gonzalez, A., Alet, F., Andersson, T. R., El-Kadi, A., Masters, D., … others (2023). GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv preprint arXiv:2312.15796.
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III (pp. 234–241).
  • Sane, A., Reichl, B. G., Adcroft, A., & Zanna, L. (2023). Parameterizing vertical mixing coefficients in the ocean surface boundary layer using neural networks. Journal of Advances in Modeling Earth Systems, 15(10), e2023MS003890.
  • Subel, A., & Zanna, L. (2024). Building ocean climate emulators. arXiv preprint arXiv:2402.04342.
  • Todd, A., Zanna, L., Couldrey, M., Gregory, J., Wu, Q., Church, J. A., … others (2020). Ocean-only FAFMIP: Understanding regional patterns of ocean heat content and dynamic sea level change. Journal of Advances in Modeling Earth Systems, 12(8), e2019MS002027.
  • Tsujino, H., Urakawa, L. S., Griffies, S. M., Danabasoglu, G., Adcroft, A. J., Amaral, A. E., … Yu, Z. (2020). Evaluation of global ocean–sea-ice model simulations based on the experimental protocols of the Ocean Model Intercomparison Project phase 2 (OMIP-2). Geoscientific Model Development, 13(8), 3643–3708. https://doi.org/10.5194/gmd-13-3643-2020
  • Wang, C., Pritchard, M. S., Brenowitz, N., Cohen, Y., Bonev, B., Kurth, T., … Pathak, J. (2024). Coupled ocean-atmosphere dynamics in a machine learning Earth system model. arXiv preprint arXiv:2406.08632.
  • Watt-Meyer, O., Dresdner, G., McGibbon, J., Clark, S. K., Henn, B., Duncan, J., … others (2023). ACE: A fast, skillful learned global atmospheric model for climate prediction. arXiv preprint arXiv:2310.02074.
  • Xiong, W., Xiang, Y., Wu, H., Zhou, S., Sun, Y., Ma, M., & Huang, X. (2023). AI-GOMS: Large AI-driven global ocean modeling system. arXiv preprint arXiv:2308.03152.
  • Zanna, L., Khatiwala, S., Gregory, J. M., Ison, J., & Heimbach, P. (2019). Global reconstruction of historical ocean heat storage and transport. Proceedings of the National Academy of Sciences, 116(4), 1126–1131.
  • Zhuang, J., Dussin, R., Huard, D., Bourgault, P., Banihirwe, A., Raynaud, S., … Li, X. (2023, September). pangeo-data/xESMF: v0.8.2. Zenodo. https://doi.org/10.5281/zenodo.8356796

Supporting Information

Text S1. Here we describe how we calculate $Q_{\text{anom}}$:

$$Q_{\text{anom}}(t,y,x) = Q(t,y,x) - \mathrm{Clim}(Q)(t,y,x) \qquad (3)$$

where $\mathrm{Clim}(Q)$ is the climatology of $Q$ over the entire dataset.
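A minimal sketch of this calculation with xarray, assuming the heat flux is a DataArray `Q` of 5-day means with a "time" dimension (names are illustrative):

```python
# Q_anom = Q - Clim(Q), with the climatology taken over the full record by
# grouping the 5-day means on day of year.
import xarray as xr

def heat_flux_anomaly(Q: xr.DataArray) -> xr.DataArray:
    clim = Q.groupby("time.dayofyear").mean("time")   # Clim(Q)
    return Q.groupby("time.dayofyear") - clim         # Q_anom
```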

Text S2. Calculation of Metrics

Consider a predicted ocean state $\tilde{\boldsymbol{\Phi}}_t^{[j,k,l]}$, its corresponding ground-truth state $\boldsymbol{\Phi}_t^{[j,k,l]}$ at time $t$, channel $j$, latitude $k$, and longitude $l$, and the normalized volume $V(j,k,l)$ at channel $j$, latitude $k$, and longitude $l$.

$$\mathrm{RMSE}(\tilde{\boldsymbol{\Phi}},\boldsymbol{\Phi}) = \frac{1}{T}\sum_{t} \sqrt{\sum_{j,k,l} V(j,k,l)\left(\tilde{\boldsymbol{\Phi}}_t^{[j,k,l]} - \boldsymbol{\Phi}_t^{[j,k,l]}\right)^{2}} \qquad (4)$$
$$\mathrm{MAE}(\tilde{\boldsymbol{\Phi}},\boldsymbol{\Phi}) = \frac{1}{T}\sum_{t} \left|\sum_{j,k,l} V(j,k,l)\left(\tilde{\boldsymbol{\Phi}}_t^{[j,k,l]} - \boldsymbol{\Phi}_t^{[j,k,l]}\right)\right| \qquad (5)$$
$$\mathrm{Corr}(\tilde{\boldsymbol{\Phi}},\boldsymbol{\Phi}) = \frac{1}{T}\sum_{t} \frac{\sum_{j,k,l} V(j,k,l)\,\tilde{\boldsymbol{\Phi}}_t^{[j,k,l]}\,\boldsymbol{\Phi}_t^{[j,k,l]}}{\sqrt{\sum_{j,k,l} V(j,k,l)\left(\tilde{\boldsymbol{\Phi}}_t^{[j,k,l]}\right)^{2}\;\sum_{j,k,l} V(j,k,l)\left(\boldsymbol{\Phi}_t^{[j,k,l]}\right)^{2}}} \qquad (6)$$

where $T$ is the number of time steps over which we calculate the metrics.
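A direct NumPy transcription of Eqs. (4)-(6) is sketched below for arrays of shape (T, C, Y, X) with normalized volume weights V of shape (C, Y, X); the array layout is an assumption for illustration.

```python
import numpy as np

def rmse(pred, truth, V):
    # Eq. (4): per-time volume-weighted RMSE, averaged over time.
    return np.mean(np.sqrt(np.sum(V * (pred - truth) ** 2, axis=(1, 2, 3))))

def mae(pred, truth, V):
    # Eq. (5): absolute value of the volume-weighted error sum, averaged over time.
    return np.mean(np.abs(np.sum(V * (pred - truth), axis=(1, 2, 3))))

def corr(pred, truth, V):
    # Eq. (6): volume-weighted (uncentered) correlation, averaged over time.
    num = np.sum(V * pred * truth, axis=(1, 2, 3))
    den = np.sqrt(np.sum(V * pred ** 2, axis=(1, 2, 3)) *
                  np.sum(V * truth ** 2, axis=(1, 2, 3)))
    return np.mean(num / den)
```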

Figure S1: Spatially averaged potential temperature ($\theta_O$) timeseries over the 8-year test set, comparing the ground truth OM4 (black) and predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model.
Figure S2: Time- and zonally-averaged potential temperature ($\theta_O$) biases (relative to OM4) for the 8-year test set: $\mathcal{F}_{\text{thermo}}$ (left), $\mathcal{F}_{\text{thermo+dynamic}}$ (center), and the difference between $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$ (right).
Figure S3: Spatially averaged potential temperature ($\theta_O$) trends for the entire ground truth dataset OM4 (black) and 8-year test-set predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green) at depth levels 0–700 m, 700–2000 m, and 2000–6000 m. Vertical lines indicate the section of training data considered. The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model.
Figure S4: Spatially averaged salinity ($S$) trends of the entire ground truth data OM4 (black), and 8-year test set predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green) at depth levels 0–700 m, 700–2000 m, and 2000–6000 m. Vertical lines indicate the section of training data considered. The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model.
Figure S5: Spatially averaged salinity ($S$) time series over an 8-year test set comparing the ground truth OM4 (black) with predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model.
Figure S6: Time- and zonally-averaged salinity ($S$) for an 8-year test set: ground truth OM4 (left), $\mathcal{F}_{\text{thermo}}$ (center), and $\mathcal{F}_{\text{thermo+dynamic}}$ (right).
Figure S7: Time- and zonally-averaged salinity ($S$) biases (relative to OM4) for $\mathcal{F}_{\text{thermo}}$ (left), $\mathcal{F}_{\text{thermo+dynamic}}$ (center), and the difference between $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$ (right) for an 8-year test set.
Figure S8: Spatially averaged time series over an 8-year test set for the ground truth OM4 (black), $\mathcal{F}_{\text{thermo}}$ (red), and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). The first, second, and third rows correspond to salinity ($S$), zonal velocity ($u_o$), and meridional velocity ($v_o$) at depths of 2.5 m, 775 m, and 2400 m, respectively. The final plot in the bottom row represents potential temperature ($\theta_O$) at 2400 m. The mean prediction and its variance (indicated by shading) are plotted over 5 initial seeds of training for each model.
Figure S9: Global maps of Ocean Heat Content (OHC) evaluated over an 8-year test set, displaying the difference between the last and first year for the ground truth OM4 (top left), $\mathcal{F}_{\text{thermo}}$ (top center), and $\mathcal{F}_{\text{thermo+dynamic}}$ (top right). The corresponding bias maps are shown in the bottom row.
Figure S10: Time-averaged global maps of 2.5 m potential temperature ($\theta_O$) evaluated over an 8-year test set for the ground truth OM4 (top left), $\mathcal{F}_{\text{thermo}}$ (top center), and $\mathcal{F}_{\text{thermo+dynamic}}$ (top right), with corresponding bias maps displayed in the bottom row.
Figure S11: Time series of the Nino 3.4 index over an 8-year test set, comparing the ground truth OM4 (black) with predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). Here, we consider the 2.5 m temperature anomalies. Anomalies are calculated relative to the 8-year climatology of OM4 and of each emulator. Additionally, the depth structure of anomalies is shown for the peak Nina event (marked by a black dot in the time series). Anomalies in the time series are averaged over rolling 150-day windows, while the anomalies in the depth structures are averaged over a 15-day (3-snapshot) window to reduce mesoscale variability.
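For reference, a minimal sketch of the index calculation described in the caption above is given below. The variable names, the 0–360° longitude convention, the day-of-year climatology grouping, and the omission of area weighting are assumptions made for illustration; this is not the code used in this work.

import xarray as xr

def nino34_index(sst: xr.DataArray) -> xr.DataArray:
    # sst: near-surface (2.5 m) temperature with dims (time, lat, lon).
    # Anomalies relative to a climatology computed over the full record.
    clim = sst.groupby("time.dayofyear").mean("time")
    anom = sst.groupby("time.dayofyear") - clim
    # Nino 3.4 region: 5S-5N, 170W-120W (190-240 in a 0-360 convention).
    box = anom.sel(lat=slice(-5, 5), lon=slice(190, 240))
    index = box.mean(("lat", "lon"))
    # Rolling 150-day average; with 5-day snapshots (15 days = 3 snapshots,
    # as stated in the caption) this corresponds to a 30-sample window.
    return index.rolling(time=30, center=True).mean()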
Figure S12: Time- and zonally-averaged potential temperature ($\theta_O$) biases (relative to OM4) for a 100-year control run forced with repeated atmospheric conditions taken from 1990–2000: $\mathcal{F}_{\text{thermo}}$ (left), $\mathcal{F}_{\text{thermo+dynamic}}$ (center), and the difference between $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$ (right). We compare the average for the 10-year period (1990–2000) of OM4 with the average of the 100-year emulator run.
Figure S13: Time series of the Nino 3.4 index over a 100-year control run, comparing the 10-year repeat ground truth OM4 (black) with predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). Here, we consider the 2.5 m temperature anomalies. Anomalies are calculated relative to the 10-year climatology of OM4 and the 100-year climatology of each emulator. Additionally, the depth structure of anomalies is shown for the peak Nina event (marked by a black dot in the time series). Anomalies in the time series are averaged over rolling 150-day windows, while the anomalies in the depth structures are averaged over a 15-day (3-snapshot) window to reduce mesoscale variability.
Figure S14: OHC global maps for the $\mathcal{F}_{\text{thermo}}$ emulator, evaluated over a 100-year climate run forced with a 1 W/m$^2$ (left) and 0.5 W/m$^2$ (right) yearly increase in global heat flux forcing, showing the difference between the time-averaged last 5 years and first 5 years.
Figure S15: Spatially averaged potential temperature ($\theta_O$) and salinity ($S$) time series over a 400-year run forced with repeated atmospheric conditions taken from 1990–2000 for the emulators $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$. The time series is averaged over 300-day rolling windows for visual clarity. The potential temperature trends for $\mathcal{F}_{\text{thermo}}$ and $\mathcal{F}_{\text{thermo+dynamic}}$ are $7.39\times10^{-6}$ °C/year and $4.08\times10^{-7}$ °C/year, respectively, while the salinity trends are $1.867\times10^{-7}$ psu/year and $-1.397\times10^{-8}$ psu/year, respectively.
Figure S16: Ocean heat content trends for 100-year runs from the $\mathcal{F}_{\text{thermo}}$ emulator. These runs are forced by increasing the global heat flux forcing by 0, 0.25, 0.5, and 1 W/m$^2$ per year to show how the emulator responds under a range of warming conditions.
Figure S17: Sea surface height (SSH) global maps showing the standard deviation of anomalies over an 8-year test set for the ground truth OM4 (left), $\mathcal{F}_{\text{thermo}}$ (center), and $\mathcal{F}_{\text{thermo+dynamic}}$ (right).
Figure S18: Potential temperature ($\theta_O$) global maps showing the standard deviation of anomalies over an 8-year test set at levels 2.5 m, 550 m, and 1400 m (top to bottom) for the ground truth OM4 (left), $\mathcal{F}_{\text{thermo}}$ (center), and $\mathcal{F}_{\text{thermo+dynamic}}$ (right). As expected, the emulators exhibit enhanced variability in the mid-latitudes and the tropics, compared to other regions, for SSH, surface temperature, and surface salinity. The emulators capture the ENSO pattern of variability, as well as the variability of the Southern Ocean and the midlatitude jets. However, the amplitude of the variance is smaller than in OM4. At depth, the emulators show overly pronounced variability in the eastern part of the North Atlantic basin and in the Indian Ocean.
Figure S19: Time series of the Nino 3.4 index over a 100-year control run, comparing the 8-year repeat ground truth OM4 from the test set (black) with predictions from $\mathcal{F}_{\text{thermo}}$ (red) and $\mathcal{F}_{\text{thermo+dynamic}}$ (green). Here, we consider the 2.5 m temperature anomalies. Anomalies are calculated relative to the 8-year climatology of OM4 and the 100-year climatology of each emulator. Additionally, the depth structure of anomalies is shown for the peak Nina event (marked by a black dot in the time series). Anomalies in the time series are averaged over rolling 150-day windows, while the anomalies in the depth structures are averaged over a 15-day (3-snapshot) window to reduce mesoscale variability. The emulators continue to produce stable rollouts, with an underestimation of magnitude similar to results obtained from test-set-only evaluations.