"The Enstrophy Reward"

Subgrid-scale closures are the necessary lies of climate modeling. When you simulate atmospheric or oceanic turbulence at affordable resolution, the small-scale dynamics below your grid must be represented by approximate models. These closures are typically tuned offline against high-resolution reference data. The problem is that offline-tuned closures tend to over-diffuse: they smooth out the energy cascades that produce extreme weather events. The simulation is stable but the extremes are missing.

Reinforcement learning, trained online with an enstrophy spectrum reward, recovers the extremes.

The key innovation is the learning signal itself: the enstrophy spectrum, estimated from a few high-fidelity samples. Enstrophy, the integral of squared vorticity over the domain, characterizes the energy cascade structure of two-dimensional and geostrophic turbulence. By rewarding closures that preserve the enstrophy spectrum rather than minimize pointwise error, the learning process targets the cascade dynamics that generate extreme events instead of the average flow that suppresses them.
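To make the idea concrete, here is a minimal sketch of such a reward: the isotropic enstrophy spectrum of a doubly periodic 2D vorticity field, computed by FFT shell-binning, and a reward equal to the negative log-spectral distance from a high-fidelity reference spectrum. The function names, the normalization, and the particular distance metric are illustrative assumptions, not the source's actual formulation.

```python
import numpy as np

def enstrophy_spectrum(omega):
    """Isotropic enstrophy spectrum Z(k) of a 2D vorticity field
    on a doubly periodic square grid, binned into integer-k shells."""
    n = omega.shape[0]
    omega_hat = np.fft.fft2(omega) / omega.size      # normalized Fourier coefficients
    zk = 0.5 * np.abs(omega_hat) ** 2                # enstrophy density per mode
    k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers
    kmag = np.hypot(*np.meshgrid(k, k, indexing="ij"))
    shells = np.rint(kmag).astype(int)               # nearest integer shell per mode
    spectrum = np.bincount(shells.ravel(), weights=zk.ravel())
    return spectrum[: n // 2]                        # keep well-resolved shells only

def spectrum_reward(omega_les, z_ref, eps=1e-30):
    """Reward a coarse (LES) field for matching the reference enstrophy
    spectrum: negative RMS log-spectral error, so a perfect match scores 0
    and missing small-scale enstrophy is punished heavily."""
    z_les = enstrophy_spectrum(omega_les)
    m = min(len(z_les), len(z_ref))
    log_err = np.log(z_les[:m] + eps) - np.log(z_ref[:m] + eps)
    return -np.sqrt(np.mean(log_err ** 2))
```

A log-scale distance is deliberate: the spectrum spans many decades, so a pointwise (linear) error would be dominated by the largest scales, which is exactly the averaging bias this reward is meant to avoid.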

The result is simulations with up to five orders of magnitude fewer degrees of freedom that still capture the statistics of extreme events. The learned closures are adaptive: they adjust their behavior to local flow conditions rather than applying a fixed dissipation rate. This adaptivity is why they outperform conventional closures, which must be conservative enough to remain stable under all conditions and therefore over-diffuse everywhere to avoid blowing up anywhere.
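The contrast between a fixed and an adaptive closure can be sketched schematically. This is not the paper's learned closure; the Laplacian dissipation form, the local-enstrophy feature, and the `policy` function are all hypothetical stand-ins for whatever the trained agent actually computes.

```python
import numpy as np

def laplacian(f):
    """5-point Laplacian on a periodic grid with unit spacing."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0)
            + np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)

def fixed_dissipation(omega, nu_e=0.05):
    """Conventional closure sketch: one eddy viscosity everywhere.
    nu_e must stabilize the worst-case region of the flow, so it
    over-damps every quiescent region as a side effect."""
    return nu_e * laplacian(omega)

def adaptive_dissipation(omega, policy):
    """Learned-closure sketch: a policy maps a local flow feature
    (here, local enstrophy density omega**2) to a per-cell eddy
    viscosity, dissipating strongly only where the flow demands it."""
    features = omega ** 2                 # crude local flow descriptor (assumption)
    nu_e = policy(features)               # per-cell coefficient, e.g. an RL policy
    return nu_e * laplacian(omega)
```

The point of the sketch is the shape of `nu_e`: a scalar in the fixed case, a field in the adaptive case. Everything the reinforcement learner buys comes from making that field depend on the local state.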

The deeper lesson is about what you reward. Minimize mean-squared error, and you get closures that reproduce the average. Preserve the spectrum, and you get closures that reproduce the structure. The extreme events are not in the mean; they are in the cascade. Reward the cascade, and the extremes follow.