"The Thermodynamic Curriculum"

Curriculum learning in reinforcement learning means presenting tasks in a sequence that accelerates training. Start easy, increase difficulty, reach the target task. The schedule is typically designed by hand or tuned empirically. There is no principled answer to what "optimal" means for a curriculum, because there is no clear objective function for the ordering of tasks.

Non-equilibrium thermodynamics provides one.

The framework treats reward parameters as coordinates on a task manifold. An RL agent at equilibrium with a given task occupies a point on this manifold. Changing the task parameters pushes the agent out of equilibrium. The agent adapts — explores, updates its policy — and re-equilibrates at the new task. A curriculum is a path through the task manifold, and different paths require different amounts of adaptation effort.

The minimum excess work principle from thermodynamics says that, in the slow-driving limit, the cheapest path between two equilibrium states — the one that dissipates the least energy — is a geodesic on the manifold equipped with the Fisher information metric. Applied to curriculum learning: the optimal curriculum minimizes total adaptation effort by following the shortest path in task space, where distance is measured by how much the agent's policy distribution changes.
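To make the geometry concrete, here is a minimal sketch of the key quantity, assuming the simplest possible setting: a one-parameter Boltzmann (softmax) policy over fixed action values, annealed in inverse temperature. For this family the Fisher information with respect to the temperature parameter is just the variance of the action values under the current policy, and the "thermodynamic length" of a curriculum segment is the integral of its square root. The function names (`boltzmann`, `fisher`, `path_length`) are illustrative, not from the paper.

```python
import numpy as np

def boltzmann(q, beta):
    # Softmax policy over action values q at inverse temperature beta.
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def fisher(q, beta):
    # For a one-parameter Boltzmann family, the Fisher information
    # with respect to beta is the variance of q under the policy.
    p = boltzmann(q, beta)
    mean = p @ q
    return p @ (q - mean) ** 2

def path_length(q, betas):
    # Thermodynamic length of a path beta(t): integral of sqrt(g) dbeta,
    # approximated with the trapezoid rule on a grid of beta values.
    speed = np.sqrt([fisher(q, b) for b in betas])
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(betas)))

q = np.array([1.0, 0.5, 0.0])
betas = np.linspace(0.1, 10.0, 500)
print(path_length(q, betas))
```

The length is what a curriculum should minimize: it measures how much the policy distribution itself moves, not how far the raw parameter travels.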

The resulting algorithm, MEW (Minimum Excess Work), schedules temperature annealing in maximum-entropy RL by computing geodesics on the task manifold. The optimal schedule is neither linear nor exponential — it follows the manifold's curvature, slowing down where the policy landscape changes rapidly and speeding up where it is flat.
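The paper's geodesic computation isn't reproduced here, but its qualitative behavior can be sketched under the same toy assumption as above (a one-parameter Boltzmann policy): a constant-speed geodesic in one dimension places annealing steps at equal intervals of thermodynamic length, so steps bunch where the Fisher information is high and spread where it is low. `geodesic_schedule` is a hypothetical illustration of that idea, not MEW itself.

```python
import numpy as np

def boltzmann(q, beta):
    # Softmax policy over action values q at inverse temperature beta.
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def fisher(q, beta):
    # Fisher information of the Boltzmann family w.r.t. beta:
    # the variance of q under the current policy.
    p = boltzmann(q, beta)
    return p @ (q - p @ q) ** 2

def geodesic_schedule(q, beta0, beta1, n_steps, grid=2000):
    # Equal-arc-length annealing: compute cumulative thermodynamic
    # length along a dense beta grid, then invert it so that each of
    # the n_steps covers the same length under the Fisher metric.
    betas = np.linspace(beta0, beta1, grid)
    speed = np.sqrt([fisher(q, b) for b in betas])
    segments = 0.5 * (speed[1:] + speed[:-1]) * np.diff(betas)
    arclen = np.concatenate([[0.0], np.cumsum(segments)])
    targets = np.linspace(0.0, arclen[-1], n_steps)
    return np.interp(targets, arclen, betas)

q = np.array([1.0, 0.5, 0.0])
schedule = geodesic_schedule(q, 0.1, 10.0, n_steps=11)
# Steps cluster at low beta, where the policy distribution is still
# changing rapidly, and stretch out as the policy becomes deterministic.
```

Note the design choice: the schedule slows down in high-curvature regions automatically, with no hand-tuned annealing rate — exactly the behavior described above, and neither linear nor exponential in beta.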

The deeper claim: learning efficiency is a thermodynamic quantity. The cost of adaptation is dissipated work, and the best curriculum is the one that wastes the least. The connection is not metaphorical. The same equations govern both.