The Minimum Description Length principle selects models by measuring the total cost of encoding both the model and the data given the model. The shorter the total description, the better the model. MDL is a selection criterion: you train several models, compute their description lengths, and pick the shortest. It operates after training, as a judge.
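To make the selection-criterion reading concrete, here is a toy sketch (not from the paper): fit polynomials of several degrees, score each with a crude two-part code length, and keep the shortest. The bit-cost formulas are a standard rough approximation, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = np.linspace(-1.0, 1.0, n)
# Data generated from a degree-2 polynomial plus noise (illustrative setup).
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(n)

def description_length_bits(degree):
    """Two-part code: L(model) + L(data | model), both in bits (approximate)."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    k = degree + 1                           # number of parameters
    model_bits = 0.5 * k * np.log2(n)        # ~log2(sqrt(n)) bits per parameter
    data_bits = 0.5 * n * np.log2(rss / n)   # Gaussian residual code length
    return model_bits + data_bits

mdl_bits = {d: description_length_bits(d) for d in range(7)}
best_degree = min(mdl_bits, key=mdl_bits.get)  # shortest total description wins
print(best_degree)
```

Note that every model must be fully trained (here, fit) before MDL says anything: the criterion only compares finished candidates.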
This paper turns the judge into a driver.
The MDL principle is embedded directly into the training dynamics through a geometric construction. The model's learned representations define a manifold — the cognitive manifold — whose geometry evolves during training via Ricci flow. Ricci flow smooths the manifold by moving toward constant curvature, which in the information-geometric setting corresponds to simplifying the model's internal representation. But pure Ricci flow doesn't know about data fidelity. The innovation is coupling the flow with an MDL Drive term that penalizes description length as the manifold evolves.
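The coupling can be sketched schematically. The paper's actual equations are not reproduced here; everything below is an illustrative assumption. The manifold is collapsed to a single number u = r², the squared radius of a round 2-sphere, for which Ricci flow gives du/dt = -2 (uniform shrinking); a hypothetical convex description length L(u) supplies the MDL Drive term.

```python
U_STAR = 1.0  # hypothetical minimum-description-length scale (assumed)

def mdl_grad(u):
    """Gradient of a hypothetical convex description length L(u) = (u - U_STAR)^2."""
    return 2.0 * (u - U_STAR)

def coupled_flow(u0=4.0, lam=10.0, dt=1e-3, steps=5000):
    """Euler steps of du/dt = -2 (Ricci term) - lam * dL/du (MDL Drive)."""
    u = u0
    for _ in range(steps):
        u += dt * (-2.0 - lam * mdl_grad(u))
    return u

u_final = coupled_flow()
# The flow settles where the two forces balance:
# -2 - 2*lam*(u - U_STAR) = 0  =>  u = U_STAR - 1/lam = 0.9 in this toy.
print(u_final)
```

The point of the toy is the balance: neither term runs unchecked. Pure Ricci flow would shrink u to zero regardless of the data; the drive term anchors the flow near the description-length minimum.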
The result is autonomous compression during training. The model simultaneously fits the data and simplifies itself, with the balance controlled by the geometric dynamics rather than by a hyperparameter. The theoretical guarantees are strong: monotonic decrease of description length during training, finitely many topological phase transitions at which the representation undergoes qualitative simplification, and exponential convergence under convexity assumptions.
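The first and third guarantees can be checked numerically on a toy stand-in (this is not the paper's construction, only a generic convex surrogate): along gradient flow of a strongly convex "description length", the length decreases at every step and the distance to the minimizer contracts by a fixed factor, i.e. exponentially.

```python
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite => strongly convex
L = lambda w: 0.5 * w @ A @ w            # toy description length, minimum at 0
grad = lambda w: A @ w

w = np.array([3.0, -2.0])
lengths, dists = [], []
for _ in range(200):
    lengths.append(L(w))
    dists.append(np.linalg.norm(w))
    w = w - 0.1 * grad(w)                # explicit Euler step of the flow

# Monotonic decrease: every step strictly shortens the description.
monotone = all(a > b for a, b in zip(lengths, lengths[1:]))
# Exponential convergence: a constant per-step contraction factor < 1.
rate = dists[-1] / dists[-2]
print(monotone, rate)
```

The contraction factor settles at 1 - η·λ_min(A), the classic linear-convergence rate for gradient descent on a strongly convex quadratic; the convexity assumption is what licenses the exponential claim.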
The shift from selection to optimization matters. As a selection criterion, MDL evaluates models that already exist. As a training force, it shapes models that are being created. The compression is not applied post-hoc — it is constitutive of the learning process. The model that emerges is not a good model that was then compressed, but a model whose structure was determined by the compression principle from the first gradient step.
The description length is not measured. It is minimized — and the minimization is the training.