Central Institute of Mental Health | Heidelberg | Tübingen | Apr 28, 2026 | arXiv:2604.25904

Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics

Andre Herz, Daniel Durstewitz, Georgia Koppe

AI Summary

This paper analyzes the discrepancy between Identity Teacher Forcing (ITF) and marginal likelihood optimization for training recurrent neural networks to model chaotic dynamical systems. By comparing the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of Almost-Linear RNNs, the authors show that ITF inflates curvature due to conditioning on a single forced regime path. Experiments on Lorenz-63 demonstrate that while evidence fine-tuning improves held-out evidence, it can degrade dynamical quantities of interest compared to ITF-pretrained models.
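The curvature gap described above can be illustrated on a toy problem. The sketch below is not the paper's switching AL-RNN: it assumes a hypothetical one-parameter model in which a hidden sign s in {-1, +1} (standing in for a regime) is drawn with equal probability and x is Gaussian with mean s*theta and unit variance. The function names are illustrative. Conditioning on a single known s gives curvature 1, while the marginal-likelihood curvature is smaller by a missing-information term that grows when both signs remain plausible.

```python
import math


def log_marginal(x, theta):
    """log p(x | theta) for the mixture 0.5*N(x; +theta, 1) + 0.5*N(x; -theta, 1)."""
    a = -0.5 * (x - theta) ** 2
    b = -0.5 * (x + theta) ** 2
    m = max(a, b)  # log-sum-exp shift for numerical stability
    mix = 0.5 * math.exp(a - m) + 0.5 * math.exp(b - m)
    return m + math.log(mix) - 0.5 * math.log(2.0 * math.pi)


def louis_terms(x, theta):
    """Return (complete, missing, observed) information for the toy model.

    Complete-data log density: const - 0.5 * (x - s*theta)**2 with s in {-1, +1}.
    Its score in theta is s*x - theta and its Hessian is -1 for either s.
    """
    complete = 1.0  # curvature when the hidden sign is treated as known
    posterior_mean_s = math.tanh(x * theta)  # E[s | x]
    # Posterior variance of the score: Var[s*x - theta | x] = x^2 * Var[s | x]
    missing = x ** 2 * (1.0 - posterior_mean_s ** 2)
    return complete, missing, complete - missing


def finite_difference_information(x, theta, h=1e-3):
    """Negative second derivative of log p(x | theta) by central differences."""
    second = (
        log_marginal(x, theta + h)
        - 2.0 * log_marginal(x, theta)
        + log_marginal(x, theta - h)
    ) / h ** 2
    return -second


if __name__ == "__main__":
    complete, missing, observed = louis_terms(x=0.8, theta=1.0)
    print(complete, missing, observed, finite_difference_information(0.8, 1.0))
```

For observations far from zero the posterior over s concentrates on one sign, the missing-information term shrinks toward zero, and the two curvatures nearly coincide. This is the toy analogue of a regime path that is effectively unambiguous.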

Key Contribution

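Teacher forcing, while effective for training RNNs on chaotic systems, need not match the optimization geometry of the free-running model's marginal likelihood. In the switching setting studied, ITF inflates curvature by conditioning on a single forced regime path, and fine-tuning on the evidence can improve held-out evidence while degrading dynamical quantities of interest.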

Abstract

Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (and thus a generalized Bayes update), teacher forcing need not match the free-running model's marginal likelihood geometry. We compare the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied here, conditioning on a single forced regime path (as ITF does) inflates curvature, while marginal likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible. In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.
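For reference, Louis' identity in its standard form is written out below. The notation is an assumption, not taken from the paper: x denotes the observations, z the latent switching variables, and theta the parameters. The first term on the right is the expected complete-data information and the second is the missing-information correction mentioned in the abstract.

```latex
% Requires amsmath (for \operatorname) and amssymb (for \mathbb).
\[
\underbrace{-\nabla_\theta^2 \log p_\theta(x)}_{\text{observed information}}
= \underbrace{\mathbb{E}_{p_\theta(z \mid x)}\!\left[-\nabla_\theta^2 \log p_\theta(x, z)\right]}_{\text{complete-data information}}
- \underbrace{\operatorname{Cov}_{p_\theta(z \mid x)}\!\left[\nabla_\theta \log p_\theta(x, z)\right]}_{\text{missing information}}
\]
```

Because the covariance term is positive semidefinite, the marginal-likelihood curvature never exceeds the expected complete-data curvature. The covariance vanishes when the posterior over z collapses onto a single switching path, which is the case in which conditioning on one path loses nothing.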

Architecture Design (Transformers, SSMs, MoE) | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
