The paper investigates a counter-intuitive degradation in the geometric reasoning performance of Multimodal Large Language Models (MLLMs) when Supervised Fine-Tuning (SFT) is applied to interleaved plot-solution data. It argues that SFT primarily induces distributional alignment: the model reproduces the format of interleaved plotting without internalizing the causal dependency between the plot and the reasoning steps. To address this, the authors propose Faire, a reinforcement learning framework that enforces causal constraints to achieve functional alignment between plotting and reasoning. Experiments show that Faire leads the model to internalize the plotting process and yields competitive performance on challenging geometric reasoning benchmarks.
SFT on interleaved plot-solution data *hurts* geometric reasoning in MLLMs relative to text-only baselines, but an RL framework called Faire reverses this by enforcing causal constraints, reaching competitive performance on challenging geometric reasoning benchmarks.
Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we identify a counter-intuitive and underexplored phenomenon. Naively applying Supervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance compared to text-only baselines. We argue that this failure stems from a fundamental limitation of SFT, which primarily induces distributional alignment: the model learns to reproduce the surface format of interleaved plotting but fails to internalize the causal dependency between the generated plot and reasoning steps. To overcome this limitation, we propose Faire (Functional alignment for interleaved reasoning), a reinforcement learning framework that enforces three causal constraints to move beyond superficial imitation toward functional alignment. Extensive experiments show that Faire induces a qualitative shift in model behavior in which the plotting is effectively internalized, yielding competitive performance on challenging geometric reasoning benchmarks.