This paper introduces Reinforce to Learn, Elect to Reason (RLER), a dual-paradigm approach for video reasoning that decouples evidence generation from answer selection. During training, RLER uses group-relative reinforcement learning with novel rewards (frame-sensitive, think-transparency, anti-repetition) to encourage structured and verifiable reasoning traces. At inference, RLER employs a train-free orchestrator to generate diverse reasoning candidates, score them based on evidence consistency, and perform an evidence-weighted election, achieving state-of-the-art results across eight benchmarks with a 6.3% average improvement over base models.
Explicitly teaching models to generate and leverage verifiable evidence during both training and inference unlocks state-of-the-art video reasoning performance, even with a small ensemble of candidates.
Video reasoning has advanced with large multimodal models (LMMs), yet their inference is often a single pass that returns an answer without verifying whether the reasoning is evidence-aligned. We introduce Reinforce to Learn, Elect to Reason (RLER), a dual paradigm that decouples learning to produce evidence from obtaining a reliable answer. In RLER-Training, we optimize the policy with group-relative reinforcement learning (RL) and three novel task-driven rewards: a frame-sensitive reward grounds reasoning on explicit key frames, a think-transparency reward shapes readable and parsable reasoning traces, and an anti-repetition reward boosts information density. These signals teach the model to emit structured, machine-checkable evidence and strengthen its reasoning capabilities. In RLER-Inference, we apply a train-free orchestrator that generates a small set of diverse candidates, parses their answers and cited frames, scores them by evidence consistency, confidence, transparency, and non-redundancy, and then performs a robust evidence-weighted election. This closes the loop between producing and using evidence, improving reliability and interpretability without enlarging the model. We comprehensively evaluate RLER against various open-source and RL-based LMMs on eight representative benchmarks. RLER achieves state-of-the-art results across all benchmarks and delivers an average improvement of 6.3% over base models, while using on average 3.1 candidates per question, indicating a favorable balance between compute and quality. The results support a simple thesis: making evidence explicit during learning and electing by evidence during inference is a robust path to trustworthy video reasoning.
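To make the inference step concrete, here is a minimal Python sketch of what an evidence-weighted election over parsed candidates could look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract names the four scoring signals but does not give their formulas or weights, so the `Candidate` fields, the Jaccard-overlap measure in `evidence_consistency`, the default weights, and the function `elect` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One parsed reasoning candidate (all field names are hypothetical)."""
    answer: str
    cited_frames: frozenset  # frame indices cited in the reasoning trace
    confidence: float        # assumed to lie in [0, 1]
    transparency: float      # assumed to lie in [0, 1]
    non_redundancy: float    # assumed to lie in [0, 1]


def evidence_consistency(cand, others):
    """Assumed measure: mean Jaccard overlap of cited frames with the others."""
    if not others:
        return 1.0
    total = 0.0
    for other in others:
        union = cand.cited_frames | other.cited_frames
        if union:
            total += len(cand.cited_frames & other.cited_frames) / len(union)
    return total / len(others)


def elect(candidates, weights=(0.4, 0.3, 0.2, 0.1)):
    """Sum per-candidate scores by answer and return the top-scoring answer.

    The weights are illustrative placeholders, ordered as
    (consistency, confidence, transparency, non-redundancy).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    w_cons, w_conf, w_trans, w_nonred = weights
    votes = defaultdict(float)
    for i, cand in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        score = (
            w_cons * evidence_consistency(cand, others)
            + w_conf * cand.confidence
            + w_trans * cand.transparency
            + w_nonred * cand.non_redundancy
        )
        votes[cand.answer] += score
    winner = max(votes, key=votes.get)
    return winner, dict(votes)


candidates = [
    Candidate("A", frozenset({1, 2, 3}), 0.6, 0.8, 0.8),
    Candidate("A", frozenset({2, 3, 4}), 0.6, 0.8, 0.8),
    Candidate("B", frozenset({9}), 0.9, 0.8, 0.8),
]
winner, votes = elect(candidates)
print(winner, votes)
```

In this toy example the two candidates answering "A" cite overlapping frames, so their pooled score exceeds that of the single, more confident "B" candidate whose cited frame is shared by no one. That is the intended effect of weighting votes by evidence rather than counting them equally or trusting confidence alone.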