Disentangling perception and reasoning with role-specific rewards in multimodal LLMs boosts accuracy by 7 points, revealing a critical bottleneck in existing joint optimization approaches.
Stop wasting tokens: a novel RL framework adaptively controls reasoning length, cutting LRM token generation by 40% without sacrificing accuracy.