Rewarding *correct* answers in multimodal reasoning can actually *worsen* reasoning quality, but a simple groupwise ranking of solution trajectories significantly boosts reliability.