Search papers, labs, and topics across Lattice.
2
0
5
Disentangling perception and reasoning with role-specific rewards in multimodal LLMs boosts accuracy by 7 points, revealing a critical bottleneck in existing joint optimization approaches.
Get better image captions without more data: reinforcement learning can train vision-language models to focus on image details by maximizing the similarity between images retrieved using the generated captions.