Search papers, labs, and topics across Lattice.
3
0
6
3
Forget expensive human annotations: this unsupervised method trains reward models that steer LLM reasoning just as well as, or even better than, their supervised counterparts.
Multimodal LLMs still struggle to faithfully recreate webpages from videos, particularly in capturing fine-grained style and motion, despite advances in other areas.
Even the best multimodal agents struggle with realistic visual scenarios, achieving only 27% accuracy on the new AgentVista benchmark that demands long-horizon tool use across web search, image search, and code.