Search papers, labs, and topics across Lattice.
2
0
4
0
Even the top-performing MLLMs struggle with visual reasoning, achieving only 64% accuracy on a benchmark designed to reflect real-world diversity.
VLMs can get a 10% boost in spatial reasoning and 3D understanding by training on just 10,000 synthetic images generated automatically from task keywords.