Even the top-performing MLLMs struggle with visual reasoning, achieving only 64% accuracy on a benchmark designed to reflect real-world diversity.
Camera pose, largely ignored in video LLMs, unlocks significant gains in spatial reasoning and even improves general video QA when used as a lightweight supervisory signal.