Current vision-language-action models struggle with dynamic robotic manipulation because they lack spatiotemporal reasoning. A new dataset and architecture, DOMINO and PUMA, close the gap.
VideoLLMs can now watch and think *simultaneously*, delivering 15x faster responses and improved accuracy on video-understanding tasks.