Today's best video models achieve near-zero success rates on interactive video generation, revealing a stark gap in multimodal reasoning and physical grounding.
Vision-language models falter at the fine-grained temporal recognition crucial for surgical video understanding, a task at which SurgRec excels.
Current video benchmarks are too simple. UniVBench offers the first unified framework for measuring the integrated capabilities of video foundation models, using complex multi-shot videos and a standardized evaluation system.