Ant Group
Omni-modal LLMs can ace captioning and QA, but AVID reveals they're surprisingly bad at spotting audio-visual inconsistencies in videos, a crucial skill for trustworthy AI.
Ditch the slow, iterative zooming during MLLM inference: Region-to-Image Distillation lets you bake the benefits of agentic zooming directly into a single forward pass.