Shanghai Jiao Tong University
Omni-modal LLMs can ace captioning and QA, but AVID reveals they're surprisingly bad at spotting audio-visual inconsistencies in videos, a crucial skill for trustworthy AI.