HKUST(GZ)
Current audio-visual generation models struggle to maintain coherence and alignment when scaling to minute-long content, a problem exposed by the new LongAV-Compass benchmark.
Current MLLM benchmarks are missing the forest for the trees: Agentic-MME reveals that strong final-answer accuracy masks surprisingly poor tool use and planning in complex multimodal tasks.