University of Technology Sydney
LLMs can achieve state-of-the-art unsupervised multimodal entity linking by reasoning over diverse evidence types, including graph-based neighborhood information.
MLLMs still struggle to reason about everyday situations when they require identifying and using visual clues, despite excelling at tasks relying on pre-existing knowledge.
Finally, AI can automate the tedious process of Foley sound design, generating perfectly synced stereo audio that even meets professional film production standards.