Audio-visual large language models (AVLLMs) may "hear" at intermediate layers, but they largely ignore audio cues in favor of vision when generating text, revealing a fundamental modality bias.