Audio-visual large language models (AVLLMs) may "hear" at intermediate layers, but they largely ignore audio cues in favor of visual ones when generating text, revealing a fundamental modality bias.
Large audio-language models (LALMs) can easily be tricked into "hearing" things that aren't there; targeted attacks succeed up to 95% of the time.