Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.
Steering vectors work primarily through the attention output-value (OV) circuit rather than by re-weighting attention scores, and they can be drastically sparsified without losing effectiveness.
Audio-visual LLMs (AVLLMs) may "hear" at intermediate layers, but they largely ignore audio cues in favor of vision when generating text, revealing a fundamental modality bias.
Large audio-language models (LALMs) can be easily tricked into "hearing" things that aren't there, with targeted attacks succeeding as often as 95% of the time.