Masking just 5% of attention heads in vision-language models tanks performance on long-context tasks, revealing a surprisingly sparse and critical set of "multimodal retrieval heads" that attend to both text and images.