Attention sinks, considered essential in autoregressive language models, turn out to be surprisingly prunable in diffusion language models, leading to better efficiency.
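A minimal sketch of the general idea, not the paper's method: flag "attention sink" tokens as key positions that receive a disproportionate share of attention mass, then recompute attention with those columns removed to see how much the output depends on them. The `ratio` threshold, tensor shapes, and function names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def find_attention_sinks(attn: torch.Tensor, ratio: float = 3.0) -> torch.Tensor:
    """attn: (heads, queries, keys) attention weights summing to 1 over keys.
    Returns a boolean mask over key positions flagged as sinks."""
    # Average attention received by each key position, over heads and queries.
    received = attn.mean(dim=(0, 1))            # (keys,)
    # Flag keys that absorb far more mass than the mean key (heuristic threshold).
    return received > ratio * received.mean()   # (keys,) bool

def attention_with_pruned_sinks(q, k, v, sink_mask):
    """Recompute scaled dot-product attention with sink key/value columns removed."""
    keep = ~sink_mask
    scores = q @ k[:, keep].transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ v[:, keep]

# Toy usage on random projections for one attention layer.
heads, seq, d = 4, 16, 32
q, k, v = (torch.randn(heads, seq, d) for _ in range(3))
attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)

sinks = find_attention_sinks(attn)
pruned_out = attention_with_pruned_sinks(q, k, v, sinks)
print("sink positions:", sinks.nonzero().flatten().tolist())
print("output shape without sinks:", pruned_out.shape)
```

Comparing `pruned_out` against the unpruned output (e.g. via cosine similarity) is one rough way to probe the prunability claim; the summary suggests this gap is small for diffusion language models but not for autoregressive ones.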