University of Toronto
Masked diffusion language models can now achieve 21.8x better compute efficiency than autoregressive models, thanks to binary encoding and index shuffling.