Search papers, labs, and topics across Lattice.
2
0
5
LLMs can predict multiple tokens in parallel without any training, simply by cleverly probing their embedding space with dynamically generated mask tokens.
Diffusion language models have surprisingly redundant early layers, enabling nearly 20% FLOPs reduction at inference time via layer skipping without sacrificing performance.