Search papers, labs, and topics across Lattice.
1
0
3
1
Diffusion language models can achieve faster convergence and improved accuracy simply by swapping token-choice routing for expert-choice routing, and further benefit from allocating more compute to early denoising steps.