The University of Hong Kong
Sparse prefilling can dramatically accelerate long-context inference in diffusion language models, achieving up to 28x speedup without sacrificing quality.
By structuring diffusion-based driving models around a "scaffold" of frozen structural tokens, Fast-dDrive achieves a 12x speedup over autoregressive baselines while improving trajectory accuracy.
Replacing slow, one-token-at-a-time generation in VLMs with block-diffusion decoding, via a surprisingly simple direct conversion, yields a 6x speedup without sacrificing quality.