University of Michigan, Ann Arbor
Sparse prefilling can dramatically accelerate long-context inference in diffusion language models, achieving up to 28x speedup without sacrificing quality.
LLM agents struggle to generalize from experience to reusable skills, often performing worse than simply replaying past trajectories, revealing a critical gap in current abstraction methods.
Text-based speculative decoding falls flat for vision-language models, but ViSkip dynamically adapts to vision tokens for state-of-the-art acceleration.