Corresponding Author
dMoE slashes the memory footprint of Mixture-of-Experts Diffusion LLMs by up to 80% without sacrificing performance, finally making them practical.
Naively quantizing autoregressive video diffusion models tanks performance because errors accumulate exponentially across frames and outlier patterns are heterogeneous; Q-ARVD addresses both.
DMax unlocks faster diffusion language model decoding by reframing the process as iterative self-correction in embedding space, achieving up to 2x speedup without sacrificing accuracy.
LLMs can be finetuned to hide malicious prompts and responses in plain sight using steganography, bypassing safety filters and creating an "invisible safety threat."