Search papers, labs, and topics across Lattice.
2
0
5
5
Unlock the full potential of your pretrained video diffusion models with a surprisingly simple four-stage post-training framework that drastically improves visual quality, temporal coherence, and instruction following.
Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast, autoregressive architecture, overcoming training instability with novel alignment and token handling techniques.