Search papers, labs, and topics across Lattice.
3
0
6
1
Unlock the full potential of your pretrained video diffusion models with a surprisingly simple four-stage post-training framework that drastically improves visual quality, temporal coherence, and instruction following.
Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast, autoregressive architecture, overcoming training instability with novel alignment and token handling techniques.
VLA models can be compressed to 29% of their original VRAM with minimal performance loss by intelligently quantizing different channels based on their impact on action execution.