Search papers, labs, and topics across Lattice.
2
0
6
2
Semantic masks are all you need: predicting mask dynamics in world models yields surprisingly robust and generalizable robot policies compared to predicting raw pixels.
Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast, autoregressive architecture, overcoming training instability with novel alignment and token handling techniques.