5
0
9
5
Unlock the full potential of your pretrained video diffusion models with a surprisingly simple four-stage post-training framework that drastically improves visual quality, temporal coherence, and instruction following.
Bridging the gap between human manipulation and robotic control, JoyAI-RA unlocks enhanced cross-embodiment behavior learning through multi-source pretraining.
Spatial reasoning gets a major boost: OpenSpatial-3M, a new dataset, lets models improve by 19% on existing benchmarks.
Existing image-editing models fall short on precise spatial manipulations, but a new benchmark and dataset show a path to closing the gap.
Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast, autoregressive architecture, overcoming training instability with novel alignment and token handling techniques.