Achieving long-range consistency in video generation without excessive computational overhead is now feasible with MilliVid's hierarchical token approach.
Forget finetuning video models for each robot: a single, action-free video world model can drive diverse robots when paired with a carefully designed inverse dynamics model.
Encoder-decoder architectures can beat decoder-only transformers in novel view synthesis, overturning conventional wisdom with a compute-optimal design (SVSM) that slashes training costs.