By jointly modeling video dynamics and actions, DiT4DiT achieves 10x higher sample efficiency and 7x faster convergence in robot policy learning, showing that video generation can be a powerful scaling proxy.