Search papers, labs, and topics across Lattice.
1
0
3
4
Current text-to-video models can generate visually appealing videos, but they often fail to accurately depict how actions change the state of objects, like a potato being peeled.