Search papers, labs, and topics across Lattice.
2
0
4
MLLMs are failing to visually track events in videos, performing only modestly above baseline despite strong results on other benchmarks.
Generate minute-long videos with compelling narrative structure and local realism, even with limited long-form training data, by cleverly combining supervised flow matching for global coherence with mode-seeking alignment to a short-video teacher for local fidelity.